Commit 7e0e5ca

author
Connell, Joseph
committed
Added Otel Readme
1 parent 5aebba8 commit 7e0e5ca

1 file changed

+369
-0
lines changed

deploy/docker/opentelemetry/README.md

Lines changed: 369 additions & 0 deletions
@@ -0,0 +1,369 @@
# OpenTelemetry Integration for Lowcoder

This document explains how to enable, configure, and verify OpenTelemetry tracing and metrics for the Lowcoder application, which includes both Java backend services and Node.js components. OpenTelemetry provides distributed tracing and metrics collection across the stack and integrates with Tempo, Prometheus, Grafana, and other observability backends.

---

## Table of Contents

- [Overview](#overview)
- [Architecture](#architecture)
- [Prerequisites](#prerequisites)
- [Quick Start](#quick-start)
- [Configuration](#configuration)
  - [Common Environment Variables](#common-environment-variables)
  - [Java API Service](#java-api-service)
  - [Node.js Service](#nodejs-service)
- [Monitoring and Visualization](#monitoring-and-visualization)
  - [Distributed Tracing (Tempo + Grafana)](#distributed-tracing-tempo--grafana)
  - [Grafana](#grafana-dashboards)
  - [Prometheus](#prometheus-metrics-collection)
- [Advanced Configuration](#advanced-configuration)
- [Troubleshooting](#troubleshooting)
- [Production Considerations](#production-considerations)
- [Support](#support)
- [Contributing](#contributing)

---

## Overview

Lowcoder leverages OpenTelemetry auto-instrumentation for both Java and Node.js services, providing:

- **Distributed Tracing:** End-to-end visibility of requests across services.
- **Metrics Collection:** Performance and health metrics for all components.
- **Flexible Export:** Support for Tempo, Prometheus, Grafana, Datadog, New Relic, and more.

---

## Architecture

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│    Lowcoder     │    │  OpenTelemetry   │    │  Observability  │
│   Application   │───▶│    Collector     │───▶│    Backends     │
│ (Java + Node.js)│    │                  │    │ (Jaeger/Grafana)│
└─────────────────┘    └──────────────────┘    └─────────────────┘
```

---

## Prerequisites

- **Docker** and **Docker Compose**
- At least **4 GB RAM** for the full observability stack
- An OpenTelemetry Collector instance (included in the Docker Compose stack)
- Access to a trace backend (Jaeger or Tempo), Prometheus, and Grafana (included in the Docker Compose stack)

---

## Quick Start

### 1. Clone and Build

```bash
git clone https://github.com/lowcoder-org/lowcoder.git
cd lowcoder/deploy/docker
docker-compose -f ./docker-compose-multi-otel.yaml up -d
```

### 2. Access Services

- **Lowcoder Application:** http://localhost:8080
- **Jaeger UI (Traces):** http://localhost:16686 (if the stack runs Jaeger; with Tempo, explore traces through Grafana instead)
- **Grafana (Dashboards):** http://localhost:3000 (admin/admin); the Monitoring and Visualization section uses port 3001, so confirm the published port in the compose file
- **Prometheus (Metrics):** http://localhost:9090

### 3. Generate Traffic

Use the Lowcoder app to generate telemetry data, then view the resulting traces and metrics in the trace UI, Grafana, and Prometheus.

---

## Configuration

### Common Environment Variables

| Variable                        | Description                        | Default/Example                        |
|---------------------------------|------------------------------------|----------------------------------------|
| `OTEL_SDK_DISABLED`             | Disable all telemetry              | `false`                                |
| `OTEL_SERVICE_NAME`             | Java service name                  | `lowcoder-java-backend`                |
| `OTEL_NODE_SERVICE_NAME`        | Node.js service name               | `lowcoder-node-service`                |
| `OTEL_EXPORTER_OTLP_ENDPOINT`   | Collector endpoint                 | `http://otel-collector:4317`           |
| `OTEL_RESOURCE_ATTRIBUTES`      | Additional resource attributes     | `deployment.environment=production`    |
| `OTEL_TRACES_EXPORTER`          | Trace exporter                     | `otlp`, `jaeger`, `none`               |
| `OTEL_METRICS_EXPORTER`         | Metrics exporter                   | `otlp`, `prometheus`, `none`           |
| `OTEL_LOGS_EXPORTER`            | Logs exporter                      | `otlp`, `none`                         |
| `OTEL_TRACES_SAMPLER`           | Sampling strategy                  | `traceidratio`                         |
| `OTEL_TRACES_SAMPLER_ARG`       | Sampling parameter                 | `0.1` (10%)                            |

Port `4317` is the collector's OTLP/gRPC port; OTLP over HTTP uses port `4318`. The `jaeger` exporter value is deprecated in current OpenTelemetry SDKs, so prefer `otlp` (Jaeger accepts OTLP directly).

#### Example (docker-compose.yml):

```yaml
environment:
  - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
  - OTEL_SERVICE_NAME=lowcoder-java-backend
  - OTEL_NODE_SERVICE_NAME=lowcoder-node-service
  - OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production
  - OTEL_TRACES_SAMPLER=traceidratio
  - OTEL_TRACES_SAMPLER_ARG=0.1
```

---

### Java API Service

- Uses the OpenTelemetry SDK with auto-instrumentation.
- Service name and resource attributes are set via environment variables or configuration.
- Traces and metrics are exported to the OpenTelemetry Collector.

**Custom Spans Example:**
```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

public class CustomSpanExample {
    // The tracer name identifies the instrumentation scope, not the service.
    private static final Tracer tracer = GlobalOpenTelemetry.getTracer("my-service");

    public void customOperation() {
        Span span = tracer.spanBuilder("custom-operation").startSpan();
        // makeCurrent() lets spans created inside the block become children of this span.
        try (Scope ignored = span.makeCurrent()) {
            // Your business logic here
        } finally {
            span.end();
        }
    }
}
```

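Auto-instrumentation for a Java service is typically enabled by attaching the OpenTelemetry Java agent at JVM startup. The following is a minimal sketch, not the project's actual configuration: the agent path `/opt/otel/opentelemetry-javaagent.jar` is hypothetical, and the service key `lowcoder-api-service` is borrowed from the production example in this document. `OTEL_EXPORTER_OTLP_PROTOCOL=grpc` is set explicitly because Java agent 2.x defaults to `http/protobuf`, which would not match the gRPC port `4317`.

```yaml
services:
  lowcoder-api-service:
    environment:
      # Hypothetical agent location; adjust to where the jar lives in the image
      - JAVA_TOOL_OPTIONS=-javaagent:/opt/otel/opentelemetry-javaagent.jar
      - OTEL_SERVICE_NAME=lowcoder-java-backend
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
      # Port 4317 speaks gRPC, so select the gRPC protocol explicitly
      - OTEL_EXPORTER_OTLP_PROTOCOL=grpc
```
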

---

### Node.js Service

- Uses `@opentelemetry/sdk-node` and `@opentelemetry/auto-instrumentations-node`.
- Configuration is handled in `otel.config.js` and via environment variables.
- Traces and metrics are exported to the OpenTelemetry Collector.

**Custom Spans Example:**
```javascript
const { trace, SpanStatusCode } = require('@opentelemetry/api');

const tracer = trace.getTracer('my-service');

// startActiveSpan makes the span current, so nested spans become its children.
tracer.startActiveSpan('custom-operation', (span) => {
  try {
    // Your business logic here
  } catch (err) {
    span.recordException(err);
    span.setStatus({ code: SpanStatusCode.ERROR });
    throw err;
  } finally {
    span.end();
  }
});
```

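The Node.js SDK has to be loaded before any application module so that auto-instrumentation can patch libraries as they are required. One common way to do this, sketched here under assumptions, is to preload the configuration file through `NODE_OPTIONS`. The service key `lowcoder-node-service` and the relative path to `otel.config.js` are illustrative; adjust them to the actual image layout.

```yaml
services:
  lowcoder-node-service:
    environment:
      # Preload the OpenTelemetry setup before the application starts
      - NODE_OPTIONS=--require ./otel.config.js
      - OTEL_NODE_SERVICE_NAME=lowcoder-node-service
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
```
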

---

## Monitoring and Visualization

### Distributed Tracing (Tempo + Grafana)

- **Trace Storage:** [Grafana Tempo](https://grafana.com/oss/tempo/) is used as the distributed tracing backend.
- **Visualization:** Traces are visualized and explored via [Grafana](https://grafana.com/).
- **URL:** http://localhost:3001 (Grafana, default login: admin/admin)
- **Features:**
  - View end-to-end traces and span timelines
  - Analyze service dependencies and request flows
  - Debug errors and identify bottlenecks
  - Correlate traces with metrics and logs

### Grafana (Dashboards)

- **URL:** http://localhost:3001 (admin/admin)
- Visualize metrics, create dashboards, set up alerts, and monitor application health.
- Add Prometheus as a data source (`http://prometheus:9090`).
- Explore traces via the Tempo data source.

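Data sources can also be provisioned from a file instead of being added by hand. The sketch below assumes Tempo is reachable as `tempo` on its default HTTP port `3200` and that the file is mounted under Grafana's `provisioning/datasources` directory; both are assumptions, not values taken from the compose file.

```yaml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
  - name: Tempo
    type: tempo
    access: proxy
    # Assumed hostname and port for the Tempo query endpoint
    url: http://tempo:3200
```
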
### Prometheus (Metrics Collection)

- **URL:** http://localhost:9090
- Query raw metrics, set up recording and alerting rules, and monitor collector health.

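Monitoring collector health requires Prometheus to scrape the Collector. A possible scrape job is sketched below. The job name matches the `job="opentelemetry-collector"` label used in the alert example in this document; the targets assume the Collector publishes its own metrics on port `8888` and application metrics through a `prometheus` exporter on port `8889`, which may differ from the shipped configuration.

```yaml
scrape_configs:
  - job_name: opentelemetry-collector
    scrape_interval: 15s
    static_configs:
      - targets:
          # Collector's internal metrics (assumed to be reachable on 8888)
          - otel-collector:8888
          # Application metrics re-exported by the collector (assumed port)
          - otel-collector:8889
```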

---

## Advanced Configuration

### Customizing Service Names and Attributes

```yaml
environment:
  - OTEL_SERVICE_NAME=lowcoder-api-service
  # Comma-separated key=value pairs attached to every span and metric
  - OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production
```

### Adjusting Sampling

```yaml
environment:
  - OTEL_TRACES_SAMPLER=traceidratio
  - OTEL_TRACES_SAMPLER_ARG=0.05 # 5% sampling
```

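`traceidratio` makes its decision from the trace ID alone and ignores whether the calling service already sampled the request. When a request crosses the Java and Node.js services, `parentbased_traceidratio` applies the ratio only to root spans and otherwise follows the caller's decision, which helps keep traces complete. A sketch of that variant:

```yaml
environment:
  # Root spans are sampled at 5%; child spans follow the parent's decision
  - OTEL_TRACES_SAMPLER=parentbased_traceidratio
  - OTEL_TRACES_SAMPLER_ARG=0.05
```
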
### Custom Collector Configuration

Edit `otel-collector-config.yaml` to:

- Add receivers (e.g., filelog)
- Configure exporters (e.g., OTLP to Tempo or Jaeger, Datadog, New Relic)
- Add processors and filtering rules

**Example for log collection:**
```yaml
# Assumes the otlp receiver and batch processor are defined elsewhere in the file.
receivers:
  filelog:
    include: [/var/log/lowcoder/*.log]
    operators:
      - type: json_parser

exporters:
  debug: # replaces the `logging` exporter, which recent collector releases no longer ship

service:
  pipelines:
    logs:
      receivers: [filelog, otlp]
      processors: [batch]
      exporters: [debug]
```

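For orientation, a minimal `otel-collector-config.yaml` that defines the `otlp` receiver and `batch` processor referenced in the log pipeline might look like the sketch below. It assumes a Collector distribution that includes the `prometheus` exporter (for example the contrib image), a Tempo instance accepting OTLP/gRPC at `tempo:4317`, and port `8889` for scraped metrics. None of these values are taken from the shipped configuration.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:

exporters:
  # Assumed Tempo address; plain-text gRPC inside the compose network
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true
  # Exposes received metrics for Prometheus to scrape (assumed port)
  prometheus:
    endpoint: 0.0.0.0:8889

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
```
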
### External Backend Integration

**Datadog Example:**
```yaml
exporters:
  datadog:
    api:
      key: "${DD_API_KEY}"
      site: datadoghq.com
```

**New Relic Example:**
```yaml
exporters:
  otlp:
    endpoint: https://otlp.nr-data.net:4317
    headers:
      api-key: "${NEW_RELIC_LICENSE_KEY}"
```

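Defining an exporter is not enough on its own: the Collector only uses exporters that are listed in a pipeline. A sketch of the wiring, assuming the `otlp` receiver and `batch` processor are defined elsewhere in the same file and that both exporters from the examples above are present:

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      # Every exporter listed here receives a copy of the trace data
      exporters: [otlp, datadog]
```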

---

## Troubleshooting

### Common Issues

#### No Traces Appearing

- Ensure the OpenTelemetry Collector is running (`docker-compose ps`)
- Verify that the collector's OTLP/HTTP endpoint is reachable (`curl -v http://localhost:4318/v1/traces`); port 4317 speaks gRPC and cannot be probed meaningfully with plain `curl`
- Check application logs for OTEL errors
- Ensure `OTEL_SDK_DISABLED` is not set to `true`

#### High Memory Usage

- Reduce sampling rate (`OTEL_TRACES_SAMPLER_ARG=0.01`)
- Adjust collector memory limits
- Configure batch processing in the collector

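The last two points can be sketched in the collector configuration as follows. The limits are illustrative starting values, not tuned recommendations, and the pipeline assumes an `otlp` receiver and an `otlp` exporter defined elsewhere in the file. `memory_limiter` is placed first so that data is refused before it accumulates in later processors.

```yaml
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512        # illustrative hard limit for the collector process
    spike_limit_mib: 128  # headroom reserved for short bursts
  batch:
    send_batch_size: 8192
    timeout: 5s

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
```
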
#### Missing Node.js Traces

- Check Node.js service logs
- Verify `NODE_OPTIONS` is set correctly
- Ensure OpenTelemetry Node.js packages are installed

### Debugging Commands

```bash
# Container status and logs (pass the same compose file used in Quick Start)
docker-compose -f ./docker-compose-multi-otel.yaml ps
docker-compose -f ./docker-compose-multi-otel.yaml logs lowcoder
docker-compose -f ./docker-compose-multi-otel.yaml logs otel-collector
# OTLP/HTTP endpoint: any HTTP response, even an error status, shows the port is reachable
curl -v http://localhost:4318/v1/traces
# Services known to Jaeger (only if Jaeger is running)
curl http://localhost:16686/api/services
```

### Log Analysis

Enable debug logging:

```yaml
environment:
  - OTEL_LOG_LEVEL=debug        # SDK diagnostic logging (Node.js)
  - OTEL_JAVAAGENT_DEBUG=true   # Java agent debug output
```

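On the Collector side, the `debug` exporter prints received telemetry to the Collector's own log, which helps distinguish an application that is not sending data from a backend that is not storing it. A sketch, assuming a recent Collector release (older releases used the `logging` exporter) and an `otlp` receiver and `batch` processor defined elsewhere:

```yaml
exporters:
  debug:
    # Print full span contents; use "basic" for one summary line per batch
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
```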

---

## Production Considerations

### Security

- Use TLS for collector communication
- Store API keys and credentials securely
- Restrict collector network access
- Sanitize sensitive data in telemetry

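Two of these points, TLS and sanitizing, can be handled in the collector configuration. The fragment below is a sketch under assumptions: the certificate paths are hypothetical, the attribute keys are examples of data one might scrub, and the `attributes/scrub` processor only takes effect once it is added to a pipeline's `processors` list.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        tls:
          # Hypothetical certificate paths mounted into the collector container
          cert_file: /etc/otel/tls/server.crt
          key_file: /etc/otel/tls/server.key

processors:
  attributes/scrub:
    actions:
      # Remove the attribute entirely
      - key: http.request.header.authorization
        action: delete
      # Replace the value with its hash
      - key: user.email
        action: hash
```
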
### Performance

- Use appropriate sampling rates
- Set resource limits for all services
- Configure efficient batch sizes in the collector
- Plan for trace and metrics retention

### Scaling

- Deploy multiple collector instances for high throughput
- Ensure observability backends can handle data volume
- Use load balancers for collector endpoints
- Consider sharding for large deployments

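One way to spread load across several collector instances while keeping all spans of a trace on the same instance is the `loadbalancing` exporter from the Collector contrib distribution. The sketch below assumes a front-tier collector that forwards to backend collectors discoverable under the hypothetical DNS name `otel-collector-backend`.

```yaml
exporters:
  loadbalancing:
    # Route by trace ID so every span of a trace reaches the same backend
    routing_key: traceID
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        # Hypothetical DNS name that resolves to all backend collectors
        hostname: otel-collector-backend
        port: 4317
```
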
### Example Production Configuration

```yaml
# docker-compose.prod.yml
version: '3'
services:
  lowcoder-api-service:
    environment:
      - OTEL_TRACES_SAMPLER=traceidratio
      - OTEL_TRACES_SAMPLER_ARG=0.01 # 1% sampling
      - OTEL_EXPORTER_OTLP_ENDPOINT=https://your-collector.company.com:4317
      - OTEL_SERVICE_VERSION=2.6.5 # unquoted: quotes in list-form entries become part of the value
      - OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production
```

### Monitoring the Monitoring

Set up alerts for:

- Collector health and availability
- High error rates in telemetry processing
- Backend storage capacity
- Unusual trace volume patterns

**Example Prometheus Alert:**
```yaml
groups:
  - name: otel-collector
    rules:
      - alert: CollectorDown
        expr: up{job="opentelemetry-collector"} == 0
        for: 5m
        annotations:
          summary: "OpenTelemetry Collector is down"
```

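A second rule can cover high error rates in telemetry processing. The sketch below assumes the Collector's internal metrics are scraped and that the failed-span counter is exposed as `otelcol_exporter_send_failed_spans`; depending on the Collector version the name may carry a `_total` suffix, so check the actual metric name in Prometheus first.

```yaml
groups:
  - name: otel-collector-export
    rules:
      - alert: CollectorExportFailures
        # Fires when the collector keeps failing to hand spans to a backend
        expr: rate(otelcol_exporter_send_failed_spans[5m]) > 0
        for: 10m
        annotations:
          summary: "OpenTelemetry Collector is failing to export spans"
```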

---

## Support

- Review the [troubleshooting section](#troubleshooting)
- Consult OpenTelemetry documentation
- Submit issues to the project repository
- Join the OpenTelemetry community discussions

---

## Contributing

- Test changes locally with the full stack
- Document any new environment variables
- Update this README with configuration changes
- Ensure backward compatibility where possible

---
