| 1 | +# OpenTelemetry Integration for Lowcoder |
| 2 | + |
| 3 | +This document explains how to enable, configure, and verify OpenTelemetry tracing and metrics for Lowcoder, which comprises Java backend services and Node.js components. OpenTelemetry provides distributed tracing and metrics collection across the stack and integrates with Tempo, Prometheus, Grafana, and other observability backends. |
| 4 | + |
| 5 | +--- |
| 6 | + |
| 7 | +## Table of Contents |
| 8 | + |
| 9 | +- [Overview](#overview) |
| 10 | +- [Architecture](#architecture) |
| 11 | +- [Prerequisites](#prerequisites) |
| 12 | +- [Quick Start](#quick-start) |
| 13 | +- [Configuration](#configuration) |
| 14 | + - [Common Environment Variables](#common-environment-variables) |
| 15 | + - [Java API Service](#java-api-service) |
| 16 | + - [Node.js Service](#nodejs-service) |
| 17 | +- [Monitoring and Visualization](#monitoring-and-visualization) |
| 18 | + - [Distributed Tracing (Tempo + Grafana)](#distributed-tracing-tempo--grafana) |
| 19 | + - [Grafana](#grafana-dashboards) |
| 20 | + - [Prometheus](#prometheus-metrics-collection) |
| 21 | +- [Advanced Configuration](#advanced-configuration) |
| 22 | +- [Troubleshooting](#troubleshooting) |
| 23 | +- [Production Considerations](#production-considerations) |
| 24 | +- [Support](#support) |
| 25 | +- [Contributing](#contributing) |
| 26 | + |
| 27 | +--- |
| 28 | + |
| 29 | +## Overview |
| 30 | + |
| 31 | +Lowcoder leverages OpenTelemetry auto-instrumentation for both Java and Node.js services, providing: |
| 32 | + |
| 33 | +- **Distributed Tracing:** End-to-end visibility of requests across services. |
| 34 | +- **Metrics Collection:** Performance and health metrics for all components. |
| 35 | +- **Flexible Export:** Support for Tempo, Prometheus, Grafana, Datadog, New Relic, and more. |
| 36 | + |
| 37 | +--- |
| 38 | + |
| 39 | +## Architecture |
| 40 | + |
| 41 | +``` |
| 42 | +┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐ |
| 43 | +│ Lowcoder │ │ OpenTelemetry │ │ Observability │ |
| 44 | +│ Application │───▶│ Collector │──▶│ Backends │ |
| 45 | +│ (Java + Node.js)│ │ │ │ (Jaeger/Grafana)│ |
| 46 | +└─────────────────┘ └──────────────────┘ └─────────────────┘ |
| 47 | +``` |
| 48 | + |
| 49 | +--- |
| 50 | + |
| 51 | +## Prerequisites |
| 52 | + |
| 53 | +- **Docker** and **Docker Compose** |
| 54 | +- At least **4 GB of RAM** for the full observability stack |
| 55 | +- An OpenTelemetry Collector instance (included in the Docker Compose file) |
| 56 | +- Access to Jaeger, Prometheus, and Grafana (included in the Docker Compose file) |
| 57 | + |
| 58 | +--- |
| 59 | + |
| 60 | +## Quick Start |
| 61 | + |
| 62 | +### 1. Clone and Build |
| 63 | + |
| 64 | +```bash |
| 65 | +git clone https://github.com/lowcoder-org/lowcoder.git |
| 66 | +cd lowcoder/deploy/docker |
| 67 | +docker-compose -f ./docker-compose-multi-otel.yaml up -d |
| 68 | +``` |
| 69 | + |
| 70 | +### 2. Access Services |
| 71 | + |
| 72 | +- **Lowcoder Application:** http://localhost:8080 |
| 73 | +- **Jaeger UI (Traces):** http://localhost:16686 |
| 74 | +- **Grafana (Dashboards):** http://localhost:3000 (admin/admin) |
| 75 | +- **Prometheus (Metrics):** http://localhost:9090 |
| 76 | + |
| 77 | +### 3. Generate Traffic |
| 78 | + |
| 79 | +Use the Lowcoder app to generate telemetry data, then view traces and metrics in Jaeger, Grafana, and Prometheus. |
| 80 | + |
| 81 | +--- |
| 82 | + |
| 83 | +## Configuration |
| 84 | + |
| 85 | +### Common Environment Variables |
| 86 | + |
| 87 | +| Variable | Description | Default/Example | |
| 88 | +|---------------------------------|------------------------------------|----------------------------------------| |
| 89 | +| `OTEL_SDK_DISABLED` | Set to `true` to disable all telemetry | `false` | |
| 90 | +| `OTEL_SERVICE_NAME` | Java service name | `lowcoder-java-backend` | |
| 91 | +| `OTEL_NODE_SERVICE_NAME` | Node.js service name | `lowcoder-node-service` | |
| 92 | +| `OTEL_EXPORTER_OTLP_ENDPOINT` | Collector endpoint | `http://otel-collector:4317` | |
| 93 | +| `OTEL_RESOURCE_ATTRIBUTES` | Additional resource attributes | `deployment.environment=production` | |
| 94 | +| `OTEL_TRACES_EXPORTER` | Trace exporter | `otlp`, `jaeger`, `none` | |
| 95 | +| `OTEL_METRICS_EXPORTER` | Metrics exporter | `otlp`, `prometheus`, `none` | |
| 96 | +| `OTEL_LOGS_EXPORTER` | Logs exporter | `otlp`, `none` | |
| 97 | +| `OTEL_TRACES_SAMPLER` | Sampling strategy | `traceidratio` | |
| 98 | +| `OTEL_TRACES_SAMPLER_ARG` | Sampling parameter | `0.1` (10%) | |
| 99 | + |
| 100 | +#### Example (docker-compose.yml): |
| 101 | + |
| 102 | +```yaml |
| 103 | +environment: |
| 104 | + - OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317 |
| 105 | + - OTEL_SERVICE_NAME=lowcoder-java-backend |
| 106 | + - OTEL_NODE_SERVICE_NAME=lowcoder-node-service |
| 107 | + - OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production |
| 108 | + - OTEL_TRACES_SAMPLER=traceidratio |
| 109 | + - OTEL_TRACES_SAMPLER_ARG=0.1 |
| 110 | +``` |
| 111 | + |
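`OTEL_RESOURCE_ATTRIBUTES` takes a comma-separated list of `key=value` pairs. The sketch below shows how such a string breaks down into individual attributes. `parseResourceAttributes` is a hypothetical helper written for illustration, not part of any OpenTelemetry SDK, and it ignores the escaping rules that the SDKs handle.

```javascript
// Hypothetical helper: splits an OTEL_RESOURCE_ATTRIBUTES-style string
// into a plain object. Illustration only; the SDKs parse this for you.
function parseResourceAttributes(raw) {
  const attributes = {};
  for (const pair of (raw || '').split(',')) {
    const index = pair.indexOf('=');
    if (index <= 0) continue; // skip empty entries and entries without a key
    const key = pair.slice(0, index).trim();
    const value = pair.slice(index + 1).trim();
    if (key) attributes[key] = value;
  }
  return attributes;
}

console.log(parseResourceAttributes('deployment.environment=production,service.version=2.6.5'));
```

Printing the parsed object for the value you intend to deploy is a quick way to spot a missing comma or a stray space.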
| 112 | +--- |
| 113 | + |
| 114 | +### Java API Service |
| 115 | + |
| 116 | +- Uses OpenTelemetry SDK with auto-instrumentation. |
| 117 | +- Service name and resource attributes are set via environment variables or configuration. |
| 118 | +- Traces and metrics are exported to the OpenTelemetry Collector. |
| 119 | + |
| 120 | +**Custom Spans Example:** |
| 121 | +```java |
| 122 | +import io.opentelemetry.api.GlobalOpenTelemetry; |
| 123 | +import io.opentelemetry.api.trace.Span; |
| 124 | +import io.opentelemetry.api.trace.Tracer; |
| 125 | +Tracer tracer = GlobalOpenTelemetry.getTracer("my-service"); |
| 126 | +Span span = tracer.spanBuilder("custom-operation").startSpan(); |
| 127 | +try { |
| 128 | + // Your business logic here |
| 129 | +} finally { |
| 130 | + span.end(); |
| 131 | +} |
| 132 | +``` |
| 133 | + |
| 134 | +--- |
| 135 | + |
| 136 | +### Node.js Service |
| 137 | + |
| 138 | +- Uses `@opentelemetry/sdk-node` and `@opentelemetry/auto-instrumentations-node`. |
| 139 | +- Configuration is handled in `otel.config.js` and via environment variables. |
| 140 | +- Traces and metrics are exported to the OpenTelemetry Collector. |
| 141 | + |
| 142 | +**Custom Spans Example:** |
| 143 | +```javascript |
| 144 | +const { trace } = require('@opentelemetry/api'); |
| 145 | + |
| 146 | +const tracer = trace.getTracer('my-service'); |
| 147 | +const span = tracer.startSpan('custom-operation'); |
| 148 | +try { |
| 149 | + // Your business logic here |
| 150 | +} finally { |
| 151 | + span.end(); |
| 152 | +} |
| 153 | +``` |
| 154 | + |
| 155 | +--- |
| 156 | + |
| 157 | +## Monitoring and Visualization |
| 158 | + |
| 159 | +### Distributed Tracing (Tempo + Grafana) |
| 160 | + |
| 161 | +- **Trace Storage:** [Grafana Tempo](https://grafana.com/oss/tempo/) is used as the distributed tracing backend. |
| 162 | +- **Visualization:** Traces are visualized and explored via [Grafana](https://grafana.com/). |
| 163 | +- **URL:** http://localhost:3001 (Grafana, default login: admin/admin) |
| 164 | +- **Features:** |
| 165 | + - View end-to-end traces and span timelines |
| 166 | + - Analyze service dependencies and request flows |
| 167 | + - Debug errors and identify bottlenecks |
| 168 | + - Correlate traces with metrics and logs |
| 169 | + |
| 170 | +### Grafana (Dashboards) |
| 171 | + |
| 172 | +- **URL:** http://localhost:3001 (admin/admin) |
| 173 | +- Visualize metrics, create dashboards, set up alerts, and monitor application health. |
| 174 | +- Add Prometheus as a data source (`http://prometheus:9090`). |
| 175 | +- Explore traces via the Tempo data source. |
| 176 | + |
| 177 | +### Prometheus (Metrics Collection) |
| 178 | + |
| 179 | +- **URL:** http://localhost:9090 |
| 180 | +- Query raw metrics, set up recording and alerting rules, and monitor collector health. |
| 181 | + |
| 182 | +--- |
| 183 | + |
| 184 | +## Advanced Configuration |
| 185 | + |
| 186 | +### Customizing Service Names and Attributes |
| 187 | + |
| 188 | +```yaml |
| 189 | +environment: |
| 190 | + - OTEL_SERVICE_NAME=lowcoder-api-service |
| 191 | +``` |
| 192 | + |
| 193 | +### Adjusting Sampling |
| 194 | + |
| 195 | +```yaml |
| 196 | +environment: |
| 197 | + - OTEL_TRACES_SAMPLER=traceidratio |
| 198 | + - OTEL_TRACES_SAMPLER_ARG=0.05 # 5% sampling |
| 199 | +``` |
| 200 | + |
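With `traceidratio`, the sampling decision is a deterministic function of the trace ID rather than a per-span coin flip. The sketch below illustrates the idea, assuming the last 32 bits of the trace ID are uniformly distributed. `shouldSample` is a hypothetical function and does not reproduce the exact algorithm of any OpenTelemetry SDK.

```javascript
// Hypothetical sketch of ratio-based sampling keyed on the trace ID.
// Not the algorithm used by any particular OpenTelemetry SDK.
function shouldSample(traceIdHex, ratio) {
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  // Treat the last 8 hex characters (32 bits) as a value in [0, 2^32).
  const value = parseInt(traceIdHex.slice(-8), 16);
  // Keep the trace when the value falls below ratio * 2^32.
  return value < Math.floor(ratio * 0x100000000);
}

const traceId = '4bf92f3577b34da6a3ce929d0e0e4736'; // example trace ID
console.log(shouldSample(traceId, 0.05));
```

Because the same trace ID always produces the same decision, lowering `OTEL_TRACES_SAMPLER_ARG` shrinks the set of kept traces instead of reshuffling it.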
| 201 | +### Custom Collector Configuration |
| 202 | + |
| 203 | +Edit `otel-collector-config.yaml` to: |
| 204 | + |
| 205 | +- Add receivers (e.g., filelog) |
| 206 | +- Configure exporters (e.g., Jaeger, Datadog, New Relic) |
| 207 | +- Add processors and filtering rules |
| 208 | + |
| 209 | +**Example for log collection:** |
| 210 | +```yaml |
| 211 | +receivers: |
| 212 | + filelog: |
| 213 | + include: [/var/log/lowcoder/*.log] |
| 214 | + operators: |
| 215 | + - type: json_parser |
| 216 | + |
| 217 | +service: |
| 218 | + pipelines: |
| 219 | + logs: |
| 220 | + receivers: [filelog, otlp] |
| 221 | + processors: [batch] |
| 222 | + exporters: [logging] |
| 223 | +``` |
| 224 | + |
| 225 | +### External Backend Integration |
| 226 | + |
| 227 | +**Datadog Example:** |
| 228 | +```yaml |
| 229 | +exporters: |
| 230 | + datadog: |
| 231 | + api: |
| 232 | + key: "${DD_API_KEY}" |
| 233 | + site: datadoghq.com |
| 234 | +``` |
| 235 | + |
| 236 | +**New Relic Example:** |
| 237 | +```yaml |
| 238 | +exporters: |
| 239 | + otlp: |
| 240 | + endpoint: https://otlp.nr-data.net:4317 |
| 241 | + headers: |
| 242 | + api-key: "${NEW_RELIC_LICENSE_KEY}" |
| 243 | +``` |
| 244 | + |
| 245 | +--- |
| 246 | + |
| 247 | +## Troubleshooting |
| 248 | + |
| 249 | +### Common Issues |
| 250 | + |
| 251 | +#### No Traces Appearing |
| 252 | + |
| 253 | +- Ensure the OpenTelemetry Collector is running (`docker-compose ps`) |
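| 254 | +- Verify the collector is reachable. Port 4317 serves OTLP over gRPC and will not answer a plain `curl` request meaningfully, so probe the OTLP/HTTP port instead (`curl -v http://localhost:4318/v1/traces`); any HTTP response, even an error status, confirms the port is open |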
| 255 | +- Check application logs for OTEL errors |
| 256 | +- Ensure `OTEL_SDK_DISABLED` is not set to `true` |
| 257 | + |
| 258 | +#### High Memory Usage |
| 259 | + |
| 260 | +- Reduce sampling rate (`OTEL_TRACES_SAMPLER_ARG=0.01`) |
| 261 | +- Adjust collector memory limits |
| 262 | +- Configure batch processing in the collector |
| 263 | + |
| 264 | +#### Missing Node.js Traces |
| 265 | + |
| 266 | +- Check Node.js service logs |
| 267 | +- Verify that `NODE_OPTIONS` loads the OpenTelemetry setup before the application starts (for example, with a `--require` flag) |
| 268 | +- Ensure OpenTelemetry Node.js packages are installed |
| 269 | + |
| 270 | +### Debugging Commands |
| 271 | + |
| 272 | +```bash |
| 273 | +docker-compose ps |
| 274 | +docker-compose logs lowcoder |
| 275 | +docker-compose logs otel-collector |
| 276 | +curl -v http://localhost:4318/v1/traces |
| 277 | +curl http://localhost:16686/api/services |
| 278 | +``` |
| 279 | + |
| 280 | +### Log Analysis |
| 281 | + |
| 282 | +Enable debug logging: |
| 283 | + |
| 284 | +```yaml |
| 285 | +environment: |
| 286 | + - OTEL_LOG_LEVEL=debug |
| 287 | + - OTEL_JAVAAGENT_DEBUG=true |
| 288 | +``` |
| 289 | + |
| 290 | +--- |
| 291 | + |
| 292 | +## Production Considerations |
| 293 | + |
| 294 | +### Security |
| 295 | + |
| 296 | +- Use TLS for collector communication |
| 297 | +- Store API keys and credentials securely |
| 298 | +- Restrict collector network access |
| 299 | +- Sanitize sensitive data in telemetry |
| 300 | + |
| 301 | +### Performance |
| 302 | + |
| 303 | +- Use appropriate sampling rates |
| 304 | +- Set resource limits for all services |
| 305 | +- Configure efficient batch sizes in the collector |
| 306 | +- Plan for trace and metrics retention |
| 307 | + |
| 308 | +### Scaling |
| 309 | + |
| 310 | +- Deploy multiple collector instances for high throughput |
| 311 | +- Ensure observability backends can handle data volume |
| 312 | +- Use load balancers for collector endpoints |
| 313 | +- Consider sharding for large deployments |
| 314 | + |
| 315 | +### Example Production Configuration |
| 316 | + |
| 317 | +```yaml |
| 318 | +# docker-compose.prod.yml |
| 319 | +version: '3' |
| 320 | +services: |
| 321 | + lowcoder-api-service: |
| 322 | + environment: |
| 323 | + - OTEL_TRACES_SAMPLER=traceidratio |
| 324 | + - OTEL_TRACES_SAMPLER_ARG=0.01 # 1% sampling |
| 325 | + - OTEL_EXPORTER_OTLP_ENDPOINT=https://your-collector.company.com:4317 |
| 326 | + - OTEL_SERVICE_VERSION=2.6.5 # no quotes: in list syntax they become part of the value |
| 327 | + - OTEL_RESOURCE_ATTRIBUTES=deployment.environment=production |
| 328 | +``` |
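Before rolling out a configuration like the one above, it can help to confirm that the endpoint value is well-formed and uses TLS. The following is a minimal sketch using Node's built-in `URL` class. `describeOtlpEndpoint` is a hypothetical helper, not an OpenTelemetry API, and it only inspects the string; it does not contact the collector.

```javascript
// Hypothetical pre-deployment check for an OTEL_EXPORTER_OTLP_ENDPOINT value.
function describeOtlpEndpoint(endpoint) {
  const url = new URL(endpoint); // throws a TypeError on malformed input
  return {
    host: url.hostname,
    port: url.port, // empty string when the URL relies on the scheme's default port
    usesTls: url.protocol === 'https:',
  };
}

console.log(describeOtlpEndpoint('https://your-collector.company.com:4317'));
```

A check like this could run in a deployment script and fail early when `usesTls` is `false` for a production endpoint.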
| 329 | + |
| 330 | +### Monitoring the Monitoring |
| 331 | + |
| 332 | +Set up alerts for: |
| 333 | + |
| 334 | +- Collector health and availability |
| 335 | +- High error rates in telemetry processing |
| 336 | +- Backend storage capacity |
| 337 | +- Unusual trace volume patterns |
| 338 | + |
| 339 | +**Example Prometheus Alert:** |
| 340 | +```yaml |
| 341 | +groups: |
| 342 | + - name: otel-collector |
| 343 | + rules: |
| 344 | + - alert: CollectorDown |
| 345 | + expr: up{job="opentelemetry-collector"} == 0 |
| 346 | + for: 5m |
| 347 | + annotations: |
| 348 | + summary: "OpenTelemetry Collector is down" |
| 349 | +``` |
| 350 | + |
| 351 | +--- |
| 352 | + |
| 353 | +## Support |
| 354 | + |
| 355 | +- Review the [troubleshooting section](#troubleshooting) |
| 356 | +- Consult OpenTelemetry documentation |
| 357 | +- Submit issues to the project repository |
| 358 | +- Join the OpenTelemetry community discussions |
| 359 | + |
| 360 | +--- |
| 361 | + |
| 362 | +## Contributing |
| 363 | + |
| 364 | +- Test changes locally with the full stack |
| 365 | +- Document any new environment variables |
| 366 | +- Update this README with configuration changes |
| 367 | +- Ensure backward compatibility where possible |
| 368 | + |
| 369 | +--- |