From 3f456ad3c6f384a391c90a00ed8ebdbcfda22535 Mon Sep 17 00:00:00 2001 From: Cyrille Le Clerc Date: Wed, 9 Jul 2025 23:02:37 +0200 Subject: [PATCH 1/2] Add "Troubleshooting Pod Log and Trace Correlation Issues" entry --- charts/k8s-monitoring/docs/Troubleshooting.md | 38 +++++++++++++++++++ 1 file changed, 38 insertions(+) diff --git a/charts/k8s-monitoring/docs/Troubleshooting.md b/charts/k8s-monitoring/docs/Troubleshooting.md index 6e95c67203..811af45b21 100644 --- a/charts/k8s-monitoring/docs/Troubleshooting.md +++ b/charts/k8s-monitoring/docs/Troubleshooting.md @@ -92,3 +92,41 @@ receivers: ``` Start with 2000 and adjust as needed. + + +### Troubleshooting Pod Log and Trace Correlation Issues + +**Problem:** You're experiencing issues correlating Kubernetes pod logs with OpenTelemetry traces, metrics, and application logs. This often occurs when the `service.name`, `service.namespace`, and `service.instance.id` metadata do not consistently match across all your telemetry signals. + +**Solution:** To ensure proper correlation, we recommend aligning your metadata according to OpenTelemetry specifications, particularly the "[Specify resource attributes using Kubernetes annotations](https://opentelemetry.io/docs/specs/semconv/non-normative/k8s-attributes/)" guide. Follow these steps: + +1. **Define `service.name` and `service.namespace` consistently:** Prioritize these methods in order of preference: + + * **Kubernetes Pod Annotations:** Use `resource.opentelemetry.io/service.name` and `resource.opentelemetry.io/service.namespace` on your pods. + * **Kubernetes Pod Label & Namespace Name:** Utilize the `app.kubernetes.io/name` pod label and the Kubernetes namespace name. + * **Kubernetes Deployment & Namespace Names:** Infer from your Kubernetes deployment and namespace names. + * For more options, refer to the "[Specify resource attributes using Kubernetes annotations](https://opentelemetry.io/docs/specs/semconv/non-normative/k8s-attributes/)" guide. + +2. 
**Derive `service.instance.id`:** Infer `service.instance.id` from Kubernetes namespace, pod, and container names using the format: `concat([k8s.namespace.name, k8s.pod.name, k8s.container.name], '.')`. See the "OpenTelemetry Operator" recommendation in step 3 below. + +3. **Inject Resource Attributes into Workloads:** Pass these `service.name`, `service.namespace`, and `service.instance.id` resource attributes to your containerized workloads' OpenTelemetry instrumentation. This is typically done by injecting them as environment variables: + + * `OTEL_SERVICE_NAME` + * `OTEL_RESOURCE_ATTRIBUTES` + (Refer to the OpenTelemetry SDK Environment Variables specification for more details.) + + **Recommended Approach:** Use the [OpenTelemetry Operator](https://github.com/open-telemetry/opentelemetry-operator) for automatic injection. Add the `instrumentation.opentelemetry.io/inject-sdk: "true"` pod annotation. The OTel Operator integrates with Grafana Kubernetes Monitoring by automatically configuring the exporter endpoint to `http://grafana-k8s-monitoring-alloy-receiver.default.svc.cluster.local:4318` within its `Instrumentation` CRD. + + **Alternative:** Manually specify the `OTEL_*` environment variables directly in your Kubernetes deployment manifests, including deriving `service.instance.id` from Kubernetes metadata (`concat([k8s.namespace.name, k8s.pod.name, k8s.container.name], '.')`). + +4. **Configure Pod Log Collection:** Ensure your Grafana Kubernetes Monitoring Helm Chart is configured to collect pod logs using the `filelog` gather method ([docs](https://github.com/grafana/k8s-monitoring-helm/tree/main/charts/k8s-monitoring/charts/feature-pod-logs)). + Add the following to your `values.yml`: + + ```yaml + podLogs: + enabled: true + gatherMethod: filelog + ``` + +5. 
**Verify Correlation:** After applying these configurations, verify the successful correlation of your pod logs with application traces, metrics, and other logs in Grafana Application Observability, Grafana Explore, or through Grafana Drilldown features. + From a2739c57d37c9eb9897af8ec6b538fcfaa985d88 Mon Sep 17 00:00:00 2001 From: Cyrille Le Clerc Date: Wed, 9 Jul 2025 23:15:03 +0200 Subject: [PATCH 2/2] markdownlint --- charts/k8s-monitoring/docs/Troubleshooting.md | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/charts/k8s-monitoring/docs/Troubleshooting.md b/charts/k8s-monitoring/docs/Troubleshooting.md index 811af45b21..21f06e54c7 100644 --- a/charts/k8s-monitoring/docs/Troubleshooting.md +++ b/charts/k8s-monitoring/docs/Troubleshooting.md @@ -93,7 +93,6 @@ receivers: Start with 2000 and adjust as needed. - ### Troubleshooting Pod Log and Trace Correlation Issues **Problem:** You're experiencing issues correlating Kubernetes pod logs with OpenTelemetry traces, metrics, and application logs. This often occurs when the `service.name`, `service.namespace`, and `service.instance.id` metadata do not consistently match across all your telemetry signals. @@ -102,17 +101,17 @@ Start with 2000 and adjust as needed. 1. **Define `service.name` and `service.namespace` consistently:** Prioritize these methods in order of preference: - * **Kubernetes Pod Annotations:** Use `resource.opentelemetry.io/service.name` and `resource.opentelemetry.io/service.namespace` on your pods. - * **Kubernetes Pod Label & Namespace Name:** Utilize the `app.kubernetes.io/name` pod label and the Kubernetes namespace name. - * **Kubernetes Deployment & Namespace Names:** Infer from your Kubernetes deployment and namespace names. - * For more options, refer to the "[Specify resource attributes using Kubernetes annotations](https://opentelemetry.io/docs/specs/semconv/non-normative/k8s-attributes/)" guide. 
+ - **Kubernetes Pod Annotations:** Use `resource.opentelemetry.io/service.name` and `resource.opentelemetry.io/service.namespace` on your pods. + - **Kubernetes Pod Label & Namespace Name:** Utilize the `app.kubernetes.io/name` pod label and the Kubernetes namespace name. + - **Kubernetes Deployment & Namespace Names:** Infer from your Kubernetes deployment and namespace names. + - For more options, refer to the "[Specify resource attributes using Kubernetes annotations](https://opentelemetry.io/docs/specs/semconv/non-normative/k8s-attributes/)" guide. 2. **Derive `service.instance.id`:** Infer `service.instance.id` from Kubernetes namespace, pod, and container names using the format: `concat([k8s.namespace.name, k8s.pod.name, k8s.container.name], '.')`. See the "OpenTelemetry Operator" recommendation in step 3 below. 3. **Inject Resource Attributes into Workloads:** Pass these `service.name`, `service.namespace`, and `service.instance.id` resource attributes to your containerized workloads' OpenTelemetry instrumentation. This is typically done by injecting them as environment variables: - * `OTEL_SERVICE_NAME` - * `OTEL_RESOURCE_ATTRIBUTES` + - `OTEL_SERVICE_NAME` + - `OTEL_RESOURCE_ATTRIBUTES` (Refer to the OpenTelemetry SDK Environment Variables specification for more details.) **Recommended Approach:** Use the [OpenTelemetry Operator](https://github.com/open-telemetry/opentelemetry-operator) for automatic injection. Add the `instrumentation.opentelemetry.io/inject-sdk: "true"` pod annotation. The OTel Operator integrates with Grafana Kubernetes Monitoring by automatically configuring the exporter endpoint to `http://grafana-k8s-monitoring-alloy-receiver.default.svc.cluster.local:4318` within its `Instrumentation` CRD. @@ -129,4 +128,3 @@ Start with 2000 and adjust as needed. ``` 5. 
**Verify Correlation:** After applying these configurations, verify the successful correlation of your pod logs with application traces, metrics, and other logs in Grafana Application Observability, Grafana Explore, or through Grafana Drilldown features. -
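The annotation-based and manual approaches described in steps 1–3 of the added section can be sketched in a single manifest. This is an illustrative example only: the workload name `checkout`, the namespace `shop`, and the image are hypothetical placeholders, and in practice you would use either the OTel Operator injection annotation or the manual `OTEL_*` environment variables, not both.

```yaml
# Hypothetical example — names (checkout, shop) and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout
  namespace: shop
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: checkout
  template:
    metadata:
      labels:
        app.kubernetes.io/name: checkout          # fallback source for service.name
      annotations:
        resource.opentelemetry.io/service.name: checkout
        resource.opentelemetry.io/service.namespace: shop
        instrumentation.opentelemetry.io/inject-sdk: "true"  # OTel Operator auto-injection
    spec:
      containers:
        - name: checkout
          image: ghcr.io/example/checkout:1.0.0   # hypothetical image
          env:
            # Manual alternative: derive service.instance.id as
            # <k8s.namespace.name>.<k8s.pod.name>.<k8s.container.name>
            # using the downward API plus dependent env var expansion.
            - name: K8S_NAMESPACE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: K8S_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: OTEL_SERVICE_NAME
              value: checkout
            - name: OTEL_RESOURCE_ATTRIBUTES
              # The container name is not exposed via the downward API,
              # so it is hardcoded here.
              value: service.namespace=shop,service.instance.id=$(K8S_NAMESPACE_NAME).$(K8S_POD_NAME).checkout
```

With this in place, the resource attributes attached by the SDK match the metadata the chart discovers from the pod, which is what makes log/trace correlation line up.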