Description
Which component is this feature for?
Traceloop SDK
🔖 Feature description
Add built-in support for W3C TraceContext propagation across agent service boundaries, enabling unified distributed traces in multi-agent (A2A) architectures where independent agent services communicate over HTTP.
Current state: traceloop-sdk provides excellent in-process tracing via decorators (@workflow, @task, @agent), but trace context is confined to the local process via Python ContextVar. When Agent A calls Agent B over HTTP, the trace breaks -- each agent produces an isolated trace with its own trace_id.
Proposed additions:
- Configure W3C propagators during `Traceloop.init()`:

```python
from opentelemetry.propagate import set_global_textmap
from opentelemetry.propagators.composite import CompositeTextMapPropagator
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator
from opentelemetry.baggage.propagation import W3CBaggagePropagator

# During Traceloop.init(), after the TracerProvider is set:
set_global_textmap(CompositeTextMapPropagator([
    TraceContextTextMapPropagator(),
    W3CBaggagePropagator(),
]))
```

- Provide helper utilities for inject/extract:
```python
from traceloop.sdk.propagation import inject_trace_context, extract_trace_context

# Calling agent -- inject into outgoing HTTP headers
@task(name="call-remote-agent")
async def call_remote_agent(payload: dict) -> dict:
    headers = inject_trace_context()  # Returns {"traceparent": "00-...", "tracestate": "..."}
    # httpx.post() is synchronous; use AsyncClient in async code
    async with httpx.AsyncClient() as client:
        response = await client.post("http://agent-b:8000/process", headers=headers, json=payload)
    return response.json()

# Receiving agent -- extract from incoming HTTP headers
from traceloop.sdk.propagation.middleware import TraceContextMiddleware

app = FastAPI()
app.add_middleware(TraceContextMiddleware)  # Auto-extracts traceparent from requests
```

- Result -- unified trace across agents:
```
Trace [abc123] -- single trace_id across 2 services
orchestrate (Agent A)
+-- call-remote-agent (Agent A, CLIENT)
    +-- POST /process (Agent B, SERVER) <-- remote parent
        +-- analyze (Agent B)
            +-- llm-call (Agent B)
```
🎤 Why is this feature needed?
The A2A (Agent-to-Agent) protocol is gaining adoption for multi-agent GenAI architectures where specialized agents communicate over HTTP. The A2A spec explicitly recommends W3C TraceContext for distributed tracing across agent boundaries.
The problem today:
- `traceloop-sdk` traces are process-local. When Agent A calls Agent B, two disconnected traces are created.
- Users cannot see end-to-end latency, token usage, or cost across a multi-agent workflow in their observability platform.
- Every team building multi-agent systems has to implement W3C TraceContext propagation themselves on top of `traceloop-sdk`.
Real-world impact:
In production multi-agent deployments exporting to platforms like Langfuse, Instana, Datadog, or Jaeger, users need:
- A single trace tree showing the full orchestration flow across all agents
- Cross-agent latency breakdown (which agent is the bottleneck?)
- Aggregated token usage and cost per end-to-end request
- Service dependency graphs (which agents call which?)
All of this works automatically once trace context propagates via traceparent headers -- the observability backends already support it. The missing piece is that traceloop-sdk doesn't configure the W3C propagators or provide helpers for cross-service context injection/extraction.
Note: This builds on top of the multi-exporter capability discussed in #3478. With multi-export + A2A propagation, users get unified cross-agent traces in multiple observability platforms simultaneously.
✌️ How do you aim to achieve this?
Based on investigation of the traceloop-sdk internals and OpenTelemetry Python SDK:
Step 1: Configure global propagators in Traceloop.init()
After `trace.set_tracer_provider()` is called, add:

```python
set_global_textmap(CompositeTextMapPropagator([
    TraceContextTextMapPropagator(),
    W3CBaggagePropagator(),
]))
```

This is a single-statement addition. Both `TraceContextTextMapPropagator` and `W3CBaggagePropagator` are already bundled with `opentelemetry-api` (a dependency of `traceloop-sdk`). No new dependencies required.
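For reference, the header this propagator reads and writes follows the W3C `traceparent` format (`<version>-<trace-id>-<parent-id>-<trace-flags>`). A stdlib-only sketch of its shape -- `make_traceparent` and `parse_traceparent` are illustrative helpers for this issue, not part of any SDK:

```python
import re
import secrets

def make_traceparent() -> str:
    """Build a W3C traceparent value: version 00, 16-byte trace-id,
    8-byte parent-id, sampled flag 01 -- the same shape that
    TraceContextTextMapPropagator injects."""
    trace_id = secrets.token_hex(16)   # 32 hex chars
    parent_id = secrets.token_hex(8)   # 16 hex chars
    return f"00-{trace_id}-{parent_id}-01"

TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})"
    r"-(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)

def parse_traceparent(value: str) -> dict:
    """Split a traceparent header into its four fields; reject malformed input."""
    m = TRACEPARENT_RE.match(value)
    if m is None:
        raise ValueError(f"malformed traceparent: {value!r}")
    return m.groupdict()

header = make_traceparent()
fields = parse_traceparent(header)
```

A receiving agent that sees the same `trace_id` field in its incoming `traceparent` joins the caller's trace, which is exactly what produces the unified tree shown above.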
Step 2: Provide propagation helper utilities
```python
# traceloop/sdk/propagation/__init__.py
from opentelemetry import propagate

def inject_trace_context(carrier=None):
    """Inject current trace context into HTTP headers."""
    if carrier is None:
        carrier = {}
    propagate.inject(carrier)
    return carrier

def extract_trace_context(carrier):
    """Extract trace context from incoming HTTP headers."""
    return propagate.extract(carrier)
```

Step 3: Provide optional ASGI middleware
```python
# traceloop/sdk/propagation/middleware.py
from opentelemetry import context, propagate, trace

class TraceContextMiddleware:
    """ASGI middleware for automatic W3C TraceContext extraction."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        headers = {
            k.decode(): v.decode()
            for k, v in scope.get("headers", [])
        }
        remote_ctx = propagate.extract(headers)
        token = context.attach(remote_ctx)
        try:
            tracer = trace.get_tracer("traceloop.sdk")
            with tracer.start_as_current_span(
                f"{scope.get('method', '')} {scope.get('path', '')}",
                kind=trace.SpanKind.SERVER,
            ):
                await self.app(scope, receive, send)
        finally:
            context.detach(token)
```

Key design decisions:
- Zero new dependencies (uses OTel APIs already in the dependency tree)
- Fully backward compatible (propagation helpers are opt-in)
- Follows W3C standards (not proprietary headers)
- Works with any OTLP-compatible backend (Langfuse, Instana, Datadog, Jaeger, etc.)
- Propagator configuration in `init()` is transparent -- existing single-service users see no change
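The header-decoding half of the middleware can be exercised without any OTel dependency; a minimal, dependency-free ASGI sketch (`HeaderCaptureMiddleware` and `dummy_app` are hypothetical names for this illustration -- the real middleware would hand the decoded headers to `propagate.extract()`):

```python
import asyncio

class HeaderCaptureMiddleware:
    """Sketch of the extraction step: decode the raw ASGI header list
    and pick out the W3C traceparent header."""

    def __init__(self, app):
        self.app = app
        self.last_traceparent = None  # stored for inspection in this sketch

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http":
            headers = {k.decode(): v.decode() for k, v in scope.get("headers", [])}
            self.last_traceparent = headers.get("traceparent")
        await self.app(scope, receive, send)

async def dummy_app(scope, receive, send):
    # Minimal ASGI app: respond 200 with an empty body
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b""})

async def demo():
    mw = HeaderCaptureMiddleware(dummy_app)
    sent = []

    async def send(message):
        sent.append(message)

    scope = {
        "type": "http",
        "method": "POST",
        "path": "/process",
        "headers": [(b"traceparent",
                     b"00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")],
    }
    await mw(scope, None, send)
    return mw.last_traceparent, sent

traceparent, messages = asyncio.run(demo())
```

Note that ASGI delivers headers as a list of `(bytes, bytes)` pairs, which is why the middleware decodes them before lookup.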
🔄️ Additional Information
Performance impact: Negligible. propagate.inject() adds ~0.01ms and ~200 bytes per outgoing call. propagate.extract() adds ~0.02ms per incoming request. This is insignificant compared to HTTP round-trip latency (1-100ms) and LLM API call latency (200-5000ms).
Async context considerations: OTel context is stored in a Python ContextVar, so it flows correctly through await and asyncio.gather(). Each task created via asyncio.create_task() runs on a snapshot of the context taken at creation time, so context changes made inside a task (e.g., attaching an extracted remote context) do not propagate back to the caller -- a known Python/OTel behavior that should be documented.
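The ContextVar mechanics behind this can be sketched with the stdlib alone; `trace_ctx` here is a stand-in for the ContextVar that OTel uses internally, not a real SDK symbol:

```python
import asyncio
import contextvars

# Stand-in for the ContextVar that OTel uses to hold the active context
trace_ctx = contextvars.ContextVar("trace_ctx", default="no-trace")

async def child():
    # Each task runs on a snapshot of the caller's context, taken at task creation
    inherited = trace_ctx.get()
    trace_ctx.set("set-inside-task")  # isolated to this task's own copy
    return inherited

async def main():
    trace_ctx.set("trace-abc123")
    # Flows through plain await and asyncio.gather()
    via_gather = await asyncio.gather(child(), child())
    task = asyncio.create_task(child())
    via_task = await task
    # The set() inside the tasks did not leak back into this coroutine
    return via_gather, via_task, trace_ctx.get()

via_gather, via_task, after = asyncio.run(main())
```

This is PEP 567 behavior: tasks inherit a copy of the current context, and writes inside a task stay in that copy.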
Alternative approaches considered:
- Custom `X-Traceloop-*` headers: Rejected -- proprietary, not recognized by observability backends
- Requiring users to configure propagators themselves: Current state, but every multi-agent team has to figure this out independently
References:
- W3C Trace Context Specification
- W3C Baggage Specification
- A2A Protocol - Enterprise Features
- OpenTelemetry Context Propagation
- Related: 🚀 Feature: Support for Multiple OTLP Endpoints/Exporters (#3478)
👀 Have you spent some time to check if this feature request has been raised before?
- I checked and didn't find similar issue
Are you willing to submit PR?
Yes I am willing to submit a PR!