Add tracing entry span with W3C propagation to EPP handler#2057
Add tracing entry span with W3C propagation to EPP handler#2057k8s-ci-robot merged 1 commit intokubernetes-sigs:mainfrom
Conversation
✅ Deploy Preview for gateway-api-inference-extension ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
3843677 to
ee6df62
Compare
Signed-off-by: sallyom <somalley@redhat.com>
| // Inject trace context headers for propagation to downstream services | ||
| traceHeaders := make(map[string]string) | ||
| propagator := otel.GetTextMapPropagator() | ||
| propagator.Inject(ctx, propagation.MapCarrier(traceHeaders)) | ||
| for key, value := range traceHeaders { | ||
| headers = append(headers, &configPb.HeaderValueOption{ | ||
| Header: &configPb.HeaderValue{ | ||
| Key: key, | ||
| RawValue: []byte(value), | ||
| }, | ||
| }) | ||
| } | ||
|
|
There was a problem hiding this comment.
I think this should only be done if the user requested tracing. I think we need to add either a command line argument to enable tracing or to add something in the EPP Configuration.
There was a problem hiding this comment.
you shouldn't need to manually propagate context like this, as long as the go context.Context is correctly passed around then the otel sdk will handle propagation for you
There was a problem hiding this comment.
thanks, @damemi! I wasn't sure about this, I will remove this and retest to be sure. TY again!
There was a problem hiding this comment.
I'll remove the manual propagation, then will verify with llm-d:
- Does vllm:llm_request span show up as a child of gateway.request?
- Does the trace ID remain consistent end-to-end?
- If there's an upstream traceparent, is it continued correctly?
There was a problem hiding this comment.
The entry point of request handling is: https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/epp/handlers/server.go#L128C49-L128C80
Where the context in Go is wrapped in the srv extProcPb.ExternalProcessor_ProcessServer. Does OTel need the context to be explicitly defined in function interface?
ref - https://pkg.go.dev/google.golang.org/grpc#ServerStream
There was a problem hiding this comment.
I did some testing with the context propagation - it seems with GAIE's architecture we need to manually propagate the trace headers. With GAIE's architecture as an Envoy External Processor it doesn't make HTTP requests directly. Without manual propagation, trace context doesn't reach downstream services. I have confirmed this with some testing. Without the manual trace propagation we see separate spans for gateway-api-inference-extension and vllm services, not the vllm child span with the propagated context headers. I'll leave the manual propagation in.
There was a problem hiding this comment.
@sallyom ah that's interesting, I didn't think about how this was working with envoy so there could be some work you need to do there. Not something I've worked with before but testing tells the truth
|
|
||
| // Start tracing span for the request | ||
| tracer := otel.Tracer("gateway-api-inference-extension") | ||
| ctx, span := tracer.Start(ctx, "gateway.request", trace.WithSpanKind(trace.SpanKindServer)) | ||
| defer span.End() | ||
|
|
There was a problem hiding this comment.
I think this should only be done if the user requested tracing. I think we need to add either a command line argument to enable tracing or to add something in the EPP Configuration.
There was a problem hiding this comment.
these calls are a zero-overhead no-op unless a TracerProvider is configured. So, all you should need to gate on the user enabling is the creation of the TracerProvider itself.
For reference, this is the same way that Kubernetes components implement tracing. They actually set up a no-op tracerprovider, but having no TracerProvider configured should be effectively the same.
Either way, it's not about feature gating the tracer.Start() calls, it's about the tracerprovider
There was a problem hiding this comment.
thanks, @damemi! I'll leave as/is but still open to other opinions
There was a problem hiding this comment.
Currently the trace initialization is only invoked if the tracing is enabled:
If InitTracing is not invoked, a default noop provider will be used (Correct me if I was wrong here). So it should be fine to keep it the way the PR implements.
|
lgtm, can any of approver help review it as well? Thanks! |
|
/approve Excited to have E2E tracing, thanks all! Will leave to reviewers for final stamp. |
damemi
left a comment
There was a problem hiding this comment.
I don't know enough about the envoy handling to say for sure, but it could be worth a todo to look into the manual context propagation. Otherwise lgtm!
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: damemi, JeffLuoo, kfswain, sallyom The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/lgtm |
…s-sigs#2057) Signed-off-by: sallyom <somalley@redhat.com>
…s-sigs#2057) Signed-off-by: sallyom <somalley@redhat.com>
|
/milestone v1.4 Not sure I have permissions though |
|
@Gregory-Pereira: You must be a member of the kubernetes-sigs/gateway-api-inference-extension-milestone-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your Inference Gateway Milestone Maintainers and have them propose you as an additional delegate for this responsibility. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |



What type of PR is this?
/kind feature
What this PR does / why we need it:
Which issue(s) this PR fixes:
Add tracing entry span with W3C propagation to EPP handler
See #1520
Does this PR introduce a user-facing change?: