Add tracing entry span with W3C propagation to EPP handler by sallyom · Pull Request #2057 · kubernetes-sigs/gateway-api-inference-extension

sallyom · 2026-01-05T19:09:24Z

What type of PR is this?

/kind feature

What this PR does / why we need it:

Add tracing entry span with W3C propagation to EPP handler

Which issue(s) this PR fixes:

See #1520

Does this PR introduce a user-facing change?:

The EPP request handler now includes a distributed tracing entry span. When tracing is enabled via the existing --tracing flag, trace spans are created and W3C trace context is propagated to downstream services, enabling end-to-end request tracing. Tracing remains opt-in, and no breaking changes are introduced.
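
For readers unfamiliar with the W3C Trace Context format referenced here, the propagated `traceparent` header has four dash-separated, hex-encoded fields: version, a 16-byte trace ID, an 8-byte parent span ID, and trace flags. The following is a minimal standalone sketch of that layout using only the Go standard library. `newTraceparent` is a hypothetical helper for illustration and is not part of this PR, which relies on the OpenTelemetry propagator to produce the header.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newTraceparent builds a W3C traceparent value of the form
// "00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>".
// Hypothetical helper, for illustration only.
func newTraceparent(sampled bool) (string, error) {
	traceID := make([]byte, 16) // 16 bytes encode to 32 hex characters
	spanID := make([]byte, 8)   // 8 bytes encode to 16 hex characters
	if _, err := rand.Read(traceID); err != nil {
		return "", err
	}
	if _, err := rand.Read(spanID); err != nil {
		return "", err
	}
	flags := "00"
	if sampled {
		flags = "01" // lowest bit set: the caller sampled this trace
	}
	return fmt.Sprintf("00-%s-%s-%s",
		hex.EncodeToString(traceID), hex.EncodeToString(spanID), flags), nil
}

func main() {
	tp, err := newTraceparent(true)
	if err != nil {
		panic(err)
	}
	fmt.Println("traceparent:", tp)
}
```

A downstream service that understands this header can attach its own spans to the same trace ID, which is what connects the per-component spans into one trace.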

netlify · 2026-01-05T19:09:31Z

✅ Deploy Preview for gateway-api-inference-extension ready!

Name	Link
🔨 Latest commit	`deba8b2`
🔍 Latest deploy log	https://app.netlify.com/projects/gateway-api-inference-extension/deploys/695d4f70b8751f00089314f9
😎 Deploy Preview	https://deploy-preview-2057--gateway-api-inference-extension.netlify.app

Signed-off-by: sallyom <somalley@redhat.com>

sallyom · 2026-01-07T13:20:08Z

For example, in llm-d, with the GAIE entry span & propagation, a trace looks like:

and on drilldown you can see the GAIE plugins & vLLM end-to-end trace (with other llm-d components instrumented):

Without this PR (no entry span or propagation, but tracing enabled in GAIE), spans in individual components aren't connected:

shmuelk · 2026-01-07T15:31:46Z

pkg/epp/handlers/request.go

+	// Inject trace context headers for propagation to downstream services
+	traceHeaders := make(map[string]string)
+	propagator := otel.GetTextMapPropagator()
+	propagator.Inject(ctx, propagation.MapCarrier(traceHeaders))
+	for key, value := range traceHeaders {
+		headers = append(headers, &configPb.HeaderValueOption{
+			Header: &configPb.HeaderValue{
+				Key:      key,
+				RawValue: []byte(value),
+			},
+		})
+	}
+


I think this should only be done if the user requested tracing. I think we need to add either a command line argument to enable tracing or to add something in the EPP Configuration.

You shouldn't need to manually propagate context like this. As long as the Go context.Context is passed around correctly, the OTel SDK will handle propagation for you.

thanks, @damemi! I wasn't sure about this, I will remove this and retest to be sure. TY again!

I'll remove the manual propagation, then will verify with llm-d:

Does vllm:llm_request span show up as a child of gateway.request?

Does the trace ID remain consistent end-to-end?

If there's an upstream traceparent, is it continued correctly?
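
The second and third checks above can be approximated at the header level by comparing the `traceparent` received upstream with the one sent downstream. The sketch below is a standalone illustration using only the standard library. `parseTraceparent` and `continuesTrace` are hypothetical helpers and the header values are illustrative. Confirming the parent/child relationship of `vllm:llm_request` would still require inspecting the trace in a tracing backend.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// traceparent holds the fields of a W3C traceparent header value.
type traceparent struct {
	TraceID, SpanID, Flags string
}

// parseTraceparent splits a version-00 traceparent value into its fields.
// It checks field lengths only; a stricter parser would also validate hex digits.
func parseTraceparent(v string) (traceparent, error) {
	parts := strings.Split(v, "-")
	if len(parts) != 4 || parts[0] != "00" ||
		len(parts[1]) != 32 || len(parts[2]) != 16 || len(parts[3]) != 2 {
		return traceparent{}, errors.New("malformed traceparent: " + v)
	}
	return traceparent{TraceID: parts[1], SpanID: parts[2], Flags: parts[3]}, nil
}

// continuesTrace reports whether downstream carries the same trace ID as
// upstream but a different span ID, i.e. a new span within the same trace.
func continuesTrace(upstream, downstream string) (bool, error) {
	up, err := parseTraceparent(upstream)
	if err != nil {
		return false, err
	}
	down, err := parseTraceparent(downstream)
	if err != nil {
		return false, err
	}
	return up.TraceID == down.TraceID && up.SpanID != down.SpanID, nil
}

func main() {
	// Illustrative header values, not captured from a real deployment.
	upstream := "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
	downstream := "00-4bf92f3577b34da6a3ce929d0e0e4736-b7ad6b7169203331-01"
	ok, err := continuesTrace(upstream, downstream)
	fmt.Println(ok, err)
}
```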

The entry point of request handling is: https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/epp/handlers/server.go#L128C49-L128C80

There, the Go context is wrapped in srv, an extProcPb.ExternalProcessor_ProcessServer. Does OTel need the context to be passed explicitly in the function signature?

ref - https://pkg.go.dev/google.golang.org/grpc#ServerStream
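
As background for the question above: `grpc.ServerStream` exposes its context through a `Context()` method, and a span-carrying context derived from it is a new value that has to be passed along explicitly. Below is a standalone sketch with stand-in types. `serverStream`, `fakeStream`, `newFakeStream`, `spanKey`, and `process` are all hypothetical and replace the real gRPC and OTel types. It only shows that deriving a child context does not change what the stream itself returns.

```go
package main

import (
	"context"
	"fmt"
)

// serverStream is a stand-in for the part of grpc.ServerStream used here.
type serverStream interface {
	Context() context.Context
}

// fakeStream is a hypothetical stream that returns a fixed context.
type fakeStream struct{ ctx context.Context }

func (s fakeStream) Context() context.Context { return s.ctx }

func newFakeStream() fakeStream { return fakeStream{ctx: context.Background()} }

// spanKey is a hypothetical context key standing in for OTel's span storage.
type spanKey struct{}

// process mimics a handler: it derives a child context carrying a "span"
// and returns it so that later calls can use it.
func process(srv serverStream) context.Context {
	ctx := srv.Context()
	return context.WithValue(ctx, spanKey{}, "gateway.request")
}

func main() {
	srv := newFakeStream()
	ctx := process(srv)
	fmt.Println(ctx.Value(spanKey{}))           // gateway.request
	fmt.Println(srv.Context().Value(spanKey{})) // <nil>
}
```

Under this model, code that reads the context from the stream again, rather than from the derived `ctx`, would not see the span.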

I did some testing with the context propagation, and it seems that with GAIE's architecture we need to manually propagate the trace headers. As an Envoy External Processor, GAIE doesn't make HTTP requests directly, so without manual propagation the trace context doesn't reach downstream services. My testing confirms this: without the manual propagation we see separate spans for the gateway-api-inference-extension and vllm services, rather than a vllm child span created from the propagated context headers. I'll leave the manual propagation in.
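
The manual propagation step from the diff above can be sketched in isolation as follows. This is a simplified illustration, not the PR's code: `headerValue` and `headerValueOption` are stand-ins for the Envoy `configPb` types, `appendTraceHeaders` is a hypothetical helper, and the carrier map is hard-coded where the PR fills it via `propagator.Inject`.

```go
package main

import (
	"fmt"
	"sort"
)

// headerValue and headerValueOption are simplified stand-ins for
// configPb.HeaderValue and configPb.HeaderValueOption in the diff.
type headerValue struct {
	Key      string
	RawValue []byte
}

type headerValueOption struct {
	Header *headerValue
}

// appendTraceHeaders converts carrier entries (for example "traceparent")
// into entries on the header list built by the handler. Keys are sorted
// only to make the output deterministic; the PR iterates the map directly.
func appendTraceHeaders(headers []*headerValueOption, carrier map[string]string) []*headerValueOption {
	keys := make([]string, 0, len(carrier))
	for k := range carrier {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		headers = append(headers, &headerValueOption{
			Header: &headerValue{Key: k, RawValue: []byte(carrier[k])},
		})
	}
	return headers
}

func main() {
	// Hard-coded, illustrative carrier contents.
	carrier := map[string]string{
		"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
	}
	for _, h := range appendTraceHeaders(nil, carrier) {
		fmt.Printf("%s: %s\n", h.Header.Key, h.Header.RawValue)
	}
}
```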

@sallyom ah, that's interesting. I didn't think about how this works with Envoy, so there could be some work you need to do there. It's not something I've worked with before, but testing tells the truth.

shmuelk · 2026-01-07T15:32:39Z

pkg/epp/handlers/server.go

+
+	// Start tracing span for the request
+	tracer := otel.Tracer("gateway-api-inference-extension")
+	ctx, span := tracer.Start(ctx, "gateway.request", trace.WithSpanKind(trace.SpanKindServer))
+	defer span.End()
+


I think this should only be done if the user requested tracing. I think we need to add either a command line argument to enable tracing or to add something in the EPP Configuration.

These calls are a near-zero-overhead no-op unless a TracerProvider is configured, so the only thing that needs to be gated on the user enabling tracing is the creation of the TracerProvider itself.

For reference, this is the same way that Kubernetes components implement tracing. They explicitly set up a no-op TracerProvider, but having no TracerProvider configured should be effectively the same.

Either way, it's not about feature-gating the tracer.Start() calls; it's about the TracerProvider.

thanks, @damemi! I'll leave it as-is, but I'm still open to other opinions

Currently, trace initialization is only invoked if tracing is enabled:

https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/common/telemetry.go#L46

https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/cmd/epp/runner/runner.go#L184

If InitTracing is not invoked, a default no-op provider will be used (correct me if I'm wrong here). So it should be fine to keep it the way the PR implements it.
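
The pattern discussed in this thread, unconditional instrumentation with only the provider setup gated on the flag, can be modeled with a small standalone sketch. Every type and function below (`span`, `tracerProvider`, `noopProvider`, `recordingProvider`, `initTracing`, `handleRequest`) is a hypothetical stand-in that models the idea; none of it is the OpenTelemetry API or the repository's code.

```go
package main

import "fmt"

// span and tracerProvider are stand-ins for OTel's span and provider types.
type span interface{ End() }

type tracerProvider interface {
	Start(name string) span
}

type noopSpan struct{}

func (noopSpan) End() {}

// noopProvider hands out spans that do nothing.
type noopProvider struct{}

func (noopProvider) Start(string) span { return noopSpan{} }

// recordingProvider remembers span names, standing in for a real exporter.
type recordingProvider struct{ started []string }

func (p *recordingProvider) Start(name string) span {
	p.started = append(p.started, name)
	return noopSpan{}
}

// global defaults to a no-op, so instrumented code never needs a flag check.
var global tracerProvider = noopProvider{}

// initTracing is the only place gated on the user's tracing setting.
func initTracing(enabled bool, p tracerProvider) {
	if enabled {
		global = p
	}
}

// handleRequest is instrumented unconditionally.
func handleRequest() {
	s := global.Start("gateway.request")
	defer s.End()
}

func main() {
	rec := &recordingProvider{}
	handleRequest() // tracing not initialized: nothing is recorded
	initTracing(true, rec)
	handleRequest()
	fmt.Println(rec.started) // [gateway.request]
}
```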

JeffLuoo · 2026-01-13T20:09:28Z

lgtm, can any of the approvers help review it as well? Thanks!

cc: @nirrozenbaum @kfswain

kfswain · 2026-01-13T20:13:37Z

/approve

Excited to have E2E tracing, thanks all! Will leave to reviewers for final stamp.

damemi

I don't know enough about the Envoy handling to say for sure, but it could be worth a TODO to look into the manual context propagation. Otherwise lgtm!

k8s-ci-robot · 2026-01-13T20:41:49Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: damemi, JeffLuoo, kfswain, sallyom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [kfswain]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

kfswain · 2026-01-16T18:14:47Z

/lgtm

…s-sigs#2057) Signed-off-by: sallyom <somalley@redhat.com>

Gregory-Pereira · 2026-02-04T19:44:47Z

/milestone v1.4

Not sure I have permissions though

k8s-ci-robot · 2026-02-04T19:44:49Z

@Gregory-Pereira: You must be a member of the kubernetes-sigs/gateway-api-inference-extension-milestone-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your Inference Gateway Milestone Maintainers and have them propose you as an additional delegate for this responsibility.

Details

In response to this:

/milestone v1.4

Not sure I have permissions though

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jan 5, 2026

k8s-ci-robot requested review from elevran and nirrozenbaum January 5, 2026 19:09

k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 5, 2026

sallyom marked this pull request as draft January 5, 2026 19:18

k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 5, 2026

sallyom force-pushed the tracing-spans branch 2 times, most recently from 3843677 to ee6df62 Compare January 5, 2026 19:37

sallyom marked this pull request as ready for review January 6, 2026 17:44

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 6, 2026

k8s-ci-robot requested review from ahg-g and robscott January 6, 2026 17:44

Add tracing entry span with W3C propagation to EPP handler

deba8b2

Signed-off-by: sallyom <somalley@redhat.com>

sallyom force-pushed the tracing-spans branch from ee6df62 to deba8b2 Compare January 6, 2026 18:07

sallyom mentioned this pull request Jan 7, 2026

Add otel tracing instrumentation llm-d/llm-d-inference-scheduler#506

Merged

shmuelk suggested changes Jan 7, 2026

View reviewed changes

JeffLuoo approved these changes Jan 13, 2026

View reviewed changes

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 13, 2026

damemi approved these changes Jan 13, 2026

View reviewed changes

k8s-ci-robot assigned kfswain Jan 16, 2026

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 16, 2026

k8s-ci-robot merged commit 812efb2 into kubernetes-sigs:main Jan 16, 2026
12 checks passed

RyanRosario pushed a commit to RyanRosario/gateway-api-inference-extension that referenced this pull request Jan 20, 2026

Add tracing entry span with W3C propagation to EPP handler (kubernete…

055a928

…s-sigs#2057) Signed-off-by: sallyom <somalley@redhat.com>

This was referenced Jan 21, 2026

[Roadmap] llm-d v0.5.0 Roadmap llm-d/llm-d#517

Closed

distributed tracing proposal llm-d/llm-d#119

Merged

sallyom added a commit to sallyom/gateway-api-inference-extension that referenced this pull request Jan 25, 2026

Add tracing entry span with W3C propagation to EPP handler (kubernete…

93e6eb3

…s-sigs#2057) Signed-off-by: sallyom <somalley@redhat.com>

sallyom mentioned this pull request Jan 26, 2026

Initial distributed tracing instrumentation llm-d/llm-d-kv-cache#48

Merged

kfswain added this to the v1.4 milestone Feb 5, 2026


7 participants
