Fix: Downgrade NTPOT metrics error to verbose info to reduce noise #2435
Conversation
✅ Deploy Preview for gateway-api-inference-extension ready!
Hi @ycjiang50. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here.

Details
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
If this is somewhat expected behavior, I wonder if we should even log this case. I noticed this while running inference-perf too.
```diff
 if outputTokenCount <= 0 {
-	log.FromContext(ctx).Error(nil, "Output token count must be positive for NTPOT calculation",
+	log.FromContext(ctx).V(logutil.VERBOSE).Info("Output token count must be positive for NTPOT calculation",
```
+1 to Luke's comment, let's return early at the beginning of the func if the value is 0 or less.
Thanks. I've updated the PR.
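For illustration, here is a minimal sketch of the early-return shape the reviewers asked for. The function name, signature, NTPOT formula placement, and the `logutil` import path are assumptions for the sketch, not the merged code:

```go
package metrics

import (
	"context"
	"time"

	"sigs.k8s.io/controller-runtime/pkg/log"

	// Assumed import path for the project's verbosity constants.
	logutil "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/util/logging"
)

// recordNTPOT is a hypothetical helper that illustrates the early return;
// it is not the actual implementation in this PR.
func recordNTPOT(ctx context.Context, requestLatency time.Duration, outputTokenCount int) {
	if outputTokenCount <= 0 {
		// Expected when a streaming client omits stream_options.include_usage,
		// so log at verbose info level instead of emitting an error per request.
		log.FromContext(ctx).V(logutil.VERBOSE).Info(
			"Output token count must be positive for NTPOT calculation",
			"outputTokenCount", outputTokenCount)
		return
	}
	// Normalized Time Per Output Token: request latency divided by output tokens.
	ntpot := requestLatency.Seconds() / float64(outputTokenCount)
	_ = ntpot // the real code would record this value in a metric
}
```

With this shape, requests that arrive without usage data simply skip the NTPOT recording, and the message only appears when verbose logging is enabled.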
Force-pushed from d4ec18b to 74bc5f9.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, ycjiang50

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/ok-to-test
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR downgrades the log level of the message "Output token count must be positive for NTPOT calculation" from `Error` to `V(logutil.VERBOSE).Info`.

Currently, when a client performs a streaming request without explicitly setting `stream_options: {"include_usage": true}`, the vLLM backend (and likely others) does not return usage statistics (token counts) in the final chunk. The EPP relies on `Usage.CompletionTokens` to calculate the Normalized Time Per Output Token (NTPOT) metric. When this usage data is missing (0 tokens), the EPP logs an error for every single request, causing significant log noise. For example, when I was running a benchmark with inference-perf, which doesn't specify `stream_options`, the EPP logs were completely overwhelmed by this error, making it hard to see any other useful output.
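As a side note, clients can avoid the missing-usage case by opting into usage reporting on streaming requests. Below is a minimal sketch of such a request against an OpenAI-compatible endpoint; the URL and model name are placeholders, not taken from this PR:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Placeholder endpoint and model name; substitute your own deployment.
	url := "http://localhost:8000/v1/chat/completions"
	body, _ := json.Marshal(map[string]any{
		"model":  "my-model",
		"stream": true,
		// Ask the backend to include usage (token counts) in the final chunk,
		// so NTPOT can be computed from the completion token count.
		"stream_options": map[string]any{"include_usage": true},
		"messages": []map[string]string{
			{"role": "user", "content": "Hello"},
		},
	})
	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```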
Which issue(s) this PR fixes:

Fixes #
Does this PR introduce a user-facing change?: