
Use the correct vllm metric gpu_cache_usage_perc --> kv_cache_usage_perc #1905

Merged: k8s-ci-robot merged 1 commit into kubernetes-sigs:main from ezrasilvera:vllm-metric on Nov 26, 2025

Conversation

@ezrasilvera
Contributor

What type of PR is this?
/kind bug

What this PR does / why we need it:
The metric in vLLM was changed from gpu_cache_usage_perc to kv_cache_usage_perc.

Which issue(s) this PR fixes:

Fixes #

Does this PR introduce a user-facing change?:


@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Nov 26, 2025
@netlify

netlify bot commented Nov 26, 2025

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit: 1c93035
🔍 Latest deploy log: https://app.netlify.com/projects/gateway-api-inference-extension/deploys/6926f7e2e66e7d0008551f62
😎 Deploy Preview: https://deploy-preview-1905--gateway-api-inference-extension.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.

@linux-foundation-easycla

linux-foundation-easycla bot commented Nov 26, 2025

CLA Signed
The committers listed above are authorized under a signed CLA.

  • ✅ login: ezrasilvera / name: Ezra Silvera (1c93035)

@k8s-ci-robot
Contributor

Welcome @ezrasilvera!

It looks like this is your first PR to kubernetes-sigs/gateway-api-inference-extension 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/gateway-api-inference-extension has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

Hi @ezrasilvera. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Nov 26, 2025
@elevran
Contributor

elevran commented Nov 26, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 26, 2025
@nirrozenbaum
Contributor

in which vllm version was it changed?

@ezrasilvera
Contributor Author

ezrasilvera commented Nov 26, 2025

in which vllm version was it changed?

It seems like a relatively old change. You can see here vllm-project/vllm#18354 and vllm-project/vllm#24245.
BTW, I think that because missing metrics are just set to 0 and there is no failure, in practice it's very hard to know whether a scorer actually worked, since this mainly impacts performance rather than the decoding result.
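The silent-zero behavior described here can be sketched in Go (a hypothetical illustration, not the actual EPP scraper code; the function and metric names are assumptions): resolve a gauge from a Prometheus text-format scrape by trying the new name first and falling back to the legacy one, and return an explicit "found" flag instead of silently defaulting to 0.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// lookupGauge scans a Prometheus text-format scrape for the first metric
// whose name matches one of the candidates, in priority order. It returns
// the value, the name that matched, and whether anything was found at all,
// so callers can log a warning instead of quietly treating a miss as 0.
func lookupGauge(scrape string, names ...string) (float64, string, bool) {
	for _, name := range names {
		for _, line := range strings.Split(scrape, "\n") {
			if strings.HasPrefix(line, "#") { // skip HELP/TYPE comment lines
				continue
			}
			fields := strings.Fields(line)
			if len(fields) == 2 && strings.HasPrefix(fields[0], name) {
				if v, err := strconv.ParseFloat(fields[1], 64); err == nil {
					return v, name, true
				}
			}
		}
	}
	return 0, "", false
}

func main() {
	scrape := "# HELP vllm:kv_cache_usage_perc KV cache usage\nvllm:kv_cache_usage_perc 0.42\n"
	v, name, ok := lookupGauge(scrape,
		"vllm:kv_cache_usage_perc",  // current name
		"vllm:gpu_cache_usage_perc") // legacy fallback
	if !ok {
		fmt.Println("WARNING: no KV cache usage metric found in scrape")
		return
	}
	fmt.Printf("%s=%.2f\n", name, v)
}
```

With a real vLLM rename like the one in this PR, the warning branch is what surfaces the problem in logs rather than letting the scorer run on zeros.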

@nirrozenbaum
Contributor

great catch @ezrasilvera.
what are your thoughts about adding some validation so we can catch this (at least in the logs) if it happens again?

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 26, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ezrasilvera, nirrozenbaum

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 26, 2025
@nirrozenbaum
Contributor

cc @liu-cong @kfswain

@k8s-ci-robot k8s-ci-robot merged commit cfdce59 into kubernetes-sigs:main Nov 26, 2025
12 checks passed
@liu-cong
Contributor

Thank you @ezrasilvera for fixing this! Looks like this was changed in v0.10. And I confirmed that the latest llm-d image has the change.

I filed an issue in vllm to add conformance tests to prevent such breaking changes:
vllm-project/vllm#29508

@liu-cong
Contributor

@nirrozenbaum Any chance we cherrypick this to the release?

@ezrasilvera
Contributor Author

Thank you @ezrasilvera for fixing this! Looks like this was changed in v0.10. And I confirmed that the latest llm-d image has the change.

I filed an issue in vllm to add conformance tests to prevent such breaking changes: vllm-project/vllm#29508

@liu-cong @nirrozenbaum Unfortunately I don't think it's a vllm issue, and I don't think tests on the vllm side can help avoid such issues. They actually had a grace period in which both metrics existed.
The main issue is that we don't have tests that validate the existence of what we consider mandatory metrics. We also don't validate that the scorers actually worked. For metrics, we could for example add an optional validation test that is used only on major releases.

@liu-cong
Contributor

Unfortunately I don't think it's a vllm issue and I don't think tests on the vllm side can help avoiding such issues.

I think there is value in adding tests both on the vllm side and here. A conformance test on the vllm side lets us get notified early about breaking changes so we can plan proactively. It also gives vllm maintainers a feedback channel to better understand downstream dependencies.

In EPP, a conformance test that covers the latest N vllm versions is a great idea. I think we can add a "preflight" check feature as well (we should allow it to be turned off).
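A minimal version of such a preflight check could look like the following sketch (hypothetical code under assumptions; the function name and the mandatory-metric list are made up, not a real EPP feature): scan one /metrics response at startup and report every mandatory metric that is absent, so a rename like gpu_cache_usage_perc -> kv_cache_usage_perc shows up in the logs immediately.

```go
package main

import (
	"fmt"
	"strings"
)

// missingMetrics checks a raw Prometheus text-format scrape against a list
// of metric names the scheduler considers mandatory, and returns the names
// that do not appear, so the caller can log them instead of scoring on zeros.
func missingMetrics(scrape string, required []string) []string {
	var missing []string
	for _, name := range required {
		found := false
		for _, line := range strings.Split(scrape, "\n") {
			// Skip HELP/TYPE comments; match metric lines by name prefix.
			if !strings.HasPrefix(line, "#") && strings.HasPrefix(line, name) {
				found = true
				break
			}
		}
		if !found {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	scrape := "vllm:num_requests_waiting 3\nvllm:kv_cache_usage_perc 0.1\n"
	required := []string{"vllm:kv_cache_usage_perc", "vllm:gpu_cache_usage_perc"}
	if m := missingMetrics(scrape, required); len(m) > 0 {
		fmt.Println("preflight: missing mandatory metrics:", m)
	}
}
```

Guarding the call behind a flag (off by default, or opt-out) would keep the check from blocking deployments against engines that legitimately expose a different metric set.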

@nirrozenbaum
Contributor

@nirrozenbaum Any chance we cherrypick this to the release?

yes, theoretically we can, although this is configurable using a flag in helm install

@ezrasilvera
Contributor Author

Unfortunately I don't think it's a vllm issue and I don't think tests on the vllm side can help avoiding such issues.

I think there are values in both adding tests to vllm side and here. Adding a conformance test on vllm side allows us to get notified early on breaking changes so we can plan proactively. It also enables a feedback channel for vllm maintainers to understand downstream dependencies better.

In EPP, some conformance test that covers latest N vllm versions is a great idea. I think we can add a "preflight" check feature as well (we should allow it to be turned off).

I fully agree with both comments 😀

@liu-cong
Contributor

yes, theoretically we can, although this is configurable using a flag in helm install

@nirrozenbaum True, I am creating a PR in llm-d to configure this correctly even without the patch release, just to be safe.

@ezrasilvera
Contributor Author

@nirrozenbaum Any chance we cherrypick this to the release?

yes, theoretically we can, although this is configurable using a flag in helm install

Indeed. This is exactly what we did to validate that the new name is working.

elevran pushed a commit to elevran/gateway-api-inference-extension that referenced this pull request Nov 27, 2025
Gregory-Pereira added a commit to Gregory-Pereira/kserve that referenced this pull request Apr 4, 2026
- IGW pr kubernetes-sigs/gateway-api-inference-extension#1905 landed fixing metrics name; no longer need to specify
- tokenizer now pulls in vLLM and thus Torch: need writeable paths for torch on tokenization
- tokenizer moves health endpoint from `/health` --> `/healthz`
- image bumps

Signed-off-by: greg pereira <grpereir@redhat.com>
Gregory-Pereira added three further commits to Gregory-Pereira/kserve referencing this pull request (Apr 4 and Apr 7, 2026) with the same message.

Labels

- approved: Indicates a PR has been approved by an approver from all required OWNERS files.
- cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
- kind/bug: Categorizes issue or PR as related to a bug.
- lgtm: "Looks good to me", indicates that a PR is ready to be merged.
- ok-to-test: Indicates a non-member PR verified by an org member that is safe to test.
- size/XS: Denotes a PR that changes 0-9 lines, ignoring generated files.
