Use the correct vLLM metric: gpu_cache_usage_perc --> kv_cache_usage_perc (#1905)
Conversation
Signed-off-by: Ezra Silvera <ezra@il.ibm.com>
Welcome @ezrasilvera!

Hi @ezrasilvera. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here.

Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

/ok-to-test

In which vllm version was it changed?

It seems like a relatively old change. You can see vllm-project/vllm#18354 and vllm-project/vllm#24245.

Great catch @ezrasilvera.

/lgtm

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ezrasilvera, nirrozenbaum. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.

Thank you @ezrasilvera for fixing this! Looks like this was changed in v0.10, and I confirmed that the latest llm-d image has the change. I filed an issue in vllm to add conformance tests to prevent such breaking changes:

@nirrozenbaum Any chance we cherrypick this to the release? |
@liu-cong @nirrozenbaum Unfortunately I don't think it's a vllm issue, and I don't think tests on the vllm side can help avoid such issues. They actually had a grace period in which both metrics existed.
I think there is value in adding tests both on the vllm side and here. A conformance test on the vllm side lets us get notified about breaking changes early so we can plan proactively; it also gives vllm maintainers a feedback channel to better understand downstream dependencies. In EPP, a conformance test that covers the latest N vllm versions is a great idea. I think we can add a "preflight" check feature as well (we should allow it to be turned off).
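The "preflight" check proposed above could look something like this minimal sketch. Note this is a hypothetical illustration, not code from this repo (the EPP itself is Go): at startup, fetch each endpoint's `/metrics` body and verify that the metric names the scheduler depends on are actually exposed. The `REQUIRED_METRICS` set here is an assumption.

```python
# Hypothetical preflight check: parse a Prometheus text-format exposition
# (the body of GET /metrics on a vLLM replica) and report which required
# metric names are missing, so a rename like gpu_cache_usage_perc ->
# kv_cache_usage_perc is caught at startup rather than as silent zeros.

REQUIRED_METRICS = {
    "vllm:kv_cache_usage_perc",   # post-v0.10 name
    "vllm:num_requests_waiting",  # illustrative second dependency
}

def missing_metrics(exposition_text: str, required=frozenset(REQUIRED_METRICS)) -> set:
    """Return the required metric names absent from the exposition text."""
    seen = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at '{' (labeled sample) or whitespace (value).
        seen.add(line.split("{", 1)[0].split(None, 1)[0])
    return set(required) - seen
```

A deployment could run this against every endpoint and fail fast (or log loudly, if the check is configured off) when the set is non-empty.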
Yes, theoretically we can, although this is configurable using a flag in the helm install.
I fully agree with both comments 😀 |
@nirrozenbaum True. I am creating a PR in llm-d to configure this correctly even without the patch release, just to be safe.
Indeed. This is exactly what we did to validate that the new name is working |
…erc (kubernetes-sigs#1905) Signed-off-by: Ezra Silvera <ezra@il.ibm.com>
- IGW PR kubernetes-sigs/gateway-api-inference-extension#1905 landed fixing the metric name; no longer need to specify it
- tokenizer now pulls in vLLM and thus Torch: need writeable paths for torch on tokenization
- tokenizer moves health endpoint from `/health` --> `/healthz`
- image bumps

Signed-off-by: greg pereira <grpereir@redhat.com>
What type of PR is this?
/kind bug
What this PR does / why we need it:
The metric in vLLM was changed from `gpu_cache_usage_perc` to `kv_cache_usage_perc`.
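To illustrate how a consumer can ride out the rename during the deprecation window in which vLLM exposed both names, here is a rough Python sketch (the actual EPP scraper is Go; the function and matching logic are assumptions for illustration): prefer the new name, fall back to the old one.

```python
# Illustrative fallback reader for the renamed metric: given a Prometheus
# text-format /metrics body, try the post-v0.10 name first, then the
# deprecated pre-v0.10 name, and return the sample value as a float.

def kv_cache_usage(exposition_text: str):
    """Return the KV-cache usage fraction, or None if neither name is present."""
    for name in ("vllm:kv_cache_usage_perc", "vllm:gpu_cache_usage_perc"):
        for line in exposition_text.splitlines():
            # Match "name 0.42" or "name{label=...} 0.42"; HELP/TYPE lines
            # start with '#' and so never match.
            if line.startswith(name + " ") or line.startswith(name + "{"):
                return float(line.rsplit(None, 1)[-1])
    return None
```

A reader like this keeps working across the rename without any configuration change, at the cost of one extra scan over the exposition when only the old name exists.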
Which issue(s) this PR fixes:
Fixes #
Does this PR introduce a user-facing change?: