fix infinite loop in profile picker and switch predictor based routing to on by default with a header to disable by BenjaminBraunDev · Pull Request #1929 · kubernetes-sigs/gateway-api-inference-extension

BenjaminBraunDev · 2025-12-02T00:11:07Z

Small bugfix for the latency routing profile picker: it wasn't returning an empty profile map when a request was sent with the predictor routing header set to false and all of the profiles had already run.

This PR corrects that logic and also turns predictor based routing on by default (when deploying with the latency based routing flag in the Helm charts). It replaces the old flag with one that turns predictor based routing off when included.
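The failure mode described above can be illustrated with a minimal sketch. It assumes the scheduler calls the profile picker repeatedly until the picker returns an empty result; the type `picker`, the functions `stuckPicker`, `fixedPicker`, and `runCycle`, and the profile names are hypothetical and are not taken from this repository.

```go
package main

import (
	"errors"
	"fmt"
)

// picker returns the profiles to run next, given the profiles that already ran.
// An empty result means scheduling is finished. (Hypothetical type.)
type picker func(done map[string]bool) []string

// stuckPicker models the failure mode: it keeps returning a profile
// even though that profile has already run.
func stuckPicker(done map[string]bool) []string {
	return []string{"default"}
}

// fixedPicker returns only profiles that have not run yet,
// so it eventually returns an empty result.
func fixedPicker(done map[string]bool) []string {
	var next []string
	for _, name := range []string{"prefix", "default"} {
		if !done[name] {
			next = append(next, name)
		}
	}
	return next
}

var errNoProgress = errors.New("picker never returned an empty result")

// runCycle calls pick until it returns nothing. It gives up after maxIter
// rounds so that this demo cannot hang.
func runCycle(pick picker, maxIter int) (int, error) {
	done := map[string]bool{}
	for round := 0; round < maxIter; round++ {
		next := pick(done)
		if len(next) == 0 {
			return round, nil
		}
		for _, name := range next {
			done[name] = true
		}
	}
	return maxIter, errNoProgress
}

func main() {
	rounds, err := runCycle(fixedPicker, 10)
	fmt.Println(rounds, err)
	rounds, err = runCycle(stuckPicker, 10)
	fmt.Println(rounds, err)
}
```

Without the round limit, a picker like `stuckPicker` would keep the loop running forever, which is why returning an empty map once every profile has run matters.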

netlify · 2025-12-02T00:11:13Z

✅ Deploy Preview for gateway-api-inference-extension ready!

Name	Link
🔨 Latest commit	`ad7455a`
🔍 Latest deploy log	https://app.netlify.com/projects/gateway-api-inference-extension/deploys/69335154a2cec000082b81a3
😎 Deploy Preview	https://deploy-preview-1929--gateway-api-inference-extension.netlify.app

shmuelk · 2025-12-03T15:48:47Z

pkg/epp/scheduling/framework/plugins/multi/slo_aware_router/headers.go

 	headerValue, ok := request.Headers[headerName]
 	if !ok {
-		return false, nil // Header not found, return 0 and false
+		return true, nil // Header not found, return true by default


If a Boolean header is not found, why should the default value be true?

Please revert this change.

@kfswain, @ahg-g, @kaushikmitr and I discussed this. Since this is all behind the helm deployment flag to begin with, it makes the most sense for this header to be used specifically to disable predictive routing, which should be on by default.

Updated header to a clear "off" flag and updated documentation.
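As a rough sketch of the "on by default, header turns it off" behavior discussed here: the header name is the one quoted elsewhere in this thread, while the function `predictionEnabled`, the plain `map[string]string` header type, and the choice to fall back to the default on an unparseable value are assumptions made for illustration, not the merged implementation.

```go
package main

import (
	"fmt"
	"strconv"
)

// disableHeader is the header name quoted in this thread.
const disableHeader = "x-prediction-based-scheduling-off"

// predictionEnabled reports whether predictor based routing should run.
// Hypothetical helper: an absent header means enabled, and a header that
// parses as true means disabled.
func predictionEnabled(headers map[string]string) (bool, error) {
	value, ok := headers[disableHeader]
	if !ok {
		return true, nil // header not found: on by default
	}
	off, err := strconv.ParseBool(value)
	if err != nil {
		// Assumption: keep the default and surface the parse error.
		return true, fmt.Errorf("invalid %s value %q: %w", disableHeader, value, err)
	}
	return !off, nil
}

func main() {
	enabled, err := predictionEnabled(map[string]string{})
	fmt.Println(enabled, err)
	enabled, err = predictionEnabled(map[string]string{disableHeader: "true"})
	fmt.Println(enabled, err)
}
```

Naming the header after the action it performs keeps the "header not found" case unambiguous: absence simply means nothing was disabled.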

shmuelk · 2025-12-03T16:07:09Z

In general, the header utility functions should be removed from both the profile modified in this PR and in SLO Aware plugin package. They should be added as functions on the LLMRequest struct.

BenjaminBraunDev · 2025-12-03T19:04:36Z

> In general, the header utility functions should be removed from both the profile modified in this PR and in SLO Aware plugin package. They should be added as functions on the LLMRequest struct.

This was mainly done to keep all SLO routing changes isolated to their associated plugins and to modify as little non-plugin code as possible.

BenjaminBraunDev · 2025-12-05T23:11:47Z

@ahg-g Could you help get this PR approved? It's part of what's necessary for Vertex to do A/B testing with latency prediction.

@kaushikmitr has a PR out for getting approval for the plugin package, but this affects a few files outside the package.

ahg-g · 2025-12-05T23:35:51Z

/lgtm
/approve

k8s-ci-robot · 2025-12-05T23:35:59Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, BenjaminBraunDev

Needs approval from an approver in each of these files:

~~OWNERS~~ [ahg-g]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

shmuelk · 2025-12-07T13:42:01Z

I don't understand why there is a header to disable the SLO based scheduling. Just don't use the plugin in the configuration.

Also, the header name `x-prediction-based-scheduling-off` seems strange. It should have been `x-disable-prediction-based-scheduling`.

ahg-g · 2025-12-08T23:00:55Z

We should remove the header when we graduate this plugin. For the most part it is there to allow A/B testing and to quickly turn the plugin off if things are not working as expected.

shmuelk · 2025-12-09T07:00:30Z

> We should remove the header when we graduate this plugin, for the most part it is to allow A/B testing and quickly turning it off if things are not working as expected.

A better way would be to add a profile handler that picks profiles based on a header. It would solve your A/B testing issue here as well as any other similar tests in a much more general fashion.
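One way to picture the suggestion above is a selector that maps a header value to a set of profiles. This is only a sketch under stated assumptions: the function `selectProfiles`, the header name `x-scheduling-profile`, and the profile names are all hypothetical, and the project's real profile handler interface is not reproduced here.

```go
package main

import "fmt"

// selectProfiles picks profile names based on a request header.
// Hypothetical helper: byValue maps a header value to the profiles to run,
// and fallback is used when the header is missing or its value is unknown.
func selectProfiles(headers map[string]string, headerName string,
	byValue map[string][]string, fallback []string) []string {
	if value, ok := headers[headerName]; ok {
		if names, found := byValue[value]; found {
			return names
		}
	}
	return fallback
}

func main() {
	// Hypothetical header and profile names for an A/B split.
	const header = "x-scheduling-profile"
	byValue := map[string][]string{
		"predictor": {"prefix", "slo"},
		"baseline":  {"default"},
	}
	fallback := []string{"default"}

	fmt.Println(selectProfiles(map[string]string{header: "predictor"}, header, byValue, fallback))
	fmt.Println(selectProfiles(map[string]string{}, header, byValue, fallback))
}
```

With a handler of this shape, an A/B test becomes a matter of which header value the client sends, with no plugin-specific on/off header.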

fix infinite loop in profile picker when using latency routing with predictor based scheduling off

76e70ec

k8s-ci-robot added the `cncf-cla: yes` label (indicates the PR's author has signed the CNCF CLA) Dec 2, 2025

k8s-ci-robot requested review from kfswain and nirrozenbaum December 2, 2025 00:11

BenjaminBraunDev changed the title ~~fix infinite loop in profile picker when using latency routing with p…~~ fix infinite loop in profile picker when using latency routing Dec 2, 2025

k8s-ci-robot added the `size/XS` label (denotes a PR that changes 0-9 lines, ignoring generated files) Dec 2, 2025

BenjaminBraunDev changed the title ~~fix infinite loop in profile picker when using latency routing~~ fix infinite loop in profile picker when using latency routing w/ predictor based routing off Dec 2, 2025

add fix in ProcessResults

bd98e42

k8s-ci-robot added the `size/S` label (denotes a PR that changes 10-29 lines, ignoring generated files) and removed the `size/XS` label Dec 2, 2025

BenjaminBraunDev added 2 commits December 2, 2025 00:56

Fix type for lint

9bdec62

Fix prefix cache not being being ordered properly in profile picker and set predictor scheduling to true instead of flase when no flag is present

a5b98db

k8s-ci-robot added the `size/M` label (denotes a PR that changes 30-99 lines, ignoring generated files) and removed the `size/S` label Dec 2, 2025

BenjaminBraunDev changed the title ~~fix infinite loop in profile picker when using latency routing w/ predictor based routing off~~ [latency-based-routing] fix infinite loop in profile picker when using latency routing w/ predictor based routing off Dec 2, 2025

BenjaminBraunDev changed the title ~~[latency-based-routing] fix infinite loop in profile picker when using latency routing w/ predictor based routing off~~ fix infinite loop in profile picker when using latency routing w/ predictor based routing off Dec 2, 2025

shmuelk suggested changes Dec 3, 2025

Change predictor based scheduling header to one that dissables it, and make it on by default when deploying with latency based routing

31902a6

k8s-ci-robot added the `size/L` label (denotes a PR that changes 100-499 lines, ignoring generated files) and removed the `size/M` label Dec 3, 2025

BenjaminBraunDev changed the title ~~fix infinite loop in profile picker when using latency routing w/ predictor based routing off~~ fix infinite loop in profile picker and switch predictor based routing to on by default with a header to disable Dec 3, 2025

BenjaminBraunDev requested a review from shmuelk December 4, 2025 23:23

BenjaminBraunDev added 2 commits December 5, 2025 21:39

Move slo profile handler into slo routing package

d8443a8

Add slo aware handler

ad7455a

k8s-ci-robot assigned ahg-g Dec 5, 2025

k8s-ci-robot added the `lgtm` label ("Looks good to me", indicates that a PR is ready to be merged) Dec 5, 2025

k8s-ci-robot added the `approved` label (indicates a PR has been approved by an approver from all required OWNERS files) Dec 5, 2025

k8s-ci-robot merged commit 78ffe61 into kubernetes-sigs:main Dec 5, 2025
12 checks passed

k8s-ci-robot merged 7 commits into kubernetes-sigs:main from BenjaminBraunDev:latency-routing-bugfix
