update bbr quickstart guide with latest functionality #2150
k8s-ci-robot merged 2 commits into kubernetes-sigs:main
/hold self-verifying before opening for review.
/unhold
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
force-pushed from f56840c to 41cf4ab
cc: @howardjohn please note that, following our conversation on #1908, kgateway was temporarily removed because it doesn't support bbr's new functionality. It would be great to add kgateway back once it supports the new functionality.
howardjohn left a comment:
Please don't remove our implementation. We can support both the native and the ext-proc approach, so there is no need for removal, maybe just tweaks.
I don't see any new functionality that cannot be done natively, IIUC. I believe the idea is that there is now a mapping of adapter to model, instead of a simple body --> header copy.
This can be trivially done by the proxy as well with a minor tweak:
apiVersion: gateway.kgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: bbr
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  traffic:
    phase: PreRouting
    transformation:
      request:
        set:
        - name: X-Gateway-Model-Name
          value: |
            {
              "food-review-1": "meta-llama/Llama-3.1-8B-Instruct",
              "meta-llama/Llama-3.1-8B-Instruct": "meta-llama/Llama-3.1-8B-Instruct",
            }[json(request.body).model]
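For readers less familiar with the expression above, here is a rough Python sketch of the same lookup: parse the request body, read its `model` field, and map it to the value for the `X-Gateway-Model-Name` header. This is only an illustration, not the proxy's or bbr's actual implementation. The names `MODEL_TO_BASE` and `model_header` are hypothetical, and returning no header for an unknown model or an invalid body is an assumption of this sketch rather than documented proxy behavior.

```python
import json

# Hypothetical table mirroring the policy above: requested model -> base model.
MODEL_TO_BASE = {
    "food-review-1": "meta-llama/Llama-3.1-8B-Instruct",
    "meta-llama/Llama-3.1-8B-Instruct": "meta-llama/Llama-3.1-8B-Instruct",
}

HEADER_NAME = "X-Gateway-Model-Name"


def model_header(body: bytes) -> dict:
    """Return the header to set for a request body, or {} if no mapping applies."""
    try:
        parsed = json.loads(body)
    except ValueError:
        # Body is not valid JSON (assumption: skip the header in that case).
        return {}
    if not isinstance(parsed, dict):
        return {}
    model = parsed.get("model")
    if not isinstance(model, str):
        return {}
    base = MODEL_TO_BASE.get(model)
    return {HEADER_NAME: base} if base is not None else {}


if __name__ == "__main__":
    print(model_header(b'{"model": "food-review-1", "prompt": "hi"}'))
```

Both the adapter name and the base model name resolve to the same header value, which is what lets routing pick one pool for either.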
sounds good.
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
@howardjohn updated the AgentgatewayPolicy as you suggested.
/unhold
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: howardjohn, nirrozenbaum
/lgtm
…s#2150)
* update bbr quickstart guide with latest functionality
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
* add back kgateway bbr based on agentgateway policy
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
---------
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
What type of PR is this?
/kind documentation
What this PR does / why we need it:
Updates the quickstart guide for serving multiple inference pools so that it matches the code on main.
Which issue(s) this PR fixes:
This PR completes the work on multi inference pool management through bbr.
Fixes #1812
Does this PR introduce a user-facing change?: