Add envoy ai gateway #353

pacoxu · 2025-04-18T10:41:54Z

What this PR does / why we need it

Action items:

Support Envoy Gateway and Envoy AI Gateway installation.
On Playground changes, update AIGatewayRoute/envoy-ai-gateway-basic and AIServiceBackend/<playgroundName>.

Which issue(s) this PR fixes

Fixes #339

Special notes for your reviewer

Does this PR introduce a user-facing change?

googs1025 · 2025-04-19T01:18:24Z

pkg/controller/inference/gateway.go

+func IsAIGatewayRouteExist(ctx context.Context, client client.Client) (bool, error) {
+	var route aigv1a1.AIGatewayRoute
+	err := client.Get(ctx, types.NamespacedName{
+		Name:      "envoy-ai-gateway-basic",


Do we really have to hardcode the name and namespace here? 🤔

kerthcet · 2025-04-19T03:51:26Z

In the future, we should combine the gateway configuration with Playground for an easy start. However, Envoy AI Gateway is still alpha right now, so let's have users configure Envoy themselves. What I mean is that we won't have any Envoy configuration in our code base; all we need is documentation on how to use the AI gateway. WDYT?

The reasoning is that Envoy has a lot of configuration options, such as weights and token limits. Hardcoding them doesn't make sense; all of them should be exposed one day through Playground, as I mentioned, but not today.

kerthcet · 2025-04-21T07:12:31Z

Here is my example; however, it doesn't work.

apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: qwen2-0--5b
spec:
  familyName: qwen2
  source:
    modelHub:
      modelID: Qwen/Qwen2-0.5B-Instruct-GGUF
      filename: qwen2-0_5b-instruct-q5_k_m.gguf
---
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: qwen2-0--5b
spec:
  replicas: 1
  modelClaim:
    modelName: qwen2-0--5b
  backendRuntimeConfig:
    backendName: llamacpp
    configName: default
    args:
      - -fa # use flash attention
---
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: qwen2--5-coder
spec:
  familyName: qwen2
  source:
    modelHub:
      modelID: Qwen/Qwen2.5-Coder-0.5B-Instruct-GGUF
      filename: qwen2.5-coder-0.5b-instruct-q2_k.gguf
---
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: qwen2--5-coder
spec:
  replicas: 1
  modelClaim:
    modelName: qwen2--5-coder
  backendRuntimeConfig:
    backendName: llamacpp
    configName: default
    args:
      - -fa # use flash attention
---
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: default-envoy-ai-gateway
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: default-envoy-ai-gateway
  namespace: default
spec:
  gatewayClassName: default-envoy-ai-gateway
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: default-envoy-ai-gateway
  namespace: default
spec:
  schema:
    name: OpenAI
  targetRefs:
    - name: default-envoy-ai-gateway
      kind: Gateway
      group: gateway.networking.k8s.io
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: qwen2-0--5b
      backendRefs:
        - name: qwen2-0--5b
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: qwen2--5-coder
      backendRefs:
        - name: qwen2--5-coder
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: qwen2-0--5b
  namespace: default
spec:
  schema:
    name: OpenAI
  backendRef:
    name: qwen2-0--5b-lb
    kind: Service
    port: 8080
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: qwen2--5-coder
  namespace: default
spec:
  schema:
    name: OpenAI
  backendRef:
    name: qwen2--5-coder-lb
    kind: Service
    port: 8080
---
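As a sketch of how the rules above dispatch requests: assuming, as we understand the Envoy AI Gateway v0.1 behavior, that the gateway copies the OpenAI request's `model` field into the `x-ai-eg-model` header, each rule is an exact header match that selects the AIServiceBackend of the same name, which in turn points at the Playground's `<name>-lb` Service on port 8080. The Go below models only that lookup; the `Rule` and `Backend` types and the `resolve` function are hypothetical and are not part of Envoy AI Gateway.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ModelHeader is the header the AIGatewayRoute rules above match on.
const ModelHeader = "x-ai-eg-model"

// Backend models the AIServiceBackend.spec.backendRef fields used above.
type Backend struct {
	Service string
	Port    int
}

// Rule models one AIGatewayRoute rule: an exact match on ModelHeader
// that forwards to a named AIServiceBackend.
type Rule struct {
	ModelValue  string
	BackendName string
}

// resolve reads "model" from an OpenAI-style request body (the value assumed
// to be copied into ModelHeader), picks the first rule whose exact match
// succeeds, and returns the Service that the matching backend points to.
func resolve(body []byte, rules []Rule, backends map[string]Backend) (Backend, error) {
	var req struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return Backend{}, fmt.Errorf("decode request: %w", err)
	}
	for _, r := range rules {
		if r.ModelValue == req.Model { // type: Exact
			b, ok := backends[r.BackendName]
			if !ok {
				return Backend{}, fmt.Errorf("AIServiceBackend %q not found", r.BackendName)
			}
			return b, nil
		}
	}
	return Backend{}, fmt.Errorf("no rule matches %s=%q", ModelHeader, req.Model)
}

func main() {
	rules := []Rule{
		{ModelValue: "qwen2-0--5b", BackendName: "qwen2-0--5b"},
		{ModelValue: "qwen2--5-coder", BackendName: "qwen2--5-coder"},
	}
	backends := map[string]Backend{
		"qwen2-0--5b":    {Service: "qwen2-0--5b-lb", Port: 8080},
		"qwen2--5-coder": {Service: "qwen2--5-coder-lb", Port: 8080},
	}
	b, err := resolve([]byte(`{"model":"qwen2--5-coder","messages":[]}`), rules, backends)
	fmt.Println(b.Service, b.Port, err) // qwen2--5-coder-lb 8080 <nil>
}
```

A request whose `model` matches no rule gets an error here; in the real gateway it would simply not match any route.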

kerthcet · 2025-04-21T07:21:13Z

It works ... I don't know why.

kerthcet · 2025-04-21T10:02:55Z

Once this is ready, I'll write a post for open-webui describing this as a Kubernetes integration.

kerthcet · 2025-04-21T10:07:39Z

This also makes us another adopter of Envoy AI Gateway.


pacoxu · 2025-04-22T05:15:41Z

Almost ready for review.

I still need to test this manually, since I haven't written an e2e test.

The e2e test may be a TODO item.

pacoxu · 2025-04-22T05:18:56Z

go.mod

+	sigs.k8s.io/json v0.0.0-20241014173422-cfa47c3a1cc8 // indirect
 )
+
+replace github.com/google/cel-go => github.com/google/cel-go v0.22.1


Without this replace directive, the build fails with a compile error, IIRC.

kerthcet · 2025-04-22T06:02:08Z

Makefile

 IMAGE_REPO := $(IMAGE_REGISTRY)/$(IMAGE_NAME)
 GIT_TAG ?= $(shell git describe --tags --dirty --always)
 GOPROXY=${GOPROXY:-""}
+ifeq ($(origin GOPROXY), undefined)


Please submit this in a separate PR; it's unrelated to this change.

kerthcet · 2025-04-22T06:02:59Z

Makefile

 .PHONY: install
 install: manifests kustomize ## Install CRDs into the K8s cluster specified in ~/.kube/config.
-	$(KUSTOMIZE) build config/crd | $(KUBECTL) apply -f -
+	$(KUSTOMIZE) build config/crd | $(KUBECTL) apply --server-side --force-conflicts -f -


Could --force-conflicts cause other problems? We should be careful here; I'd prefer to keep only --server-side.

kerthcet · 2025-04-22T06:05:12Z

chart/templates/gateway/envoy-ai-gateway.yaml

@@ -0,0 +1,107 @@
+{{- if .Values.envoyAIGateway.enabled -}}


I think we shouldn't include these files in the Helm chart; let's make them an example instead, because most of them should be user-defined.

kerthcet · 2025-04-22T06:10:55Z

pkg/controller/inference/gateway.go

+//     name: qwen2-0--5b-lb # model name
+//     kind: Service
+//     port: 8080
+func CreateAIServiceBackend(ctx context.Context, client client.Client, backendRefName, namespace string, port int) error {


I don't think we can orchestrate Envoy AI Gateway yet, because Playground doesn't expose any gateway configuration. For example, LLM serving is usually very slow, so we need to define a timeout in the AIServiceBackend; otherwise, most requests will time out.

So my suggestion is to deploy these configurations manually, and we'll provide an example for users to follow. Once envoy-ai-gateway is mature, we'll add some fields to Playground for quick integration, similar to what we do here. So what we need is:

an example

a Helm dependency, enabled by default; users who want to disable the components can append the disable arguments to the install command.

pacoxu · 2025-04-22T07:20:34Z

Use #360 instead.

This PR tried to create/update the AI gateway resources based on the current Playground. However, we may later add extra attributes to Playground, or a new resource, to do that.

/close

InftyAI-Agent added the needs-triage, needs-priority, and do-not-merge/needs-kind labels Apr 18, 2025

InftyAI-Agent requested a review from kerthcet April 18, 2025 10:42

googs1025 reviewed Apr 19, 2025


pacoxu force-pushed the add-envoy-ai-gateway branch from 2a364a2 to 080aa64 April 21, 2025 06:22

kerthcet mentioned this pull request Apr 21, 2025 in Add open-webui as the default chatbot #357 (Merged)

pacoxu force-pushed the add-envoy-ai-gateway branch from 080aa64 to 75877f1 April 21, 2025 10:57

pacoxu added 5 commits April 21, 2025 19:09

add envoy gateway helm dependency and envoy gateway quickstart & envoy ai gateway basic quick start (bbb79c0)

add install option for ai gateway and envoy gateway (c56150a)

add basic update logic for ai gateway (126e74c)

add github.com/envoyproxy/ai-gateway v0.1.5 (8e60d69)

local fix (775fccb)

pacoxu force-pushed the add-envoy-ai-gateway branch 2 times, most recently from e934511 to 872d02b April 22, 2025 05:14

pacoxu changed the title from "[WIP]Add envoy ai gateway" to "Add envoy ai gateway" Apr 22, 2025

pacoxu commented Apr 22, 2025


pacoxu force-pushed the add-envoy-ai-gateway branch 2 times, most recently from 516efca to 72a42eb April 22, 2025 05:19

pacoxu marked this pull request as draft April 22, 2025 06:05

kerthcet reviewed Apr 22, 2025


fix makefilke (1fcbcff)

pacoxu force-pushed the add-envoy-ai-gateway branch from 72a42eb to 1fcbcff April 22, 2025 06:16

temp commit (6e969f9)

InftyAI-Agent closed this Apr 22, 2025
