@pacoxu
Contributor

@pacoxu pacoxu commented Apr 18, 2025

What this PR does / why we need it

Action Item:

  1. Support installing Envoy Gateway and Envoy AI Gateway.
  2. When a Playground changes, update the AIGatewayRoute `envoy-ai-gateway-basic` and the AIServiceBackend named after the Playground.

Which issue(s) this PR fixes

Fixes #339

Special notes for your reviewer

Does this PR introduce a user-facing change?


@InftyAI-Agent InftyAI-Agent added the needs-triage, needs-priority, and do-not-merge/needs-kind labels Apr 18, 2025
@InftyAI-Agent InftyAI-Agent requested a review from kerthcet April 18, 2025 10:42
func IsAIGatewayRouteExist(ctx context.Context, client client.Client) (bool, error) {
	var route aigv1a1.AIGatewayRoute
	err := client.Get(ctx, types.NamespacedName{
		Name: "envoy-ai-gateway-basic",
Member

Do we really need to hardcode the name and namespace here? 🤔
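
One way to avoid the hardcoded lookup key is to pass it in and fall back to defaults only when nothing is supplied. The following is a minimal sketch, not code from this PR: `RouteRef` is a hypothetical stand-in for `types.NamespacedName` so the snippet compiles with the standard library alone, `ResolveRouteRef` is an illustrative helper name, and the `default` namespace fallback and the `llmaz-system` namespace are assumptions.

```go
package main

import "fmt"

// RouteRef is a hypothetical stand-in for types.NamespacedName so that this
// sketch builds without controller-runtime.
type RouteRef struct {
	Namespace string
	Name      string
}

// Fallback values. The name matches the one hardcoded in the diff; the
// namespace default is an assumption made for illustration.
const (
	defaultRouteName      = "envoy-ai-gateway-basic"
	defaultRouteNamespace = "default"
)

// ResolveRouteRef returns the route to look up, using the fallbacks for any
// field the caller leaves empty.
func ResolveRouteRef(namespace, name string) RouteRef {
	if namespace == "" {
		namespace = defaultRouteNamespace
	}
	if name == "" {
		name = defaultRouteName
	}
	return RouteRef{Namespace: namespace, Name: name}
}

func main() {
	// No overrides: both fields fall back to the defaults.
	fmt.Println(ResolveRouteRef("", ""))
	// Caller-supplied values (hypothetical) win over the defaults.
	fmt.Println(ResolveRouteRef("llmaz-system", "my-route"))
}
```

With something like this, the existence check could take the resolved reference as an argument instead of embedding the literal name.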

@kerthcet
Member

In the future, we should combine the gateway configuration with Playground for an easy start. However, Envoy AI Gateway is still alpha right now, so let's have users configure Envoy themselves. What I mean is that we won't keep any Envoy configuration in our code base; all we need is documentation on how to use the AI gateway. WDYT?

The reason is that Envoy has a lot of configuration options, such as weights and token limits, and hardcoding them doesn't make sense. All of this should be exposed one day through Playground, as I mentioned, but not today.

@pacoxu pacoxu force-pushed the add-envoy-ai-gateway branch from 2a364a2 to 080aa64 Compare April 21, 2025 06:22
@kerthcet
Member

Here is my example; however, it doesn't work.

apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: qwen2-0--5b
spec:
  familyName: qwen2
  source:
    modelHub:
      modelID: Qwen/Qwen2-0.5B-Instruct-GGUF
      filename: qwen2-0_5b-instruct-q5_k_m.gguf
---
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: qwen2-0--5b
spec:
  replicas: 1
  modelClaim:
    modelName: qwen2-0--5b
  backendRuntimeConfig:
    backendName: llamacpp
    configName: default
    args:
      - -fa # use flash attention
---
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: qwen2--5-coder
spec:
  familyName: qwen2
  source:
    modelHub:
      modelID: Qwen/Qwen2.5-Coder-0.5B-Instruct-GGUF
      filename: qwen2.5-coder-0.5b-instruct-q2_k.gguf
---
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: qwen2--5-coder
spec:
  replicas: 1
  modelClaim:
    modelName: qwen2--5-coder
  backendRuntimeConfig:
    backendName: llamacpp
    configName: default
    args:
      - -fa # use flash attention
---
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: default-envoy-ai-gateway
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: default-envoy-ai-gateway
  namespace: default
spec:
  gatewayClassName: default-envoy-ai-gateway
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: default-envoy-ai-gateway
  namespace: default
spec:
  schema:
    name: OpenAI
  targetRefs:
    - name: default-envoy-ai-gateway
      kind: Gateway
      group: gateway.networking.k8s.io
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: qwen2-0--5b
      backendRefs:
        - name: qwen2-0--5b
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: qwen2--5-coder
      backendRefs:
        - name: qwen2--5-coder
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: qwen2-0--5b
  namespace: default
spec:
  schema:
    name: OpenAI
  backendRef:
    name: qwen2-0--5b-lb
    kind: Service
    port: 8080
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: qwen2--5-coder
  namespace: default
spec:
  schema:
    name: OpenAI
  backendRef:
    name: qwen2--5-coder-lb
    kind: Service
    port: 8080
---
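
The AIGatewayRoute above selects a backend by matching the `x-ai-eg-model` header. As far as I understand, Envoy AI Gateway fills that header from the `model` field of the OpenAI-style request body, so a client only has to set `model` to one of the names used in the rules. Here is a minimal Go sketch of building such a request; the gateway address `http://localhost:8080` and the helper name `NewChatRequest` are assumptions for illustration, and the request is only constructed, not sent.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// chatMessage and chatRequest form a minimal OpenAI-style chat payload.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// NewChatRequest builds (but does not send) a chat completion request.
// gatewayURL is a hypothetical address of the Gateway's HTTP listener.
func NewChatRequest(gatewayURL, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, gatewayURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// The model name must match a value in the AIGatewayRoute rules.
	req, err := NewChatRequest("http://localhost:8080", "qwen2-0--5b", "Hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Passing `qwen2--5-coder` as the model instead should hit the second rule and therefore the second AIServiceBackend.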

@kerthcet
Member

It works ... I don't know why.

@kerthcet
Member

Once this is ready, I'll write a post for open-webui about the Kubernetes integration.

@kerthcet
Member

It would also make us another adopter of Envoy AI Gateway.

@pacoxu pacoxu force-pushed the add-envoy-ai-gateway branch from 080aa64 to 75877f1 Compare April 21, 2025 10:57
@pacoxu pacoxu force-pushed the add-envoy-ai-gateway branch 2 times, most recently from e934511 to 872d02b Compare April 22, 2025 05:14
@pacoxu pacoxu changed the title from "[WIP]Add envoy ai gateway" to "Add envoy ai gateway" Apr 22, 2025
@pacoxu
Contributor Author

pacoxu commented Apr 22, 2025

Almost ready for review:

  • I still need to run a manual test, since I have not written an e2e test.

The e2e test may be a TODO item.

sigs.k8s.io/json v0.0.0-20241014173422-cfa47c3a1cc8 // indirect
)

replace github.com/google/cel-go => github.com/google/cel-go v0.22.1
Contributor Author

Without this, there will be a compile error, IIRC.

@pacoxu pacoxu force-pushed the add-envoy-ai-gateway branch 2 times, most recently from 516efca to 72a42eb Compare April 22, 2025 05:19
@pacoxu pacoxu marked this pull request as draft April 22, 2025 06:05
IMAGE_REPO := $(IMAGE_REGISTRY)/$(IMAGE_NAME)
GIT_TAG ?= $(shell git describe --tags --dirty --always)
GOPROXY=${GOPROXY:-""}
ifeq ($(origin GOPROXY), undefined)
Member

Please submit this in a separate PR; it's unrelated to this change.

.PHONY: install
install: manifests kustomize ## Install CRDs into the K8s cluster specified in ~/.kube/config.
$(KUSTOMIZE) build config/crd | $(KUBECTL) apply -f -
$(KUSTOMIZE) build config/crd | $(KUBECTL) apply --server-side --force-conflicts -f -
Member

Will --force-conflicts cause other problems? We should be careful here; I'd prefer to keep only --server-side.

@@ -0,0 +1,107 @@
{{- if .Values.envoyAIGateway.enabled -}}
Member

I think we shouldn't include these files in the Helm chart; let's make them an example instead, because most of them should be user-defined.

// name: qwen2-0--5b-lb # model name
// kind: Service
// port: 8080
func CreateAIServiceBackend(ctx context.Context, client client.Client, backendRefName, namespace string, port int) error {
Member

@kerthcet kerthcet Apr 22, 2025

I don't think we can orchestrate Envoy AI Gateway yet, because we haven't exposed any configuration from Playground. For example, LLM serving is usually very slow, so we need to define a timeout in the AIServiceBackend; otherwise most requests will time out.

So my suggestion is to deploy these configurations manually, and we'll provide an example for users to follow. Only once envoy-ai-gateway is mature will we add fields to Playground for quick integration, similar to what we do here. So what we need is:

  • an example
  • a Helm dependency that is enabled by default; users who want to disable the component should append the disable arguments to the install command.

@pacoxu pacoxu force-pushed the add-envoy-ai-gateway branch from 72a42eb to 1fcbcff Compare April 22, 2025 06:16
@pacoxu
Contributor Author

pacoxu commented Apr 22, 2025

Use #360 instead.

This PR tried to create or update the AI gateway resources based on the current Playground. However, we may later add extra attributes to Playground, or introduce a new resource, to do that.

/close

Development

Successfully merging this pull request may close these issues.

Envoy Gateway Initial

4 participants