
Update InferencePool helm chart to use FailOpen as default#2365

Merged
k8s-ci-robot merged 10 commits into kubernetes-sigs:main from RyanRosario:helm
Feb 25, 2026

Conversation

@RyanRosario
Contributor

@RyanRosario RyanRosario commented Feb 17, 2026

What type of PR is this?
/kind cleanup
/kind documentation

What this PR does / why we need it:

Encourages use of FailOpen as the default failure mode for InferencePool in the Helm chart

Which issue(s) this PR fixes:
Fixes #

Does this PR introduce a user-facing change?:

It sets the default failure mode to FailOpen in the helm chart and strongly encourages use of it as the default.
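Concretely, the chart-level default presumably boils down to a values.yaml entry like the following (key name inferred from the `--set inferenceExtension.failureMode=FailClose` override exercised later in this thread; a sketch, not the committed diff):

```yaml
# Sketch: chart default, overridable via --set inferenceExtension.failureMode=...
inferenceExtension:
  failureMode: FailOpen
```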

@k8s-ci-robot k8s-ci-robot added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Feb 17, 2026
@netlify

netlify Bot commented Feb 17, 2026

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit: d0d5a5c
🔍 Latest deploy log: https://app.netlify.com/projects/gateway-api-inference-extension/deploys/699e05a8724fe50008d39aa9
😎 Deploy Preview: https://deploy-preview-2365--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 17, 2026
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Feb 17, 2026
@k8s-ci-robot
Contributor

Hi @RyanRosario. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Feb 17, 2026
@RyanRosario RyanRosario changed the title from "Helm" to "[WIP] Update InferencePool helm chart to use FailOpen as default" Feb 17, 2026
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 17, 2026
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 17, 2026
@RyanRosario RyanRosario changed the title from "[WIP] Update InferencePool helm chart to use FailOpen as default" to "Update InferencePool helm chart to use FailOpen as default" Feb 17, 2026
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 17, 2026
@RyanRosario
Contributor Author

Tested with:

Test:

helm template test-pool config/charts/inferencepool \
  --set inferencePool.modelServers.matchLabels.app=test

Proof that default is FailOpen:

helm template test-pool config/charts/inferencepool \
  --set inferencePool.modelServers.matchLabels.app=test | grep failureMode -C 3

  endpointPickerRef:
    name: test-pool-epp
    port:
      number: 9002
    failureMode: FailOpen

Proof that users can still override:

helm template test-pool config/charts/inferencepool \
  --set inferencePool.modelServers.matchLabels.app=test \
  --set inferenceExtension.failureMode=FailClose | grep failureMode -C 3


  endpointPickerRef:
    name: test-pool-epp
    port:
      number: 9002
    failureMode: FailClose

(also ran make verify)
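The grep-based spot checks above could be folded into a small pass/fail helper so the proof doesn't rely on eyeballing output (a sketch; `assert_failure_mode` is a made-up name):

```shell
#!/bin/bash
# Fail (non-zero exit) unless stdin contains the expected failureMode line.
assert_failure_mode() {
  local expected=$1
  grep -q "failureMode: ${expected}" \
    || { echo "expected failureMode: ${expected}" >&2; return 1; }
}

# Usage (assumes the chart path from the tests above):
# helm template test-pool config/charts/inferencepool \
#   --set inferencePool.modelServers.matchLabels.app=test \
#   | assert_failure_mode FailOpen
```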

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Feb 19, 2026
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 19, 2026
@RyanRosario
Copy link
Copy Markdown
Contributor Author

@kfswain Ready for review

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 20, 2026
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 20, 2026
@RyanRosario RyanRosario force-pushed the helm branch 2 times, most recently from fb1a49c to 27f152a on February 22, 2026 23:18
@RyanRosario
Copy link
Copy Markdown
Contributor Author

Below is verification that FailOpen now returns 200 whereas FailClose returns 503 when the EPP is broken.

There were a few challenges here (order of operations):

  1. Ordering conditions due to delays in the load balancer provisioning.
  2. helm upgrade kept re-rendering the EPP deployment back to a healthy condition.
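For challenge 1, a small polling helper can wait until the gateway actually reports an address instead of racing the load balancer (a sketch; `wait_for_output` is a made-up name, not part of the PR):

```shell
#!/bin/bash
# Poll a command until it prints non-empty output, or give up after N tries.
wait_for_output() {
  local tries=$1; shift
  local out
  for _ in $(seq "$tries"); do
    if out=$("$@" 2>/dev/null) && [ -n "$out" ]; then
      echo "$out"
      return 0
    fi
    sleep 1
  done
  return 1
}

# Example against the cluster from the setup below:
# IP=$(wait_for_output 120 kubectl get gateway/inference-gateway \
#   -o jsonpath='{.status.addresses[0].value}')
```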

These are the steps I used to verify:

Setup

IGW_LATEST_RELEASE=$(curl -s https://api.github.com/repos/kubernetes-sigs/gateway-api-inference-extension/releases | jq -r '.[] | select(.prerelease == false) | .tag_name' | sort -V | tail -n1)
export IGW_CHART_VERSION=$IGW_LATEST_RELEASE
export GATEWAY_PROVIDER=gke
echo "Using IGW release: $IGW_LATEST_RELEASE"

kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/${IGW_LATEST_RELEASE}/config/manifests/vllm/sim-deployment.yaml

kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/${IGW_LATEST_RELEASE}/manifests.yaml

kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/${IGW_LATEST_RELEASE}/config/manifests/gateway/gke/gateway.yaml

Additional Experimental Setup

./toggle_failure_mode.sh Close

helm template vllm-qwen3-32b \
  config/charts/inferencepool \
  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
  --set provider.name=$GATEWAY_PROVIDER \
  --set experimentalHttpRoute.enabled=true \
  --version $IGW_CHART_VERSION | grep -i fail

helm install vllm-qwen3-32b \
  config/charts/inferencepool \
  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
  --set provider.name=$GATEWAY_PROVIDER \
  --set experimentalHttpRoute.enabled=true \
  --version $IGW_CHART_VERSION

(Apply a HealthCheckPolicy so that failures aren't just due to missing health checks.)
cat <<EOF | kubectl apply -f -
apiVersion: networking.gke.io/v1
kind: HealthCheckPolicy
metadata:
  name: inference-healthcheck
spec:
  default:
    config:
      httpHealthCheck:
        portSpecification: USE_SERVING_PORT
        requestPath: /v1/models
      type: HTTP
  targetRef:
    group: inference.networking.k8s.io
    kind: InferencePool
    name: vllm-qwen3-32b
EOF

kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/${IGW_LATEST_RELEASE}/config/manifests/inferenceobjective.yaml


IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
PORT=80
echo "Gateway: ${IP}:${PORT}"

Good EPP, FailClose (already set previously; the old default)

**Expectation:** 200 OK

./toggle_failure_mode.sh Close

helm template vllm-qwen3-32b \
  config/charts/inferencepool \
  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
  --set provider.name=$GATEWAY_PROVIDER \
  --set experimentalHttpRoute.enabled=true \
  --version $IGW_CHART_VERSION | grep -i fail

kubectl get inferencepool vllm-qwen3-32b -o yaml | grep -i fail

curl -i ${IP}:${PORT}/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "food-review-1",
    "prompt": "Write as if you were a critic: San Francisco",
    "max_tokens": 100,
    "temperature": 0
  }'

Eventually received HTTP/1.1 200 OK.

Good EPP, FailOpen (new default)

**Expectation:** 200 OK

./toggle_failure_mode.sh Open

helm template vllm-qwen3-32b \
  config/charts/inferencepool \
  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
  --set provider.name=$GATEWAY_PROVIDER \
  --set experimentalHttpRoute.enabled=true \
  --version $IGW_CHART_VERSION | grep -i fail

helm upgrade vllm-qwen3-32b \
  config/charts/inferencepool \
  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
  --set provider.name=$GATEWAY_PROVIDER \
  --set experimentalHttpRoute.enabled=true \
  --version $IGW_CHART_VERSION

kubectl get inferencepool vllm-qwen3-32b -o yaml | grep -i fail

curl -i ${IP}:${PORT}/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "food-review-1",
    "prompt": "Write as if you were a critic: San Francisco",
    "max_tokens": 100,
    "temperature": 0
  }'

Received HTTP/1.1 200 OK.

Break the EPP

kubectl set image deploy/vllm-qwen3-32b-epp epp=ghcr.io/does-not-exist/fake-epp:v0.0.0

Bad EPP, FailOpen

Expectation: Something other than 500/503

./toggle_failure_mode.sh Open

curl -i ${IP}:${PORT}/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "food-review-1",
    "prompt": "Write as if you were a critic: San Francisco",
    "max_tokens": 100,
    "temperature": 0
  }'

Received HTTP/1.1 200 OK.

Bad EPP, FailClose (the helm upgrade restores the EPP, so break it again)

Expectation: Failure (500 or 503)

./toggle_failure_mode.sh Close

helm template vllm-qwen3-32b \
  config/charts/inferencepool \
  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
  --set provider.name=$GATEWAY_PROVIDER \
  --set experimentalHttpRoute.enabled=true \
  --version $IGW_CHART_VERSION | grep -i fail

helm upgrade vllm-qwen3-32b \
  config/charts/inferencepool \
  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
  --set provider.name=$GATEWAY_PROVIDER \
  --set experimentalHttpRoute.enabled=true \
  --version $IGW_CHART_VERSION

kubectl get inferencepool vllm-qwen3-32b -o yaml | grep -i fail

kubectl set image deploy/vllm-qwen3-32b-epp epp=ghcr.io/does-not-exist/fake-epp:v0.0.0

curl -i ${IP}:${PORT}/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "food-review-1",
    "prompt": "Write as if you were a critic: San Francisco",
    "max_tokens": 100,
    "temperature": 0
  }'

Received HTTP/1.1 503 Service Unavailable.

Ancillary Test

I was concerned that these tests might not load the defaults I specified; however, the only changes I made were in the Helm chart itself, and Helm picked up the new failureMode each time.

Ancillary Script

Purpose: Quickly switch between FailOpen and FailClose

#!/bin/bash
set -e

# Check argument
if [ -z "$1" ]; then
  echo "Usage: $0 [Open|Close]"
  exit 1
fi

MODE=$(echo "$1" | tr '[:lower:]' '[:upper:]')

if [ "$MODE" == "OPEN" ]; then
    TARGET="FailOpen"
    SOURCE="FailClose"
elif [ "$MODE" == "CLOSE" ]; then
    TARGET="FailClose"
    SOURCE="FailOpen"
else
    echo "Invalid argument. Use 'Open' or 'Close'."
    exit 1
fi

VALUES_FILE="config/charts/inferencepool/values.yaml"
TEMPLATE_FILE="config/charts/inferencepool/templates/inferencepool.yaml"

# Replace SOURCE with TARGET in the files
# This ensures that if we want Open, any FailClose becomes FailOpen.
# If we want Close, any FailOpen becomes FailClose.

echo "Setting global failureMode to $TARGET..."

sed -i "s/$SOURCE/$TARGET/g" "$VALUES_FILE"
sed -i "s/$SOURCE/$TARGET/g" "$TEMPLATE_FILE"

echo "Updated $VALUES_FILE and $TEMPLATE_FILE to use $TARGET"

Additionally, the timeouts block in httproute.yaml needs to be commented out, as it is apparently experimental. I chose not to commit this change since it may break things and was only used for this experiment.

timeouts:
  request: 300s
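For context, this is roughly where that block sits in a rendered HTTPRoute rule (per the Gateway API HTTPRoute schema; the names below are illustrative, not taken from the chart):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: vllm-qwen3-32b        # illustrative
spec:
  rules:
    - backendRefs:
        - group: inference.networking.k8s.io
          kind: InferencePool
          name: vllm-qwen3-32b
      timeouts:               # the block commented out for this experiment
        request: 300s
```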

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 24, 2026
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 24, 2026
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 24, 2026
@RyanRosario RyanRosario requested review from ahg-g and kfswain February 25, 2026 18:39
@ahg-g
Contributor

ahg-g commented Feb 25, 2026

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 25, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, RyanRosario

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 25, 2026
@k8s-ci-robot k8s-ci-robot merged commit bdf7862 into kubernetes-sigs:main Feb 25, 2026
11 checks passed
RyanRosario added a commit to RyanRosario/gateway-api-inference-extension that referenced this pull request Mar 9, 2026
…s-sigs#2365)

* Update InferencePool helm chart default to FailOpen

* Revert InferencePool API default to FailClose, keep Helm default FailOpen

* Update helm chart

* Allow helm chart template to pass FailOpen configuration to it

* Modify failureMode in values.yaml

* Remove v1 inference types and CRDs from PR

* Modify model name in documentation

* Restore deleted line

* Reverting change to documentation as requested

* Revert inferencepool.md documentation.

---------

Co-authored-by: Ryan Rosario <6713180+RyanRosario@users.noreply.github.com>
