
Update InferencePool helm chart to use FailOpen as default#2365

Merged
k8s-ci-robot merged 10 commits into kubernetes-sigs:main from RyanRosario:helm
Feb 25, 2026

Conversation

@RyanRosario
Contributor

@RyanRosario RyanRosario commented Feb 17, 2026

What type of PR is this?
/kind cleanup
/kind documentation

What this PR does / why we need it:

Encourages use of FailOpen as the default failure mode for InferencePool in the Helm chart

Which issue(s) this PR fixes:
Fixes #

Does this PR introduce a user-facing change?:

It sets the default failure mode to FailOpen in the helm chart and strongly encourages use of it as the default.
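Concretely, the chart-level default presumably boils down to a values.yaml entry like the following (key name inferred from the `--set inferenceExtension.failureMode=FailClose` override exercised later in this thread; a sketch, not the committed diff):

```yaml
# Sketch: chart default, overridable via --set inferenceExtension.failureMode=...
inferenceExtension:
  failureMode: FailOpen
```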

@k8s-ci-robot k8s-ci-robot added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Feb 17, 2026
@netlify

netlify Bot commented Feb 17, 2026

Deploy Preview for gateway-api-inference-extension ready!

🔨 Latest commit: d0d5a5c
🔍 Latest deploy log: https://app.netlify.com/projects/gateway-api-inference-extension/deploys/699e05a8724fe50008d39aa9
😎 Deploy Preview: https://deploy-preview-2365--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 17, 2026
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Feb 17, 2026
@k8s-ci-robot
Contributor

Hi @RyanRosario. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Feb 17, 2026
@RyanRosario RyanRosario changed the title from "Helm" to "[WIP] Update InferencePool helm chart to use FailOpen as default" Feb 17, 2026
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 17, 2026
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 17, 2026
@RyanRosario RyanRosario changed the title from "[WIP] Update InferencePool helm chart to use FailOpen as default" to "Update InferencePool helm chart to use FailOpen as default" Feb 17, 2026
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 17, 2026
@RyanRosario
Contributor Author

Tested with:

Test:

helm template test-pool config/charts/inferencepool \
  --set inferencePool.modelServers.matchLabels.app=test

Proof that default is FailOpen:

helm template test-pool config/charts/inferencepool \
  --set inferencePool.modelServers.matchLabels.app=test | grep failureMode -C 3

  endpointPickerRef:
    name: test-pool-epp
    port:
      number: 9002
    failureMode: FailOpen

Proof that users can still override:

helm template test-pool config/charts/inferencepool \
  --set inferencePool.modelServers.matchLabels.app=test \
  --set inferenceExtension.failureMode=FailClose | grep failureMode -C 3


  endpointPickerRef:
    name: test-pool-epp
    port:
      number: 9002
    failureMode: FailClose

(also ran make verify)
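The grep-based spot checks above could be folded into a small pass/fail helper so the proof doesn't rely on eyeballing output (a sketch; `assert_failure_mode` is a made-up name):

```shell
#!/bin/bash
# Fail (non-zero exit) unless stdin contains the expected failureMode line.
assert_failure_mode() {
  local expected=$1
  grep -q "failureMode: ${expected}" \
    || { echo "expected failureMode: ${expected}" >&2; return 1; }
}

# Usage (assumes the chart path from the tests above):
# helm template test-pool config/charts/inferencepool \
#   --set inferencePool.modelServers.matchLabels.app=test \
#   | assert_failure_mode FailOpen
```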

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Feb 19, 2026
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 19, 2026
@RyanRosario
Copy link
Copy Markdown
Contributor Author

@kfswain Ready for review

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 20, 2026
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 20, 2026
@RyanRosario RyanRosario force-pushed the helm branch 2 times, most recently from fb1a49c to 27f152a on February 22, 2026 23:18
@RyanRosario
Copy link
Copy Markdown
Contributor Author

Below is verification that FailOpen now returns 200 whereas FailClose returns 503 when the EPP is broken.

There were a few challenges here (order of operations):

  1. Ordering conditions due to delays in the load balancer provisioning.
  2. helm upgrade kept re-rendering the EPP deployment back to a healthy condition.
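For challenge 1, a small polling helper can wait until the gateway actually reports an address instead of racing the load balancer (a sketch; `wait_for_output` is a made-up name, not part of the PR):

```shell
#!/bin/bash
# Poll a command until it prints non-empty output, or give up after N tries.
wait_for_output() {
  local tries=$1; shift
  local out
  for _ in $(seq "$tries"); do
    if out=$("$@" 2>/dev/null) && [ -n "$out" ]; then
      echo "$out"
      return 0
    fi
    sleep 1
  done
  return 1
}

# Example against the cluster from the setup below:
# IP=$(wait_for_output 120 kubectl get gateway/inference-gateway \
#   -o jsonpath='{.status.addresses[0].value}')
```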

These are the steps I used to verify:

Setup

IGW_LATEST_RELEASE=$(curl -s https://api.github.com/repos/kubernetes-sigs/gateway-api-inference-extension/releases | jq -r '.[] | select(.prerelease == false) | .tag_name' | sort -V | tail -n1)
export IGW_CHART_VERSION=$IGW_LATEST_RELEASE
export GATEWAY_PROVIDER=gke
echo "Using IGW release: $IGW_LATEST_RELEASE"

kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/${IGW_LATEST_RELEASE}/config/manifests/vllm/sim-deployment.yaml

kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/${IGW_LATEST_RELEASE}/manifests.yaml

kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/${IGW_LATEST_RELEASE}/config/manifests/gateway/gke/gateway.yaml

Additional Experimental Setup

./toggle_failure_mode.sh Close

helm template vllm-qwen3-32b \
  config/charts/inferencepool \
  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
  --set provider.name=$GATEWAY_PROVIDER \
  --set experimentalHttpRoute.enabled=true \
  --version $IGW_CHART_VERSION | grep -i fail

helm install vllm-qwen3-32b \
  config/charts/inferencepool \
  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
  --set provider.name=$GATEWAY_PROVIDER \
  --set experimentalHttpRoute.enabled=true \
  --version $IGW_CHART_VERSION

(Apply a HealthCheckPolicy so that failures aren't just due to missing health checks.)
cat <<EOF | kubectl apply -f -
apiVersion: networking.gke.io/v1
kind: HealthCheckPolicy
metadata:
  name: inference-healthcheck
spec:
  default:
    config:
      httpHealthCheck:
        portSpecification: USE_SERVING_PORT
        requestPath: /v1/models
      type: HTTP
  targetRef:
    group: inference.networking.k8s.io
    kind: InferencePool
    name: vllm-qwen3-32b
EOF

kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/${IGW_LATEST_RELEASE}/config/manifests/inferenceobjective.yaml


IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
PORT=80
echo "Gateway: ${IP}:${PORT}"

Good EPP, FailClose (already set previously; the old default)

**Expectation:** 200 OK

./toggle_failure_mode.sh Close

helm template vllm-qwen3-32b \
  config/charts/inferencepool \
  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
  --set provider.name=$GATEWAY_PROVIDER \
  --set experimentalHttpRoute.enabled=true \
  --version $IGW_CHART_VERSION | grep -i fail

kubectl get inferencepool vllm-qwen3-32b -o yaml | grep -i fail

curl -i ${IP}:${PORT}/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "food-review-1",
    "prompt": "Write as if you were a critic: San Francisco",
    "max_tokens": 100,
    "temperature": 0
  }'

Eventually received HTTP/1.1 200 OK.

Good EPP, FailOpen (new default)

**Expectation:** 200 OK

./toggle_failure_mode.sh Open

helm template vllm-qwen3-32b \
  config/charts/inferencepool \
  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
  --set provider.name=$GATEWAY_PROVIDER \
  --set experimentalHttpRoute.enabled=true \
  --version $IGW_CHART_VERSION | grep -i fail

helm upgrade vllm-qwen3-32b \
  config/charts/inferencepool \
  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
  --set provider.name=$GATEWAY_PROVIDER \
  --set experimentalHttpRoute.enabled=true \
  --version $IGW_CHART_VERSION

kubectl get inferencepool vllm-qwen3-32b -o yaml | grep -i fail

curl -i ${IP}:${PORT}/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "food-review-1",
    "prompt": "Write as if you were a critic: San Francisco",
    "max_tokens": 100,
    "temperature": 0
  }'

Received HTTP/1.1 200 OK.

Break the EPP

kubectl set image deploy/vllm-qwen3-32b-epp epp=ghcr.io/does-not-exist/fake-epp:v0.0.0

Bad EPP, FailOpen

Expectation: Something other than 500/503

./toggle_failure_mode.sh Open

curl -i ${IP}:${PORT}/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "food-review-1",
    "prompt": "Write as if you were a critic: San Francisco",
    "max_tokens": 100,
    "temperature": 0
  }'

Received HTTP/1.1 200 OK.

Bad EPP, FailClose (the helm upgrade restores the EPP, so break it again)

Expectation: Failure (500 or 503)

./toggle_failure_mode.sh Close

helm template vllm-qwen3-32b \
  config/charts/inferencepool \
  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
  --set provider.name=$GATEWAY_PROVIDER \
  --set experimentalHttpRoute.enabled=true \
  --version $IGW_CHART_VERSION | grep -i fail

helm upgrade vllm-qwen3-32b \
  config/charts/inferencepool \
  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
  --set provider.name=$GATEWAY_PROVIDER \
  --set experimentalHttpRoute.enabled=true \
  --version $IGW_CHART_VERSION

kubectl get inferencepool vllm-qwen3-32b -o yaml | grep -i fail

kubectl set image deploy/vllm-qwen3-32b-epp epp=ghcr.io/does-not-exist/fake-epp:v0.0.0

curl -i ${IP}:${PORT}/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "food-review-1",
    "prompt": "Write as if you were a critic: San Francisco",
    "max_tokens": 100,
    "temperature": 0
  }'

Received HTTP/1.1 503 Service Unavailable.

Ancillary Test

I was concerned that these tests might not load the defaults I specified; however, the only changes I made were in the Helm chart itself, and Helm picked up the new failureMode each time.

Ancillary Script

Purpose: Quickly switch between FailOpen and FailClose

#!/bin/bash
set -e

# Check argument
if [ -z "$1" ]; then
  echo "Usage: $0 [Open|Close]"
  exit 1
fi

MODE=$(echo "$1" | tr '[:lower:]' '[:upper:]')

if [ "$MODE" == "OPEN" ]; then
    TARGET="FailOpen"
    SOURCE="FailClose"
elif [ "$MODE" == "CLOSE" ]; then
    TARGET="FailClose"
    SOURCE="FailOpen"
else
    echo "Invalid argument. Use 'Open' or 'Close'."
    exit 1
fi

VALUES_FILE="config/charts/inferencepool/values.yaml"
TEMPLATE_FILE="config/charts/inferencepool/templates/inferencepool.yaml"

# Replace SOURCE with TARGET in the files
# This ensures that if we want Open, any FailClose becomes FailOpen.
# If we want Close, any FailOpen becomes FailClose.

echo "Setting global failureMode to $TARGET..."

sed -i "s/$SOURCE/$TARGET/g" "$VALUES_FILE"
sed -i "s/$SOURCE/$TARGET/g" "$TEMPLATE_FILE"

echo "Updated $VALUES_FILE and $TEMPLATE_FILE to use $TARGET"

Additionally, the timeouts block in httproute.yaml needs to be commented out, as it is apparently experimental. I chose not to commit this change since it may break things and was only used for this experiment.

timeouts:
  request: 300s
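For context, this is roughly where that block sits in a rendered HTTPRoute rule (per the Gateway API HTTPRoute schema; the names below are illustrative, not taken from the chart):

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: vllm-qwen3-32b        # illustrative
spec:
  rules:
    - backendRefs:
        - group: inference.networking.k8s.io
          kind: InferencePool
          name: vllm-qwen3-32b
      timeouts:               # the block commented out for this experiment
        request: 300s
```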

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 24, 2026
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 24, 2026
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 24, 2026
@RyanRosario RyanRosario requested review from ahg-g and kfswain February 25, 2026 18:39
@ahg-g
Contributor

ahg-g commented Feb 25, 2026

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 25, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, RyanRosario

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 25, 2026
@k8s-ci-robot k8s-ci-robot merged commit bdf7862 into kubernetes-sigs:main Feb 25, 2026
11 checks passed
RyanRosario added a commit to RyanRosario/gateway-api-inference-extension that referenced this pull request Mar 9, 2026
…s-sigs#2365)

* Update InferencePool helm chart default to FailOpen

* Revert InferencePool API default to FailClose, keep Helm default FailOpen

* Update helm chart

* Allow helm chart template to pass FailOpen configuration to it

* Modify failureMode in values.yaml

* Remove v1 inference types and CRDs from PR

* Modify model name in documentation

* Restore deleted line

* Reverting change to documentation as requested

* Revert inferencepool.md documentation.

---------

Co-authored-by: Ryan Rosario <6713180+RyanRosario@users.noreply.github.com>
