Increase default FlowGCTimeout to 1h to prevent premature GC #2143
kfswain merged 1 commit into kubernetes-sigs:main
Conversation
This commit increases the default Flow Garbage Collection timeout from 5 minutes to 1 hour. This serves as a mitigation for a race condition where requests pending in the queue for longer than the GC timeout (e.g., during scale-from-zero) could result in the underlying flow state being deleted while the request was still active.
Hi @LukeAVanDrie. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/cc @aishukamal and @kfswain

@LukeAVanDrie: GitHub didn't allow me to request PR reviews from the following users: aishukamal, and. Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

/lgtm (I don't think I'm allowed to lgtm PRs yet, but the change looks good to me)
@aishukamal: changing LGTM is restricted to collaborators
/ok-to-test

/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ahg-g, LukeAVanDrie. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Approvers can indicate their approval by writing /approve in a comment.
CI/CD seems to be hanging -- /retest

/retest

/cherrypick release-1.3
@kfswain: once the present PR merges, I will cherry-pick it on top of release-1.3 in a new PR and assign it to you.

@kfswain: new pull request created: #2154

What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR increases the default FlowGCTimeout to 1 hour.

Context

Currently, the Flow Registry's garbage collector relies on a leaseCount that tracks the distribution phase but not the queueing phase. If a request sits in the queue waiting for a backend (e.g., waiting for a Pod to spin up) longer than the configured GC timeout (and no new traffic for the flow arrives during this time), the Registry mistakenly identifies the flow as "Idle" and deletes the queue resources, causing the request to be orphaned. This is difficult to trigger under normal load, but it is relevant for scale-from-zero.

The Fix

By increasing the default timeout to 1h, we ensure that the GC timeout is strictly larger than any realistic queueing duration (which will likely hit client timeouts or other limits first). This makes the race condition unreachable in practice without requiring complex architectural changes in the release candidate.

A full architectural fix (switching to optimistic concurrency and lifecycle-aware leasing) is targeted for the next release cycle.
Which issue(s) this PR fixes:
Hack for #1982
Does this PR introduce a user-facing change?: