registry: switch to fine-grained leasing for flow lifecycle#2127

Merged
k8s-ci-robot merged 3 commits into kubernetes-sigs:main from LukeAVanDrie:refactor/registry-optimistic-flow-gc
Jan 14, 2026
Conversation

@LukeAVanDrie
Contributor

@LukeAVanDrie LukeAVanDrie commented Jan 12, 2026

What type of PR is this?
/kind bug
/kind cleanup

What this PR does / why we need it:

This PR refactors the FlowRegistry concurrency model, shifting from a rigid, sharded locking hierarchy to a decoupled, fine-grained leasing model for flow lifecycle management.

Context

The previous architecture relied on a brittle lock hierarchy (Registry -> Shard -> Flow). This coupling made it difficult to safely implement Garbage Collection for parent resources (like Priority Bands) without risking deadlocks. Additionally, the previous GC logic relied on a mutex held during the entire distribution phase. This made it impossible to hold a reference during the (potentially long) queueing phase without blocking the GC completely.

Changes

  1. Fine-Grained Concurrency: Replaced the legacy sharded locking approach with optimistic discovery (sync.Map) paired with fine-grained per-flow locking (sync.Mutex). This isolates contention to individual flows, allowing the hot path (WithConnection) to proceed without blocking on shard-level or global locks.
  2. Race-Resilient Lifecycle: Implemented a "Tombstone" pattern (markedForDeletion) and optimistic retry loops. This allows the system to safely detect and recover from race conditions (such as a flow being deleted by GC while a request is attempting to acquire it) without requiring "stop-the-world" pauses.
  3. Safe Long-Lived Leases: Flows are now protected by an explicit lease counter guarded by the per-flow mutex. This removes the blocking limitation, enabling the controller to hold flow references for the entire request duration (including queueing) without stalling the Garbage Collector. This is a prerequisite for fixing #1982 ("Rare Race Condition: Premature Flow GC causes Orphaned Queues and Request Starvation").
  4. Decoupling: Flow GC is now decoupled from Shard locking. The Registry checks logical idleness before attempting physical cleanup, reducing the risk of deadlocks during topology changes.
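
For readers who want a concrete picture of the four changes above, here is a minimal sketch of how optimistic discovery, a per-flow mutex, a lease counter, and a tombstone can fit together. It is not the code from this PR: the `registry` and `flowState` types, the `acquire`, `release`, and `collectIdle` helpers, and the string flow key are all hypothetical, and the real idleness check (which involves timestamps) is reduced to "no active leases".

```go
package main

import (
	"fmt"
	"sync"
)

// flowState is a hypothetical per-flow record. All fields are guarded by mu.
type flowState struct {
	mu                sync.Mutex
	leases            int  // number of callers currently holding the flow
	markedForDeletion bool // tombstone: set by the GC, never cleared
}

// registry discovers flows through a sync.Map, so there is no global lock.
type registry struct {
	flows sync.Map // flow key (string) -> *flowState
}

// acquire pins the flow for key, creating it on first use. If it finds a
// tombstoned object it retries and picks up a fresh one.
func (r *registry) acquire(key string) *flowState {
	for {
		v, _ := r.flows.LoadOrStore(key, &flowState{})
		fs := v.(*flowState)
		fs.mu.Lock()
		if fs.markedForDeletion {
			// Lost the race with the GC. In this sketch the GC removes the
			// map entry while still holding fs.mu, so the next LoadOrStore
			// cannot return this object again.
			fs.mu.Unlock()
			continue
		}
		fs.leases++
		fs.mu.Unlock()
		return fs
	}
}

// release drops one lease.
func (r *registry) release(fs *flowState) {
	fs.mu.Lock()
	fs.leases--
	fs.mu.Unlock()
}

// withConnection holds a lease for as long as fn runs. No lock is held
// while fn executes, so a long-running fn does not block the GC.
func (r *registry) withConnection(key string, fn func() error) error {
	fs := r.acquire(key)
	defer r.release(fs)
	return fn()
}

// collectIdle tombstones and removes every flow without leases and
// reports how many flows it removed.
func (r *registry) collectIdle() int {
	removed := 0
	r.flows.Range(func(key, value any) bool {
		fs := value.(*flowState)
		fs.mu.Lock()
		if fs.leases == 0 && !fs.markedForDeletion {
			fs.markedForDeletion = true
			r.flows.Delete(key)
			removed++
		}
		fs.mu.Unlock()
		return true
	})
	return removed
}

func main() {
	r := &registry{}
	_ = r.withConnection("tenant-a", func() error {
		// The GC runs while the lease is held and must skip the flow.
		fmt.Println("collected while leased:", r.collectIdle())
		return nil
	})
	fmt.Println("collected after release:", r.collectIdle())
}
```

Running the sketch prints `collected while leased: 0` followed by `collected after release: 1`. Contention is limited to the single `flowState` being touched, which is the property the description attributes to the new model.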

Note: This is a surgical fix intended to stabilize the flow control layer and unblock future work (like Band GC). While we are currently evaluating the performance profile of the sharded architecture, this change improves maintainability and correctness in the immediate term without requiring a full structural rewrite.

Which issue(s) this PR fixes:

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Jan 12, 2026
@netlify

netlify bot commented Jan 12, 2026

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 3ec5d7e
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/6966cbbe840cb6000752e10d
😎 Deploy Preview https://deploy-preview-2127--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 12, 2026
@k8s-ci-robot
Contributor

Hi @LukeAVanDrie. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 12, 2026
@LukeAVanDrie
Contributor Author

/cc @evacchi

@k8s-ci-robot
Contributor

@LukeAVanDrie: GitHub didn't allow me to request PR reviews from the following users: evacchi.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

Details

In response to this:

/cc @evacchi


LukeAVanDrie added a commit to LukeAVanDrie/gateway-api-inference-extension that referenced this pull request Jan 12, 2026
This commit refactors the FlowController's request lifecycle management
to hold the Flow Registry lease (`WithConnection`) for the entire
duration of the request, including the queueing phase.

Previously, the lease was only held during the instantaneous
distribution phase. If a flow had requests waiting in a queue (e.g.,
during scale-from-zero) but no new incoming traffic, the registry would
incorrectly identify the flow as Idle and garbage collect it, orphaning
the queued requests.

Changes:
- Hoisted `WithConnection` in `EnqueueAndWait` to wrap the retry loop
  and `awaitFinalization`.
- Updated `ActiveFlowConnection` interface to expose `FlowKey()`,
  preventing data clumps in internal signatures.
- Refactored `selectDistributionCandidates` to use the active connection
  instead of re-acquiring it.
- Added a regression test (`Regression_LeaseHeldDuringQueueing`)
  ensuring the lease remains valid while the processor blocks.

This change relies on the optimistic concurrency model introduced in PR
kubernetes-sigs#2127 to ensure that holding long-lived leases does not block the
Garbage Collector or cause writer starvation.
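
The effect of hoisting `WithConnection` around the queueing phase, as described in the commit message above, can be sketched as follows. This is an illustration only: the `leases` type, its single mutex, the `idle` helper, and the channel-based stand-in for the queue are assumptions made to keep the example short, and the real `EnqueueAndWait` signature is not reproduced here.

```go
package main

import (
	"fmt"
	"sync"
)

// leases is a deliberately simplified, hypothetical lease table. A single
// mutex is used for brevity; it is not the fine-grained model of the PR.
type leases struct {
	mu     sync.Mutex
	active map[string]int
}

func newLeases() *leases { return &leases{active: map[string]int{}} }

// withConnection holds a lease on key for as long as fn runs.
func (l *leases) withConnection(key string, fn func() error) error {
	l.mu.Lock()
	l.active[key]++
	l.mu.Unlock()
	defer func() {
		l.mu.Lock()
		l.active[key]--
		l.mu.Unlock()
	}()
	return fn()
}

// idle reports whether a GC would consider the flow collectable.
func (l *leases) idle(key string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.active[key] == 0
}

// enqueueAndWait wraps the whole wait in the lease, so the flow stays
// pinned while the request sits in the queue.
func enqueueAndWait(l *leases, key string, queued chan<- struct{}, dispatched <-chan struct{}) error {
	return l.withConnection(key, func() error {
		close(queued) // the request is now waiting in the queue
		<-dispatched  // block until the request is dispatched
		return nil
	})
}

func main() {
	l := newLeases()
	queued := make(chan struct{})
	dispatched := make(chan struct{})
	done := make(chan error, 1)

	go func() { done <- enqueueAndWait(l, "tenant-a", queued, dispatched) }()

	<-queued
	fmt.Println("idle while queued:", l.idle("tenant-a"))
	close(dispatched)
	<-done
	fmt.Println("idle after completion:", l.idle("tenant-a"))
}
```

The sketch prints `idle while queued: false` and then `idle after completion: true`. With the lease scoped only to the distribution step, the first check would report the flow as idle, which is the orphaned-queue scenario the commit message describes.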
@ahg-g
Contributor

ahg-g commented Jan 13, 2026

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 13, 2026
@ahg-g
Contributor

ahg-g commented Jan 13, 2026

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 13, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, LukeAVanDrie

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 13, 2026
@LukeAVanDrie
Contributor Author

/hold

Want to address @evacchi's feedback and double-check one more condition.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 13, 2026
Replaces the `flowState.gcLock` mutex with an atomic reference counting
mechanism ("leasing") and optimistic concurrency loop.

This change decouples flow lifecycle management from request processing,
solving two critical issues:
1. Deadlock Potential: Removes the lock hierarchy between Registry,
   Shards, and Flows, eliminating the risk of deadlocks during complex
   GC or scaling operations.
2. Prerequisite for fixing Request Starvation (Issue
   kubernetes-sigs#1982): Allows long-running queued requests to
   maintain a "lease" on a flow without holding a blocking mutex,
   preventing the GC from orphaning active queues.

The `WithConnection` hot path is now lock-free, using a `sync.Map` and
atomic CAS loop to pin flow state objects safely.

Updates:
- `pkg/epp/flowcontrol/registry`: Switch to optimistic model.
- `pkg/epp/flowcontrol/registry`: Update tests to verify atomic behavior.
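
As an illustration of the atomic CAS pinning this commit message refers to, here is a minimal sketch of a reference count that can be retired only when it is zero. The `pin` type, the `retired` sentinel, and the method names are hypothetical and not taken from the PR. A later commit in this PR replaces the independent atomics with a `sync.Mutex`, so this shows the intermediate design only.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// retired is a sentinel reference count meaning "logically deleted".
const retired = -1

// pin is a hypothetical lock-free reference counter for one flow.
type pin struct {
	refs atomic.Int64
}

// tryAcquire increments the count unless the flow has been retired.
// The CAS loop retries when another goroutine changed the count between
// the Load and the CompareAndSwap.
func (p *pin) tryAcquire() bool {
	for {
		n := p.refs.Load()
		if n == retired {
			return false
		}
		if p.refs.CompareAndSwap(n, n+1) {
			return true
		}
	}
}

// release drops one reference. It must only follow a successful tryAcquire.
func (p *pin) release() {
	p.refs.Add(-1)
}

// tryRetire succeeds only when nobody holds a reference.
func (p *pin) tryRetire() bool {
	return p.refs.CompareAndSwap(0, retired)
}

func main() {
	var p pin
	fmt.Println("acquire:", p.tryAcquire())
	fmt.Println("retire while held:", p.tryRetire())
	p.release()
	fmt.Println("retire when idle:", p.tryRetire())
	fmt.Println("acquire after retire:", p.tryAcquire())
}
```

The sketch prints `true`, `false`, `true`, `false` for the four steps. A single counter is easy to keep consistent this way; keeping several independent atomics consistent with each other is harder, which matches the motivation the follow-up commit gives for moving to a mutex.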
Switches flowState internal management from independent atomics to a
`sync.Mutex`. This ensures state transitions (lease updates and
timestamp clearing) are atomic, simplifying the logic and preventing
inconsistent states.

Introduces a `markedForDeletion` flag to fix the stalled GC race, where
a request could acquire a flow that the GC had already decided to
delete.

Hardens cleanupFlowResources by using a Write Lock and re-checking the
map. This prevents the physical deletion of queues for flows that were
resurrected (JIT provisioned) during the cleanup phase.
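
The hardening of `cleanupFlowResources` described above can be sketched as a "re-check under the write lock" step. Everything below is a simplified assumption: the `registry` layout, the `queues` map standing in for physical queue state, and the `provision` helper are hypothetical, and in this sketch provisioning takes the same write lock so that the re-check is consistent.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical layout: flows holds the logical flow set and
// queues stands in for the physical per-flow queue resources.
type registry struct {
	mu     sync.RWMutex // guards queues
	flows  sync.Map     // flow key (string) -> struct{}
	queues map[string][]string
}

func newRegistry() *registry {
	return &registry{queues: map[string][]string{}}
}

// provision registers a flow and creates its queue (JIT provisioning).
func (r *registry) provision(key string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.flows.Store(key, struct{}{})
	if _, ok := r.queues[key]; !ok {
		r.queues[key] = []string{}
	}
}

// cleanupFlowResources deletes the physical queue for key, but only if the
// flow is still absent from the map once the write lock is held. It
// reports whether the queue was deleted.
func (r *registry) cleanupFlowResources(key string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, resurrected := r.flows.Load(key); resurrected {
		return false // the flow came back; leave its queue alone
	}
	delete(r.queues, key)
	return true
}

func main() {
	r := newRegistry()
	r.provision("tenant-a")

	// The GC removes the flow logically, then a request re-provisions it
	// before physical cleanup runs.
	r.flows.Delete("tenant-a")
	r.provision("tenant-a")
	fmt.Println("cleanup ran:", r.cleanupFlowResources("tenant-a"))

	// With no resurrection in between, cleanup proceeds.
	r.flows.Delete("tenant-a")
	fmt.Println("cleanup ran:", r.cleanupFlowResources("tenant-a"))
}
```

The first call prints `cleanup ran: false` because the flow was resurrected; the second prints `cleanup ran: true`.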
@LukeAVanDrie LukeAVanDrie force-pushed the refactor/registry-optimistic-flow-gc branch from 3fa82e3 to 1814907 Compare January 13, 2026 22:34
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 13, 2026
Adds a targeted test case `ShouldBackOff_WhenFlowIsMarkedForDeletion_ButStillInMap`
to verify the robustness of the flow acquisition retry loop.

This ensures that if a request encounters a flow object that is
logically deleted (marked) but physically present in the map, it
correctly backs off and waits for a fresh object rather than
resurrecting the dead one.
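
The scenario this test targets can be reproduced in a small standalone sketch: a tombstoned object is still in the map when a request arrives, and the request must keep backing off until the entry disappears. The types, the `backoffs` counter (added only so the example can observe a retry), and the `scenario` helper are hypothetical; the actual test in the PR is not shown here.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// flowState is a hypothetical per-flow record guarded by mu.
type flowState struct {
	mu                sync.Mutex
	leases            int
	markedForDeletion bool
}

type registry struct {
	flows    sync.Map     // flow key (string) -> *flowState
	backoffs atomic.Int64 // how often acquire saw a tombstone (observability only)
}

// acquire never takes a lease on a tombstoned object. It yields and
// retries until the map hands back a live or fresh object.
func (r *registry) acquire(key string) *flowState {
	for {
		v, _ := r.flows.LoadOrStore(key, &flowState{})
		fs := v.(*flowState)
		fs.mu.Lock()
		if !fs.markedForDeletion {
			fs.leases++
			fs.mu.Unlock()
			return fs
		}
		fs.mu.Unlock()
		r.backoffs.Add(1)
		runtime.Gosched() // back off and let the GC finish its removal
	}
}

// scenario plants a dead-but-present object, waits until the acquirer has
// backed off at least once, then completes the physical removal.
func scenario() (fresh, dead *flowState) {
	r := &registry{}
	dead = &flowState{markedForDeletion: true}
	r.flows.Store("tenant-a", dead)

	got := make(chan *flowState, 1)
	go func() { got <- r.acquire("tenant-a") }()

	for r.backoffs.Load() == 0 {
		runtime.Gosched()
	}
	r.flows.Delete("tenant-a")
	return <-got, dead
}

func main() {
	fresh, dead := scenario()
	fmt.Println("got a fresh object:", fresh != dead)
	fmt.Println("leases on dead object:", dead.leases)
}
```

The sketch prints `got a fresh object: true` and `leases on dead object: 0`, which is the behavior the commit message asks the test to verify.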
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 13, 2026
Contributor

@aishukamal aishukamal left a comment

Thanks for fixing the race! The changes look good to me but just want to point out that with this fix, we are going back to pessimistic concurrency control (use of locks), so I suggest updating the PR title to reflect that.

@LukeAVanDrie LukeAVanDrie changed the title from "registry: switch to optimistic concurrency for flow lifecycle" to "registry: switch to fine-grained leasing for flow lifecycle" Jan 14, 2026
@LukeAVanDrie
Contributor Author

/remove-hold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 14, 2026
@ahg-g
Contributor

ahg-g commented Jan 14, 2026

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 14, 2026
@k8s-ci-robot k8s-ci-robot merged commit ecd30fa into kubernetes-sigs:main Jan 14, 2026
15 checks passed
LukeAVanDrie added a commit to LukeAVanDrie/gateway-api-inference-extension that referenced this pull request Jan 16, 2026
k8s-ci-robot pushed a commit that referenced this pull request Jan 21, 2026
Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

5 participants