Race condition in Global Concurrency Limits - recommended protection strategy? #20520
Replies: 1 comment
This is a real race condition in the HTTP-based GCL path - your analysis is correct. Until this is fixed upstream, there are two practical workarounds:
Option 1 is simpler if your concurrency boundary aligns with deployments. Option 2 is the better fit if you need cross-deployment tenant isolation.
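For illustration, the kind of atomic check the unprotected path is missing can be sketched as a single conditional UPDATE, where the limit test and the increment happen in one statement instead of a separate read followed by a write. This is a minimal sketch under stated assumptions, not Prefect's implementation: sqlite3 stands in for PostgreSQL so the example is self-contained, and the concurrency_limit table, its columns, and the try_acquire and release helpers are hypothetical names.

```python
import sqlite3

# Hypothetical schema for illustration only; it does not mirror Prefect's tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concurrency_limit ("
    "name TEXT PRIMARY KEY, "
    "slot_limit INTEGER NOT NULL, "
    "active_slots INTEGER NOT NULL)"
)
conn.execute("INSERT INTO concurrency_limit VALUES ('tenant-a', 1, 0)")
conn.commit()


def try_acquire(conn: sqlite3.Connection, name: str) -> bool:
    """Take one slot only if the limit still has room, in a single statement."""
    cur = conn.execute(
        "UPDATE concurrency_limit "
        "SET active_slots = active_slots + 1 "
        "WHERE name = ? AND active_slots < slot_limit",
        (name,),
    )
    conn.commit()
    # Exactly one row changes when the slot was granted, none when the limit is full.
    return cur.rowcount == 1


def release(conn: sqlite3.Connection, name: str) -> None:
    """Give the slot back, never dropping below zero."""
    conn.execute(
        "UPDATE concurrency_limit "
        "SET active_slots = active_slots - 1 "
        "WHERE name = ? AND active_slots > 0",
        (name,),
    )
    conn.commit()


first = try_acquire(conn, "tenant-a")   # slot is free
second = try_acquire(conn, "tenant-a")  # limit already reached
release(conn, "tenant-a")
third = try_acquire(conn, "tenant-a")   # free again after the release
print(first, second, third)  # True False True
```

Because the check lives in the WHERE clause, a caller never acts on a stale count: the statement either changes one row or changes none. In PostgreSQL the UPDATE also takes a row lock, so a concurrent caller waits and then re-evaluates the condition against the committed row.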
Environment
Situation
We have multiple long-running workflow deployments for each tenant that must NOT run concurrently due to:
We use Global Concurrency Limits (GCL) with limit=1 to enforce this:
Problem
We observed two flows simultaneously acquire slots from the same GCL despite limit=1.
Timeline from PostgreSQL flow_run table:
When Flow A released its slot at 18:35:56, both Flow B and Flow C (which had been waiting) captured the slot simultaneously, violating limit=1.
Database evidence:
Analysis
Looking at Prefect source code:
1. Orchestration rules (deployment/task concurrency) - PROTECTED:
   - Uses bulk_increment_active_slots
   - Tests such as test_concurrent_reacquisition_only_one_succeeds validate protection
2. HTTP API (/v2/concurrency_limits/increment-with-lease) - NOT PROTECTED:
   - Multiple requests can read active_slots=0 before any commits
   - No SELECT FOR UPDATE or similar locking
Questions
Is this expected behavior? Should Global Concurrency Limits be considered "best-effort" rather than strict guarantees?
What's the recommended approach for strict enforcement across multiple deployments for the same resource/tenant?
Should we use deployment-level concurrency instead? Our use case: multiple deployment types per tenant, need coordination between them.
Is atomic protection planned for the HTTP API-based GCL, or is there an architectural reason not to add it?
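To make the suspected interleaving concrete, here is a minimal sketch of a check-then-act sequence with no locking, replaying the timeline above. It is a simplified in-memory model, not Prefect's actual code: the Limit class and the read_slots and try_write helpers are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Limit:
    """Hypothetical in-memory stand-in for one GCL row."""
    limit: int = 1
    active_slots: int = 0


def read_slots(gcl: Limit) -> int:
    # Step 1 of check-then-act: read the current slot count.
    return gcl.active_slots


def try_write(gcl: Limit, seen: int) -> bool:
    # Step 2: decide using the value read earlier, which may be stale by now.
    if seen < gcl.limit:
        gcl.active_slots += 1
        return True
    return False


gcl = Limit(limit=1, active_slots=1)  # Flow A holds the only slot
gcl.active_slots -= 1                 # Flow A releases its slot

# Flow B and Flow C both read before either one writes.
seen_b = read_slots(gcl)
seen_c = read_slots(gcl)

b_acquired = try_write(gcl, seen_b)
c_acquired = try_write(gcl, seen_c)

print(b_acquired, c_acquired, gcl.active_slots)  # True True 2
```

Both waiters see active_slots=0, both pass the check, and the counter ends at 2 with limit=1, which matches what we observed.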