Commit df8ae8d

fix(secrets): close CAS-loop races in filesystem store consume paths
Two HIGH-severity findings on PR #3679. Both sites read a versioned entry, validated a one-shot/use-limit condition, then wrote back with `CasExpectation::Any`. The process-local mutex only serializes writers inside one process; multi-process callers sharing the same backend root could both pass the check and overwrite each other's writes.

- `FilesystemSecretStore::consume`: two consumers could both observe an Active one-shot lease, both decrypt, and both overwrite the consumed marker.
- `FilesystemCredentialBroker::consume_session_use`: two consumers could both pass the max-uses check at `uses=N-1` and overwrite each other's increment, losing a use.

Both now use the canonical retry-on-`FilesystemError::VersionMismatch` pattern from `ironclaw_engine::store::filesystem::update_thread_state` (post-`e2530adff`): re-read, re-evaluate the consume/use-limit condition, then write with `CasExpectation::Version(versioned.version)`. A shared `CAS_RETRY_ATTEMPTS = 3` constant bounds the loop; exhausting it surfaces a transient backend error rather than papering over pathological hot spots.

Also annotated `leases_for_scope` with a `TODO(perf)` covering the N+1 list+get fan-out. The fan-out is bounded today by the owner-prefix path layout and short lease TTLs; replacing it with `Filter::Eq` over `query` requires the secrets store to declare its first index, which is a follow-up.

Regression coverage: two new tests wrap `InMemoryBackend` with a `VersionRacingBackend` that bumps the watched path's version out-of-band on the first versioned `put`, forcing a `VersionMismatch` and exercising the retry loop. They also assert that the retried CAS write actually persisted (the next consume hits `LeaseConsumed`; the next three increments exhaust the max-uses budget).
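The bounded re-read / re-check / versioned-write loop described in the commit message can be sketched as below. This is a self-contained illustration under stated assumptions, not the ironclaw code: `RacingBackend`, `StoreError`, `Versioned`, and this `consume_session_use` are hypothetical stand-ins, and `CasExpectation` only mirrors the name used in the commit (its definition here is assumed). The backend can be armed to bump a path's version once on the next versioned `put`, loosely mimicking the `VersionRacingBackend` test wrapper. A one-shot lease consume is assumed to follow the same shape with a limit of one use.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical mirror of the commit's write expectation.
#[derive(Debug, Clone, Copy)]
pub enum CasExpectation {
    Any,          // unconditional overwrite (the racy behavior being fixed)
    Version(u64), // write only if the stored version still matches
}

#[derive(Debug, PartialEq)]
pub enum StoreError {
    NotFound,
    VersionMismatch,
    MaxUsesExceeded,
    RetriesExhausted, // stands in for the "transient backend error"
}

pub struct Versioned {
    pub value: u32, // use count for a session
    pub version: u64,
}

pub const CAS_RETRY_ATTEMPTS: usize = 3;

#[derive(Default)]
struct State {
    entries: HashMap<String, (u32, u64)>, // path -> (uses, version)
    race_pending: bool,
}

/// Hypothetical in-memory versioned store that can simulate a racing writer.
#[derive(Default)]
pub struct RacingBackend {
    state: Mutex<State>,
}

impl RacingBackend {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn seed(&self, path: &str, uses: u32) {
        self.state.lock().unwrap().entries.insert(path.to_string(), (uses, 1));
    }

    /// Make the next versioned put lose a race against an out-of-band writer.
    pub fn arm_race(&self) {
        self.state.lock().unwrap().race_pending = true;
    }

    pub fn get(&self, path: &str) -> Result<Versioned, StoreError> {
        let st = self.state.lock().unwrap();
        st.entries
            .get(path)
            .map(|&(value, version)| Versioned { value, version })
            .ok_or(StoreError::NotFound)
    }

    pub fn put(&self, path: &str, uses: u32, expect: CasExpectation) -> Result<(), StoreError> {
        let mut st = self.state.lock().unwrap();
        if let CasExpectation::Version(expected) = expect {
            if st.race_pending {
                // Simulated concurrent writer bumps the version first.
                st.race_pending = false;
                if let Some(entry) = st.entries.get_mut(path) {
                    entry.1 += 1;
                }
            }
            let current = st.entries.get(path).map(|e| e.1).ok_or(StoreError::NotFound)?;
            if current != expected {
                return Err(StoreError::VersionMismatch);
            }
        }
        let next_version = st.entries.get(path).map_or(1, |e| e.1 + 1);
        st.entries.insert(path.to_string(), (uses, next_version));
        Ok(())
    }
}

/// Increment a session's use count without losing concurrent increments.
pub fn consume_session_use(b: &RacingBackend, path: &str, max_uses: u32) -> Result<u32, StoreError> {
    for _ in 0..CAS_RETRY_ATTEMPTS {
        // Re-read every attempt so the limit check sees the latest state.
        let current = b.get(path)?;
        if current.value >= max_uses {
            return Err(StoreError::MaxUsesExceeded);
        }
        let next = current.value + 1;
        match b.put(path, next, CasExpectation::Version(current.version)) {
            Ok(()) => return Ok(next),
            Err(StoreError::VersionMismatch) => continue, // lost the race; retry
            Err(other) => return Err(other),
        }
    }
    Err(StoreError::RetriesExhausted)
}
```

With `max_uses = 3` and one armed race, the first call retries once and returns `Ok(1)`; two more calls return `Ok(2)` and `Ok(3)`, and a fourth fails with `MaxUsesExceeded`. Bounding the loop means a persistently contended path surfaces an error instead of spinning.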
1 parent e02cc8c commit df8ae8d

1 file changed

Lines changed: 356 additions & 60 deletions
