Commit df8ae8d
fix(secrets): close CAS-loop races in filesystem store consume paths
Two HIGH-severity findings on PR #3679. Both sites read a versioned
entry, validated a one-shot/use-limit condition, then wrote back with
`CasExpectation::Any`. The process-local mutex only serializes writers
inside one process; multi-process callers sharing the same backend root
could both pass the check and overwrite each other.
- `FilesystemSecretStore::consume` — two consumers could both observe an
Active one-shot lease, both decrypt, and both overwrite the consumed
marker.
- `FilesystemCredentialBroker::consume_session_use` — two consumers
could both pass the max-uses check at `uses=N-1` and overwrite each
other's increment, losing a use.
Both now use the canonical retry-on-`FilesystemError::VersionMismatch`
pattern from `ironclaw_engine::store::filesystem::update_thread_state`
(post-`e2530adff`): re-read, re-evaluate the consume/use-limit
condition, write with `CasExpectation::Version(versioned.version)`. A
shared `CAS_RETRY_ATTEMPTS = 3` constant bounds the loop; exhausting it
surfaces a transient backend error rather than papering over
pathological hot-spots.
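
A minimal sketch of the retry loop described above, applied to the use-count path. Only `CasExpectation`, `VersionMismatch`, `CAS_RETRY_ATTEMPTS` and the function name `consume_session_use` come from this message. The `RacingStore`, `Session`, `Versioned` and `StoreError` types and every signature here are hypothetical stand-ins, not the crate's actual API. The store can inject an out-of-band version bump on a versioned `put`, loosely mirroring the `VersionRacingBackend` used in the regression tests.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins; the real types and signatures may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CasExpectation {
    Any,
    Version(u64),
}

#[derive(Debug, PartialEq)]
pub enum StoreError {
    NotFound,
    VersionMismatch,
    MaxUsesExceeded,
    Transient,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Session {
    pub uses: u32,
    pub max_uses: u32,
}

#[derive(Debug, Clone, Copy)]
pub struct Versioned {
    pub value: Session,
    pub version: u64,
}

/// In-memory store that can simulate a concurrent writer.
#[derive(Default)]
pub struct RacingStore {
    entries: HashMap<String, Versioned>,
    /// How many upcoming versioned puts should lose a simulated race.
    pub races_to_inject: u32,
}

impl RacingStore {
    pub fn get(&self, path: &str) -> Result<Versioned, StoreError> {
        self.entries.get(path).copied().ok_or(StoreError::NotFound)
    }

    pub fn put(&mut self, path: &str, value: Session, expect: CasExpectation) -> Result<u64, StoreError> {
        // Simulated out-of-band writer: bump the version just before a versioned put.
        if matches!(expect, CasExpectation::Version(_)) && self.races_to_inject > 0 {
            self.races_to_inject -= 1;
            if let Some(entry) = self.entries.get_mut(path) {
                entry.version += 1;
            }
        }
        let current = self.entries.get(path).map(|e| e.version).unwrap_or(0);
        if let CasExpectation::Version(expected) = expect {
            if expected != current {
                return Err(StoreError::VersionMismatch);
            }
        }
        let next = current + 1;
        self.entries.insert(path.to_string(), Versioned { value, version: next });
        Ok(next)
    }
}

pub const CAS_RETRY_ATTEMPTS: usize = 3;

/// Re-read, re-check the use limit, then write against the version just read.
pub fn consume_session_use(store: &mut RacingStore, path: &str) -> Result<u32, StoreError> {
    for _ in 0..CAS_RETRY_ATTEMPTS {
        let versioned = store.get(path)?;
        let session = versioned.value;
        if session.uses >= session.max_uses {
            return Err(StoreError::MaxUsesExceeded);
        }
        let updated = Session { uses: session.uses + 1, ..session };
        match store.put(path, updated, CasExpectation::Version(versioned.version)) {
            Ok(_) => return Ok(updated.uses),
            // Another writer got in first: loop to re-read and re-evaluate.
            Err(StoreError::VersionMismatch) => continue,
            Err(other) => return Err(other),
        }
    }
    // Retries exhausted: report a transient error instead of forcing the write.
    Err(StoreError::Transient)
}
```

The key point is that the limit check runs again on every attempt, so a stale `uses` value is never written back. With `CasExpectation::Any` the same write would always succeed, which is how an increment could be lost.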
Also annotated `leases_for_scope` with a `TODO(perf)` covering the
N+1 list+get fan-out — bounded today by the owner-prefix path layout
and short lease TTLs; replacing it with `Filter::Eq` over `query`
requires the secrets store to declare its first index, which is a
follow-up.
Regression coverage: two new tests wrap `InMemoryBackend` with a
`VersionRacingBackend` that bumps the watched path's version
out-of-band on the first versioned `put`, forcing a `VersionMismatch`
and exercising the retry loop. They also assert that the retried CAS
write actually persisted (the next consume hits LeaseConsumed; the next
three increments exhaust the max-uses budget).
1 parent e02cc8c commit df8ae8d
1 file changed
Lines changed: 356 additions & 60 deletions