Skip to content

feat(reborn): apply universal FS dispatch across consumer crates#3679

Open
ilblackdragon wants to merge 95 commits into
reborn-integrationfrom
reborn/fs-rework-unified
Open

feat(reborn): apply universal FS dispatch across consumer crates#3679
ilblackdragon wants to merge 95 commits into
reborn-integrationfrom
reborn/fs-rework-unified

Conversation

@ilblackdragon
Copy link
Copy Markdown
Member

@ilblackdragon ilblackdragon commented May 15, 2026

Single PR stacking ALL the consumer-crate work that applies #3659 (universal RootFilesystem dispatch fabric) across the codebase. Supersedes the prior stacked PRs (#3666, #3670, #3671, #3672, #3678 — all closed).

What's in here

13 commits, +15,214 / -929 LOC across 61 files. Every commit is independently reviewable.

# Commit Scope
1 feat(processes) FilesystemProcessStore on the unified put/get surface
2 feat(outbound) FilesystemOutboundStateStore on the unified surface
3 feat(authorization) FilesystemCapabilityLeaseStore through unified put/get
4 feat(run-state) FilesystemRunStateStore + FilesystemApprovalRequestStore
5 refactor(workspace) Dissolve ironclaw_storage (660 LOC, parallel BlobStore/RecordStore traits removed)
6 feat(filesystem) HsmBackend placeholder + scope database.md to legacy
7 feat(reborn) Native append/tail on libsql + postgres (V30 migration); FilesystemDurableEventLog
8 feat(secrets) FilesystemSecretStore + FilesystemCredentialBroker
9 feat(filesystem,memory) IndexKind::Fts + IndexKind::Vector on libsql + postgres; FilesystemMemoryDocumentRepository
10 feat(db) conversations+jobs FilesystemConversationStore + FilesystemJobStore (~1700 LOC + 23 tests)
11 feat(db) sandbox+routine+toolfail FilesystemSandboxStore + FilesystemRoutineStore + FilesystemToolFailureStore (~2000 LOC + 20 tests)
12 feat(engine) FilesystemStore — full 33-method engine Store impl on the unified surface (~700 LOC + 24 tests; HybridStore untouched)
13 feat(db) settings+workspace+rest FilesystemSettingsStore + FilesystemUsersStore + FilesystemChannelPairingStore + FilesystemIdentityStore + FilesystemWorkspaceStore (split across 5 sub-modules; ~4400 LOC + 47 tests)

Coverage on the universal-FS surface

  • Bytes plane: ✅ every consumer routes through put(Entry::bytes(...))/get
  • Records plane: ✅ every consumer writes Entry::record(...) with indexed projection
  • Query plane: ✅ all consumers use Filter::Eq/PrefixOn/And/Or/Fts/VectorNearest
  • Index plane: ✅ ensure_index declares Exact/Prefix at startup; FTS5 + tsvector + cosine vector advertised by libsql + postgres
  • Event plane: ✅ libsql + postgres natively serve append/tail; FilesystemDurableEventLog + sandbox job-events use it
  • CAS: ✅ every record store uses CasExpectation::Version with retry on VersionMismatch
  • Capability declaration + mount-time validation: ✅ BackendCapabilities bitmask; descriptor over-claims surface FilesystemError::DescriptorOverclaims

What's still pending (intentional follow-ups)

  • Composite Database trait dissolution: each of the 10 sub-traits now has a Filesystem*Store facade in src/db/filesystem_*.rs, but the composite trait itself in src/db/mod.rs is unchanged — both legacy SQL impls and new FS-backed impls coexist. Final removal of the composite trait is intentionally deferred behind the wiring switch (out of scope here).
  • Host wiring switch: src/app.rs still constructs HybridStore against the legacy Database impl. Switching to CompositeRootFilesystem-backed wiring is a single-PR follow-up.
  • EncryptedBackend decorator: encryption stays embedded in SecretsCrypto for the migration window; documented TODO to move to a backend decorator.
  • Real HSM backend: HsmBackend is a placeholder demonstrating the seam.

Verification

  • cargo fmt --check clean
  • cargo check --workspace --all-features clean
  • cargo clippy --workspace --all-features --tests -- -D clippy::correctness clean
  • cargo test per migrated crate — all green; engine: 549 tests; secrets: 60; memory: 219 (2 pre-existing e2e_hybrid_search failures unrelated); event_store: 57; architecture: 14; workspace lib db::filesystem 86 tests
  • Postgres-tier tests gracefully skip when no DB reachable; will run in integration CI

Migration pattern (canonical, used by every commit)

pub struct FilesystemFooStore<F: RootFilesystem> {
    fs: Arc<F>,
}

impl<F: RootFilesystem> FooStore for FilesystemFooStore<F> {
    async fn upsert(&self, foo: &Foo) -> Result<(), FooError> {
        let entry = Entry::record(RecordKind::new("foo")?, &foo.to_json()?)?
            .with_indexed(IndexKey::new("status")?, IndexValue::Text(foo.status_str().into()))
            .with_indexed(IndexKey::new("user_id")?, IndexValue::Text(foo.user_id.clone()));
        self.fs.put(&foo_path(&foo.id), entry, CasExpectation::Version(foo.version)).await?;
        Ok(())
    }
}

Path layout: /<root>/<scope>/<id> for entities, /<root>/<scope>/<id>/<child>/<child_id> for sub-entities. Indexed projections + Filter::Eq + And for queries. CAS retry on VersionMismatch.

Test plan

  • cargo fmt --check
  • cargo check --workspace --all-features
  • cargo clippy --workspace --all-features --tests -- -D clippy::correctness
  • Per-crate tests for every migrated crate (filesystem, secrets, event_store, memory, processes, authorization, run_state, outbound, engine, architecture)
  • Workspace lib db::filesystem tests (86 pass)
  • Full CI run (pending)
  • Postgres-tier integration CI (gracefully skipped locally)

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request migrates persistence logic across several crates to a unified RootFilesystem interface, effectively replacing the ironclaw_storage crate. The changes introduce append and tail operations for durable logging and implement Full-Text Search and Vector similarity search for the in-memory, libSQL, and PostgreSQL backends. New filesystem-backed repositories are added for memory documents, outbound state, processes, and secrets. Feedback identifies critical performance and concurrency issues, including O(N) bottlenecks in event log head probing, TOCTOU race conditions in secret and session consumption, and inefficient N+1 query patterns during directory listings that require optimization through indexing or targeted queries.

Comment on lines +128 to +135
let head = self
.fs
.tail(&path, SeqNo::from_backend(0))
.await
.map_err(map_filesystem_tail_error)?
.last()
.map(|record| record.seq.get())
.unwrap_or(0);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using tail to probe the stream head is a significant performance risk as it fetches the entire event log into memory. To avoid O(N) performance bottlenecks in append operations for durable logs, maintain a cursor cache or perform bounded reverse scans to read only the necessary tail records.

References
  1. Avoid O(N) performance bottlenecks in append operations for durable logs by maintaining a cursor cache or performing bounded reverse scans to read only the necessary tail records.

let material = SecretMaterial::from(decrypted.expose().to_string());

lease.status = SecretLeaseStatus::Consumed;
self.write_lease(&lease).await?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The consume operation lacks atomicity across multiple processes, creating a time-of-check-to-time-of-use (TOCTOU) race condition. To prevent this, re-verify the condition (lease status) after acquiring the write lock or use the record version for optimistic concurrency control (CAS) to ensure only one update succeeds.

References
  1. To prevent a time-of-check-to-time-of-use (TOCTOU) race condition when checking a condition under a read lock before performing a write, re-verify the condition after acquiring the write lock but before executing the write.

let entry = Entry::bytes(body).with_content_type(ContentType::json());
self.filesystem
.put(&path, entry, CasExpectation::Any)
.await
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

consume_session_use suffers from a TOCTOU race condition in multi-process deployments. Concurrent updates will result in lost increments. This operation must use the record version for optimistic concurrency control or re-verify the state after acquiring a write lock to ensure atomicity.

References
  1. To prevent a time-of-check-to-time-of-use (TOCTOU) race condition when checking a condition under a read lock before performing a write, re-verify the condition after acquiring the write lock but before executing the write.

Comment on lines +211 to +224
for entry in entries {
if !entry.name.ends_with(".json") {
continue;
}
let Some(attempt) = self
.get_json::<OutboundDeliveryAttempt>(&entry.path)
.await?
else {
continue;
};
if scope_matches(&attempt.scope, &scope) {
deliveries.push(attempt);
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This loop performs an N+1 query pattern by calling get for every entry. To optimize list operations that involve scanning multiple directories and files, consider maintaining an index file that directly lists the paths to the relevant items, or use targeted database queries to fetch specific records instead of loading all records and filtering in the application.

References
  1. To optimize list operations that involve scanning multiple directories and files, maintain an index file that directly lists the paths to the relevant items.
  2. Use targeted database queries to fetch specific records instead of loading all records and filtering in the application.

Comment on lines 226 to 243
for entry in entries {
if entry.name.ends_with(".json") {
let bytes = self.filesystem.as_ref().read_file(&entry.path).await?;
let record = deserialize::<ProcessRecord>(&bytes)?;
// Reviewer (PR #3666) flagged: a `get` returning `None` after
// `list_dir` enumerated the path indicates a race or backend
// inconsistency. Returning a partial process list silently
// hides this; surface it as `NotFound` so callers see the
// same failure shape they got with the legacy `read_file`
// path.
let Some(versioned) = self.filesystem.as_ref().get(&entry.path).await? else {
return Err(ProcessError::Filesystem(format!(
"process record listed but missing at {}",
entry.path
)));
};
let record = deserialize::<ProcessRecord>(&versioned.entry.body)?;
if same_scope_owner(&record.scope, scope) {
records.push(record);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The list method implements an N+1 pattern. This should be optimized by maintaining an index file that directly lists the paths to the relevant items, avoiding the overhead of multiple get calls during directory traversal.

References
  1. To optimize list operations that involve scanning multiple directories and files, maintain an index file that directly lists the paths to the relevant items.

Comment on lines +454 to +473
let mut leases = Vec::new();
for entry in entries {
if !entry.name.ends_with(".json") {
continue;
}
let Some(versioned) = self
.filesystem
.get(&entry.path)
.await
.map_err(fs_to_secret_store_error)?
else {
continue;
};
let mut stored: StoredLease = deserialize_secret(&versioned.entry.body)?;
if !same_scope_for_lease(&stored.scope, scope) {
continue;
}
stored.status = Self::effective_status(&stored, now);
leases.push(Self::lease_to_public(&stored));
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

leases_for_scope performs an N+1 query pattern. To optimize this, use targeted database queries to fetch specific records in a single operation, or maintain an index file to avoid costly directory traversal.

References
  1. Use targeted database queries to fetch specific records instead of loading all records and filtering in the application.
  2. To optimize list operations that involve scanning multiple directories and files, maintain an index file that directly lists the paths to the relevant items.

@github-actions github-actions Bot added size: XL 500+ changed lines risk: medium Business logic, config, or moderate-risk modules scope: db Database trait / abstraction scope: db/postgres PostgreSQL backend scope: docs Documentation scope: dependencies Dependency updates DB MIGRATION PR adds or modifies PostgreSQL or libSQL migration definitions contributor: core 20+ merged PRs and removed risk: medium Business logic, config, or moderate-risk modules labels May 15, 2026
ilblackdragon added a commit that referenced this pull request May 15, 2026
Three changes that close out the remaining HIGH-severity feedback from
the self-review (#1 #2 #3 #4 already addressed in 990c4f7 + e2530ad):

**#2 — `parse_state` silent fallback to Pending removed.**
`src/db/filesystem_jobs.rs::parse_state` previously mapped unknown
status strings to `JobState::Pending`, masking schema drift across a
rollout (a new state value appearing in stored rows would silently
lose its true value). Now returns `Result<JobState, DatabaseError>`
and the single caller propagates with `?`. Matches the wire-stable
enums rule in `types.md`.

**#6 — `is_engine_unsupported` no longer substring-matches.**
`crates/ironclaw_engine/src/store/filesystem.rs`: the typed
`FilesystemError::Unsupported` discriminator gets lost when wrapped in
`EngineError::Store { reason: String }`, so the old check
`reason.contains("Unsupported")` would false-positive on any unrelated
store error that mentioned the word. Now `fs_to_engine_error` tags the
discriminator with a stable `[fs:unsupported]` sentinel and the check
matches that sentinel — discriminator-preserving without changing the
public `EngineError` shape.

**#7 — `FilesystemChannelPairingStore` no longer drops corrupted records.**
`src/db/filesystem_pairing.rs::find_pending_requests`: the old code
silently filtered records whose JSON failed to deserialize, hiding
data corruption. Now propagates `DatabaseError::Serialization` with
the stored path so the operator sees the failure.

Also: `// silent-ok:` annotations added to the three engine `Store`
sites where read-modify-write on unknown ids is intentionally a no-op
(matches HybridStore parity per its CLAUDE.md). Each annotation names
the legacy contract being preserved.

Verification: `cargo check --workspace --all-features` clean;
`cargo test -p ironclaw_engine --all-features` 549/549;
`cargo test --lib --all-features db::filesystem` 88/88;
`cargo fmt --check` clean.
@ilblackdragon
Copy link
Copy Markdown
Member Author

Self-review round done. Spawned a code-reviewer agent against the integrated branch; it returned 7 HIGH / 8 MEDIUM / 7 LOW findings. The HIGH-severity fixes are now in:

# Issue Fix commit
H1 hybrid_search scanned every chunk in memory instead of using the native FTS/Vector indexes the libsql + postgres backends advertise `990c4f73e` — rewrote FilesystemWorkspaceStore::hybrid_search to call Filter::Fts / Filter::VectorNearest with a graceful fallback to the scan path on FilesystemError::Unsupported; declared FTS+Vector indexes lazily via OnceCell-gated ensure_chunk_indexes
H2 parse_state mapped unknown status strings silently to JobState::Pending, masking schema drift `9c292f9d0` — returns Result<JobState, DatabaseError>; call site propagates with ?
H3 libsql FTS5 trigger DDL spliced path literals — single quotes allowed by VirtualPath::new could inject SQL `e2530adff` — DDL emission site rejects any char outside [A-Za-z0-9_/.-]; regression test asserts path-LIKE-injection paths are refused at ensure_index time
H4 update_user_status/update_thread_state/revoke_lease/etc. used CasExpectation::Any — lost concurrent updates `e2530adff` — canonical CAS-Version + retry-on-VersionMismatch loop applied to filesystem_users, filesystem_jobs (update_job_status, mark_job_stuck), and the engine FilesystemStore (update_thread_state, revoke_lease, update_mission_status)
H5 Engine store silently returned Ok(()) on unknown ids without annotation `9c292f9d0` — added explicit // silent-ok: annotations naming HybridStore-parity at each let Some(versioned) = ... else { return; } site
H6 is_engine_unsupported substring-matched the error message; would false-positive on unrelated errors mentioning "Unsupported" `9c292f9d0` — fs_to_engine_error now tags the typed FilesystemError::Unsupported discriminator with a stable [fs:unsupported] sentinel; check matches that sentinel
H7 FilesystemChannelPairingStore::find_pending_requests silently dropped corrupted records `9c292f9d0` — propagates DatabaseError::Serialization with the offending stored path

Also (peripheral but related): vector "nearest" in libsql + postgres no longer loads every row's body to rank — first stage selects path, indexed, version, ranks by cosine, then get() materializes only the top-k bodies.

MEDIUM/LOW findings (N-scan operations missing indexes, file-size budget violations, etc.) are tracked as follow-ups; none affect correctness.

Test counts after the fix round (local, --all-features):

  • ironclaw_filesystem: 101 pass
  • ironclaw_engine: 549 pass (525 lib + 24 FilesystemStore tests)
  • db::filesystem (workspace lib): 88 pass
  • ironclaw_architecture: 14 pass
  • cargo clippy --workspace --all-features --tests -- -D clippy::correctness: clean
  • cargo fmt --check: clean

CI now running on the integrated branch (run 25911695237).

@serrrfirat
Copy link
Copy Markdown
Collaborator

Codex review found several actionable correctness/security issues in the filesystem-backed stores:

  1. P1: Secret lease consumption needs CAS (crates/ironclaw_secrets/src/filesystem_store.rs)
    consume reads an active lease, decrypts the secret, then writes the consumed state with CasExpectation::Any. The process-local lock does not protect shared filesystem backends across processes, so two consumers can both consume a one-time lease. Preserve the lease RecordVersion from the read and commit the consumed transition with CasExpectation::Version (or fail/retry on mismatch).

  2. P1: Credential session use increments need CAS (crates/ironclaw_secrets/src/filesystem_store.rs)
    consume_session_use validates stored.uses, increments it, then writes with CasExpectation::Any. Concurrent processes can both pass max_uses checks and overwrite each other. Write against versioned.version and reject/retry on VersionMismatch.

  3. P2: Conversation message queries need pagination (src/db/filesystem_conversations.rs)
    list_messages_internal fetches only Page::MAX_LIMIT rows before sorting/paginating in memory. Conversations with more than 1024 messages are silently truncated and has_more can be computed from an incomplete set. Loop through pages until a short page, or push caller pagination into the filesystem query.

  4. P2: Filesystem job queries need pagination (src/db/filesystem_jobs.rs)
    The shared run_query helper fetches only one max page. Scopes with more than 1024 jobs/actions/estimation records can miss existing rows, affecting listings, summaries, and updates. Mirror the paged loop used by the engine store.

  5. P2: Settings path encoding is lossy (src/db/filesystem_settings.rs)
    encode_segment maps unsafe characters to _, so distinct keys like a/b and a_b collide under the same path and can overwrite each other. Use a reversible/collision-free encoding such as percent-encoding or base64-url.

  6. P2: SQL index names need collision resistance (crates/ironclaw_filesystem/src/db.rs)
    sql_index_name truncates long identifiers without adding a hash. Distinct long (prefix, name) specs can produce the same DDL object name, and CREATE ... IF NOT EXISTS can silently reuse the wrong backing index/table/triggers. Include a stable hash suffix before truncation.

  7. P2: In-memory backend should reject writes over implicit directories (crates/ironclaw_filesystem/src/in_memory.rs)
    InMemoryBackend::put allows writing /memory/a while /memory/a/b exists, producing a tree shape SQL backends reject via has_child_entry. Add the same descendant check so tests cannot pass against impossible production state.

ilblackdragon added a commit that referenced this pull request May 15, 2026
`list_messages_internal`, `list_conversations_summary`, and `run_query`
each called `filesystem.query(.., Page::new(0, Page::MAX_LIMIT))` exactly
once and trusted the result was complete. Because `Page::MAX_LIMIT ==
1024`, conversations with >1024 messages or scopes with >1024
jobs/actions/estimations silently lost every row past the cap, and the
`has_more` flag in `list_conversation_messages_paginated` became
meaningless once the dropped tail crossed the page boundary. Codex PR
#3679 P2 review flagged the pattern.

Extract a shared `query_all_pages` helper in `filesystem_conversations`
that loops `query(..., Page::new(offset, MAX_LIMIT))` until a short
page comes back, then reuse it from `filesystem_jobs::run_query` and
from the inline scan in `update_estimation_actuals`. The helper
preserves the existing `NotFound -> Vec::new()` short-circuit and the
`fs_err_to_database` error mapping so call sites are otherwise
unchanged.

Regression tests:
- `list_messages_drains_pages_beyond_max_limit` writes
  `MAX_LIMIT + 5` messages and asserts the full count round-trips
  through `list_conversation_messages` and that
  `list_conversation_messages_paginated` reports `has_more` honestly
  for both partial and exhaustive windows.
- `get_job_actions_drains_pages_beyond_max_limit` writes
  `MAX_LIMIT + 3` actions on one job and asserts the full count comes
  back in sequence order.
- `list_agent_jobs_drains_pages_beyond_max_limit` writes
  `MAX_LIMIT + 7` jobs and asserts both `list_agent_jobs` and
  `agent_job_summary` count every row.
ilblackdragon added a commit that referenced this pull request May 15, 2026
Two HIGH-severity findings on PR #3679. Both sites read a versioned
entry, validated a one-shot/use-limit condition, then wrote back with
`CasExpectation::Any`. The process-local mutex only serializes writers
inside one process; multi-process callers sharing the same backend root
could both pass the check and overwrite each other.

- `FilesystemSecretStore::consume` — two consumers could both observe an
  Active one-shot lease, both decrypt, and both overwrite the consumed
  marker.
- `FilesystemCredentialBroker::consume_session_use` — two consumers
  could both pass the max-uses check at `uses=N-1` and overwrite each
  other's increment, losing a use.

Both now use the canonical retry-on-`FilesystemError::VersionMismatch`
pattern from `ironclaw_engine::store::filesystem::update_thread_state`
(post-`e2530adff`): re-read, re-evaluate the consume/use-limit
condition, write with `CasExpectation::Version(versioned.version)`. A
shared `CAS_RETRY_ATTEMPTS = 3` constant bounds the loop; exhausting it
surfaces a transient backend error rather than papering over
pathological hot-spots.

Also annotated `leases_for_scope` with a `TODO(perf)` covering the
N+1 list+get fan-out — bounded today by the owner-prefix path layout
and short lease TTLs; replacing it with `Filter::Eq` over `query`
requires the secrets store to declare its first index, which is a
follow-up.

Regression coverage: two new tests wrap `InMemoryBackend` with a
`VersionRacingBackend` that bumps the watched path's version
out-of-band on the first versioned `put`, forcing a `VersionMismatch`
and exercising the retry loop. They also assert that the retried CAS
write actually persisted (the next consume hits LeaseConsumed; the next
three increments exhaust the max-uses budget).
ilblackdragon added a commit that referenced this pull request May 15, 2026
Four P2 correctness fixes from the codex/gemini review.

**Settings keys use percent-encoding** (`src/db/filesystem_settings.rs`):
`encode_segment` previously mapped `/`, space, control chars, and others
all to `_`. Keys like `a/b` and `a_b` collided onto the same path and
silently overwrote each other. Now percent-encodes every byte outside
the unreserved set so distinct inputs map to distinct outputs.

**SQL index names get a blake3 suffix on overflow** (`crates/ironclaw_filesystem/src/db.rs`):
`sql_index_name` truncated identifiers exceeding 62 chars without
disambiguating, so two distinct long `(prefix, name)` specs could
collapse onto the same DDL object — `CREATE ... IF NOT EXISTS` would
silently reuse the wrong index/trigger. Now appends an 8-char blake3
hash suffix before truncating. Added `blake3 = "1"` to the crate's
deps (small + already used by other workspace crates).

**InMemoryBackend rejects writes over implicit directories**
(`crates/ironclaw_filesystem/src/in_memory.rs`): the SQL backends
refuse `put(/a)` when `/a/b` exists (treating `/a` as a directory).
The in-memory reference impl silently accepted those writes, letting
tests pass against production-impossible state. Mirror the SQL
contract.

**Event-store head-probe is bounded**
(`crates/ironclaw_reborn_event_store/src/filesystem_store.rs`):
The replay-gap detection previously called `tail(path, 0)` to read the
whole log just to look at its last seq — O(N) on every cold-path call.
Now probes `tail(path, after - 1)`: a non-empty result means
head == after (consumer is caught up); empty means head < after
(foreign-future cursor). Returns at most one record instead of the
entire log.

Verification: cargo check --workspace --all-features clean; cargo
test -p ironclaw_filesystem -p ironclaw_secrets
-p ironclaw_reborn_event_store --all-features all pass.
@ilblackdragon
Copy link
Copy Markdown
Member Author

Review-response summary — all HIGH-severity and P2-correctness findings from gemini-code-assist + codex review are now addressed.

Finding Severity Status Commit
Event-store tail(0) head probe O(N) on cold path (crates/ironclaw_reborn_event_store/src/filesystem_store.rs:135) HIGH (gemini) Fixed — bounded tail(after - 1) probe; returns ≤1 record instead of the whole log `0ebc22c83`
SecretStore::consume TOCTOU on multi-process (crates/ironclaw_secrets/src/filesystem_store.rs:412) HIGH (gemini) / P1 (codex) FixedCasExpectation::Version + 3-attempt retry loop; new filesystem_secret_store_consume_retries_on_version_mismatch regression test using a VersionRacingBackend `df8ae8d1c`
consume_session_use TOCTOU (crates/ironclaw_secrets/src/filesystem_store.rs:728) HIGH (gemini) / P1 (codex) Fixed — same CAS-retry pattern; regression test added `df8ae8d1c`
list_messages_internal truncates at Page::MAX_LIMIT (src/db/filesystem_conversations.rs) P2 (codex) Fixed — extracted query_all_pages drain helper; regression test writes MAX_LIMIT + 5 messages `e02cc8c4e`
run_query truncates jobs/actions/estimations at MAX_LIMIT (src/db/filesystem_jobs.rs) P2 (codex) Fixed — same drain helper; regression tests for actions and listings `e02cc8c4e`
encode_segment lossy _ substitution (src/db/filesystem_settings.rs) P2 (codex) Fixed — percent-encoding; encode_segment_is_collision_free test proves a/b and a_b no longer collide `0ebc22c83`
sql_index_name truncates without hash (crates/ironclaw_filesystem/src/db.rs) P2 (codex) Fixed — appends 8-char blake3 suffix before truncating `0ebc22c83`
InMemoryBackend::put allows write over implicit dir (crates/ironclaw_filesystem/src/in_memory.rs) P2 (codex) Fixed — descendant check matches SQL backends' contract `0ebc22c83`
leases_for_scope N+1 list (crates/ironclaw_secrets/src/filesystem_store.rs:473) MEDIUM (gemini) Partial — kept the scan; added TODO(perf) documenting bound (capped by per-owner path prefix + short lease TTL) and the index path that would replace it `df8ae8d1c`
filesystem_outbound::list N+1 pattern (line 224) MEDIUM (gemini) Deferred — same shape applies; the scan is bounded by per-tenant cardinality. Will fold into a follow-up perf pass once the host wires production traffic against the unified store
filesystem_processes::list N+1 pattern (line 243) MEDIUM (gemini) Deferred — same reasoning as above

Verification after the fix round:

  • cargo check --workspace --all-features clean
  • cargo test -p ironclaw_filesystem -p ironclaw_secrets -p ironclaw_reborn_event_store --all-features all pass
  • cargo test --lib --all-features db::filesystem — 91 pass (was 88, added 3 pagination + collision-free tests)
  • cargo clippy --workspace --all-features --tests -- -D clippy::correctness clean
  • cargo fmt --check clean

First consumer migration onto the new RootFilesystem surface. Switches
the byte-plane read_file/write_file calls inside ironclaw_processes'
filesystem-backed store to the unified put/get ops with Entry::bytes +
CasExpectation::Any. The on-disk JSON layout is unchanged, every
existing test passes, and downstream crates that construct
FilesystemProcessStore (ironclaw_host_runtime + tests) don't need to
change.

Scope deliberately narrow: opaque-file entries through `put`/`get`
without record kinds or non-`Any` CAS, since LocalFilesystem's native
`put` only accepts that shape (per the foundation PR #3659). Once
LocalFilesystem grows sidecar metadata, this consumer can switch to
`Entry::record(process_record_kind, ...)` + `CasExpectation::Absent`
without changing the on-disk layout.

Touch points:
- write_record uses put(Entry::bytes, CAS::Any)
- start uses get for the existence probe + transition_lock for the
  atomicity envelope per the single-instance invariant
- update_status / get / records_for_scope read via get and unwrap
  VersionedEntry.body
- records_for_scope returns ProcessError::Filesystem (not silent skip)
  when get returns None for a path that list_dir just yielded —
  matches the pre-migration NotFound propagation invariant

Test scaffold update: BackendErrorFilesystem now overrides `get` too,
so the fault-propagation regression test continues to exercise its
intended path. (Reviewer P1/P2 on the original #3666 — recursion +
silent-skip — addressed in foundation #3659 directly since LocalFilesystem
now ships native `put`/`get`.)
Stacked on the consolidated foundation PR #3659. Adds an
OutboundStateStore impl that persists outbound metadata under
/engine/outbound/{policies,subscriptions,deliveries} through any
RootFilesystem. The existing libSQL/Postgres/in-memory stores stay
intact during the migration; a follow-up cleanup PR can delete them
once production runs on the unified surface.

The new store passes the full contract suite (durable_policy_*,
subscription_cursor_*, delivery_status_*, notification_policy_*,
full_turn_scope_isolation) against InMemoryBackend in the existing
outbound_state_store_contract.rs test file.
…fied put/get

Stacked on PR #3670 (outbound). Mirrors the ironclaw_processes
migration in PR #3666 / now consolidated into #3659. Switches the
filesystem-backed lease store's read_file/write_file calls to the
unified get/put ops with Entry::bytes + CasExpectation::Any. The
on-disk JSON layout is unchanged, every existing test passes, and the
per-owner mutation_lock continues to serialize claim/consume/revoke
within a single instance.

Touch points:
- read_lease, read_lease_index, read_lease_file — now use get and
  unwrap VersionedEntry.body.
- write_lease, write_lease_index — now use put(Entry::bytes, Any).
- Imports updated.
- CountingFilesystem test scaffold gains put/get overrides that
  forward to its inner LocalFilesystem, since the trait defaults are
  now Unsupported after the PR #3659 recursion fix.
Stacked on PR #3671 (authorization). Mirrors processes (#3666) and
authorization (#3671) migrations. Switches all read_file/write_file
calls in FilesystemRunStateStore and FilesystemApprovalRequestStore
to the unified get/put ops with Entry::bytes + CasExpectation::Any.
On-disk JSON layout unchanged.

Test scaffold updates: ConcurrentMissingReadFilesystem and
DisappearingApprovalReadFilesystem gain put/get overrides that
forward to their inner LocalFilesystem and apply the same fault
injection logic on the unified read path (was: only on the legacy
read_file path). Required after the trait defaults moved to
Unsupported in PR #3659.
The ironclaw_storage crate predates the unified RootFilesystem surface
introduced by PR #3659 (universal FS dispatch). Its `BlobStore`/`RecordStore`
traits, `StorageKey`/`StorageVersion`/`PutCondition` types, and
`StoredBlob`/`StoredRecord` shapes parallel the new unified put/get
/CasExpectation/RecordVersion machinery on `RootFilesystem` — a textbook
duplicate-dispatch smell flagged by .claude/rules/architecture.md.

Only `ironclaw_outbound` consumed any of the crate, and only 5 small
helpers (`encode_json`, `decode_json`, `redacted_backend_error`,
`StorageError::Backend`, `ABSENT_SCOPE_COMPONENT`). All other types and
the entire `BlobStore`/`RecordStore` surface (660 LOC) were unused —
their intended consumers already moved to `RootFilesystem` directly.

Inlined the 5 helpers into `crates/ironclaw_outbound/src/db.rs`:
- `encode_json`/`decode_json` → direct `serde_json::to_string`/`from_str`
- `redacted_backend_error` → local log+collapse to `OutboundError::Backend`
  (preserves the redaction boundary required by ironclaw_outbound/CLAUDE.md)
- `ABSENT_SCOPE_COMPONENT` → local const ""

Removed the crate's workspace membership, the outbound dep, the
forbidden-edges BoundaryRule, and the crate directory.

Also updated the ironclaw_outbound BoundaryRule to permit a normal
dependency on `ironclaw_filesystem` — `FilesystemOutboundStateStore`
landed in the prior cascade PR and the boundary rule was stale.
…egacy

Two changes that close out the demoable parts of the universal-FS-dispatch
rework (tasks #18 and the demonstrable portion of #19 from the plan).

**HsmBackend placeholder** (`crates/ironclaw_filesystem/src/hsm.rs`).
Demonstrates that a new backend is a single-file change: implements the
one `RootFilesystem` trait, declares a restricted capability surface
(`Read` + `Write` + `Stat` + `Delete` + `TxnCapability::Cas` — no records,
no query, no index, no events, no multi-key transactions), and routes
`put`/`get`/`delete`/`stat`/`list_dir` through an in-process placeholder.

Five tests prove the seam works end-to-end:

- `hsm_supports_encrypted_bytes_round_trip` — bytes put/get works.
- `hsm_rejects_structured_records` — `put` with `RecordKind::Some` or
  non-empty `indexed` returns `Unsupported`, so a consumer cannot
  accidentally route records through encryption-only storage.
- `hsm_rejects_query_and_index_ops` — `query`/`ensure_index` return
  `Unsupported` consistent with the declared capabilities.
- `composite_rejects_overclaimed_hsm_descriptor` — mount-time
  validation (`validate_mount_capabilities`) refuses a descriptor that
  claims `Query`/`IndexExact` over a backend that doesn't deliver,
  failing with `FilesystemError::DescriptorOverclaims { missing, .. }`.
- `composite_routes_to_hsm_under_secrets_mount` — the acceptance gate:
  mounting HsmBackend at `/secrets` and routing put/get through the
  composite works with no consumer-visible changes. Indexed projection
  is still rejected because the declared capabilities advertise no
  index/query support.

A real HSM implementation replaces the in-memory placeholder with an
HSM session handle; the trait surface, capability declarations, and
mount-time validation are reusable as-is. The placeholder is *not* a
security boundary — it is a seam demonstration.

**database.md scoped to legacy directories**. The dual-backend rule
file (`.claude/rules/database.md`) is `paths`-scoped to `src/db/**`,
`src/history/**`, and `migrations/**` — exactly the legacy surface that
predates the universal FS dispatch. Added a "Status & Direction"
preamble pointing new persistence work at `ScopedFilesystem` and the
`2026-05-14-universal-fs-dispatch.md` plan, with the existing
per-crate dual-backend guidance kept (and tagged "legacy") for code
still inside those directories.
…ection

Decorate every `FilesystemCapabilityLeaseStore::write_lease_raw` and
`write_lease_index` `Entry` with a `tenant_id` indexed projection so an
admin-tier query can filter explicitly by tenant and a path-rewriting
bug surfaces as a query-time mismatch rather than silent cross-tenant
leakage. The projection is additive — `ScopedFilesystem` path-prefix
scoping remains the primary isolation boundary.

Adds:
- `index_key_tenant_id()` / `index_name_authorization_tenant()` helpers
  and `ensure_tenant_id_index` declared on the per-owner lease prefix
- Tenant projection on lease and lease-index entries
- Byte-only-backend fallback on both writes (strip the projection and
  downgrade CAS to `Any` when LocalFilesystem returns `Unsupported`),
  matching the sibling crates' fallback shape
- Regression test that issues a lease and asserts a positive
  `Filter::Eq` query (matching tenant) returns rows and a negative
  query (different tenant) returns zero rows

Refs docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md.
Clarify in the tenant-isolation plan's "What this gives us" section that
the `tenant_id` defense-in-depth projection covers processes/secrets/
outbound/authorization but deliberately excludes the engine `Store`. The
engine trait is single-tenant-by-construction per Open Question 2 — it
never sees `tenant_id` internally and has no scope source to drive the
projection at the write site, so it relies on path-prefix scoping alone.
…fs-rework-unified

# Conflicts:
#	crates/ironclaw_reborn/src/planned_driver.rs
…ement contract

The architecture boundary test
`reborn_virtual_roots_match_storage_placement_contract`
(`crates/ironclaw_architecture/tests/reborn_dependency_boundaries.rs`)
asserts that `ironclaw_host_api::VIRTUAL_ROOTS` matches the canonical
roots in `docs/reborn/contracts/storage-placement.md` and
`docs/reborn/contracts/filesystem.md`. The previous commit
(`b30c31eae`) added five consumer-store roots to `VIRTUAL_ROOTS`
without updating the contract docs and collapsed three `/system/*`
subroots into a single `/system` entry, which broke the test in CI.

- Restore `/system/settings`, `/system/extensions`, `/system/skills`
  in `VIRTUAL_ROOTS` (they are still load-bearing — used by
  `ironclaw_dispatcher` for extension discovery and mount roots).
- Keep the five new consumer-store roots (`/processes`,
  `/authorization`, `/outbound`, `/tenant-shared`, `/tenants`) and
  add matching rows to both contract docs explaining what each one
  carries.
- Split the unified `/system` mount alias in
  `default_singleton_mount_view` / `invocation_mount_view` into one
  alias per canonical subroot, since each subroot is the smallest
  VirtualPath unit reserved by the contract. Tests updated to match.

After this commit, `cargo test -p ironclaw_architecture --test
reborn_dependency_boundaries` passes and the composition mount-view
tests still pass.
These per-backend OutboundStateStore impls predate the universal-FS
dispatch design. Now that FilesystemOutboundStateStore routes through
RootFilesystem, the backend choice happens at the RootFilesystem
layer (LibSqlRootFilesystem vs PostgresRootFilesystem vs
InMemoryBackend) — no need for outbound-specific persistence impls.

- Delete src/libsql_store.rs (LibSqlOutboundStateStore)
- Delete src/postgres_store.rs (PostgresOutboundStateStore)
- Delete src/db.rs (helpers used only by the above)
- Delete the four contract tests that exercised them (libsql_persists*,
  libsql_rejects*, postgres_persists*, postgres_rejects*); durability
  across reopen is now a property of the RootFilesystem backend, not
  of outbound-specific persistence
- Inline-gate the OutboundPushKind/OutboundDeliveryStatus as_str
  helpers as #[allow(dead_code)] — they were only consumed by the
  deleted SQL bodies

No production callers; no composition rewiring needed (composition
already uses FilesystemOutboundStateStore).
… follow-ups

After deleting LibSql/PostgresOutboundStateStore (`d4bf7b3c2`), the
remaining cleanup splits cleanly into three independent follow-up PRs:

- secrets: needs the master-key decryptability check reproduced on
  FilesystemSecretStore before composition can switch off
  LibSqlSecretsStore / PostgresSecretsStore.
- authorization: FilesystemCapabilityLeaseStore needs to move from
  borrowed `&'a ScopedFilesystem<F>` to `Arc<ScopedFilesystem<F>>` so
  it fits `Arc<dyn CapabilityLeaseStore>` for composition wiring.
- run_state: not yet migrated to ScopedFilesystem; the same shape as
  the other migrated consumers is the prerequisite before
  LibSqlRunStateStore / PostgresRunStateStore can be deleted.

Each is independent (200-400 lines + composition rewiring + tests)
and shouldn't pile onto PR #3679. The "trait collapse" further step
(rename Filesystem*Store → *Store, delete the trait) widens
HostRuntimeServices type signatures and is a deeper structural
change deserving its own PR after the SQL impls are gone.
Adds `FilesystemSecretStore::verify_can_decrypt_existing_secrets`,
porting the fail-loud-on-master-key-mismatch contract from
`LibSqlSecretsStore` / `PostgresSecretsStore` so composition can drop
the SQL backends. The sentinel lives at `/secrets/_master_key_check.json`
under the caller's MountView; constant-time compare against the fixed
plaintext mirrors the libSQL / Postgres implementations the SQL
backends carry today.

Driven by the "Legacy per-backend store cleanup — ironclaw_secrets"
entry in docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md.
These per-backend CapabilityLeaseStore impls predate the universal-FS
dispatch design. Now that FilesystemCapabilityLeaseStore routes through
RootFilesystem (via ScopedFilesystem), the backend choice happens at
the RootFilesystem layer (LibSqlRootFilesystem vs PostgresRootFilesystem
vs InMemoryBackend) — no need for an authorization-specific persistence
impl per backend.

Cites the cleanup entry in
docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md
("Legacy per-backend store cleanup" → ironclaw_authorization row).

- Migrate FilesystemCapabilityLeaseStore from `<'a, F>` + `&'a ScopedFilesystem<F>`
  to `<F>` + `Arc<ScopedFilesystem<F>>` so composition can hold it as
  `Arc<dyn CapabilityLeaseStore>` without erasing a lifetime parameter.
  Same shape as FilesystemSecretStore, FilesystemOutboundStateStore,
  FilesystemProcessStore.
- Preserve the AUTH-CRITICAL H4 CAS-Version retry + Unsupported→Any
  fallback in update_lease_cas / write_lease_raw — only the borrow→Arc
  shape changed; serialize/deserialize, mutation_lock, tenant_id
  projection, and revoke/claim/consume semantics are untouched.
- Delete crates/ironclaw_authorization/src/db.rs (LibSqlCapabilityLeaseStore
  + PostgresCapabilityLeaseStore impls + helpers used only by them).
- Delete tests/db_capability_lease_store_contract.rs (the legacy
  contract test exercised the SQL stores directly; durability across
  reopen is now a property of the RootFilesystem backend, not of an
  authorization-specific persistence impl, and the filesystem store
  reload paths are exercised by capability_lease_contract.rs).
- Drop the `libsql` and `postgres` cargo features on
  ironclaw_authorization (and their `dep:libsql` / `dep:deadpool-postgres` /
  `dep:tokio-postgres` / `tracing` dependencies — only the deleted SQL
  code used them).
- Composition (lib.rs + factory.rs): replace
  Arc::new(LibSql/PostgresCapabilityLeaseStore::new(db|pool)) +
  `.run_migrations().await?` with
  Arc::new(FilesystemCapabilityLeaseStore::new(Arc::clone(&scoped_filesystem))).
  Migrations are now the RootFilesystem's responsibility, already run
  on `filesystem.run_migrations()` earlier in the same function.
- Update ironclaw_reborn_composition's `libsql`/`postgres` features to
  drop their `ironclaw_authorization/libsql` and `/postgres` flags.
- Update the host_runtime durable-restart integration test from the
  leaked `&'static ScopedFilesystem<_>` shape to a plain
  `Arc<ScopedFilesystem<LocalFilesystem>>`; the Box::leak is gone.

Verified:
- cargo clippy -p ironclaw_authorization -p ironclaw_capabilities
  -p ironclaw_reborn_composition --all-features --tests -- -D warnings clean
- cargo test -p ironclaw_authorization --all-features: 27 passed
- cargo check --workspace --all-features clean
…sition

The universal-FS dispatch design says backend choice happens at the
RootFilesystem layer, not at the consumer-store layer.
FilesystemSecretStore + FilesystemCredentialBroker already route
through ScopedFilesystem over any RootFilesystem (LibSqlRootFilesystem
/ PostgresRootFilesystem / InMemoryBackend / LocalFilesystem), so the
LibSqlSecretsStore / PostgresSecretsStore / LibSqlCredentialStore /
PostgresCredentialStore siblings are vestigial duplication.

Composition (`crates/ironclaw_reborn_composition/src/{lib,factory}.rs`)
now constructs FilesystemSecretStore over the existing
scoped_filesystem (wrap_scoped). The master-key decryptability check
that used to live on LibSqlSecretsStore was ported to
FilesystemSecretStore::verify_can_decrypt_existing_secrets in
commit 36d230e; composition calls it on every production startup
so a master-key mismatch still fails loud.

Deleted:
- `crates/ironclaw_secrets/src/db.rs` (LibSql + Postgres impls + helpers)
- `crates/ironclaw_secrets/tests/db_secret_store_contract.rs`
- `crates/ironclaw_secrets/tests/db_credential_store_contract.rs`
- `crates/ironclaw_secrets/tests/security_findings_poc.rs` (only exercised the SQL stores)
- `pub use db::{LibSqlSecretsStore, …}` re-exports in `lib.rs`
- `libsql` / `postgres` features on `ironclaw_secrets` (no consumer left)
- `deadpool-postgres` / `libsql` / `tokio-postgres` deps from `ironclaw_secrets/Cargo.toml`
- Feature-union references in `ironclaw_reborn` / `ironclaw_reborn_composition`

Public `SecretStore` and `CredentialBroker` traits are unchanged; the
new `FilesystemSecretStore` already implements them. The
`ScopedSecretsStoreAdapter` shim that previously wrapped the legacy
SQL stores is also unused now and can be removed in a follow-up
(left in place this commit to keep the diff focused on the legacy
deletion).

Per the design-doc entry "Legacy per-backend store cleanup —
ironclaw_secrets" in
docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md.
… Filesystem{RunState,ApprovalRequest}Store

Mirrors the engine migration in commit ac8e677 and the parallel
processes / secrets / authorization / outbound refactors:
`FilesystemRunStateStore` and `FilesystemApprovalRequestStore`
previously held a `&'a F: RootFilesystem` borrow and manually encoded
tenant/user identity in every path via `tenant_user_root` (formatted
`/engine/tenants/<tenant>/users/<user>/agents/.../projects/.../runs|approvals`).
Any composition layer that forgot to wrap the backend in a tenant
scope would leak across tenants with the type system saying nothing —
the HIGH-severity class of finding tracked in PR #3679 for
`ironclaw_engine`.

Migrated the run-state / approval stores so tenant isolation is
structural rather than something this crate has to re-derive from
`ResourceScope.tenant_id` / `user_id`:

- `FilesystemRunStateStore::new` and `FilesystemApprovalRequestStore::new`
  now take `Arc<ScopedFilesystem<F>>`. The `'a` lifetime parameter on
  the struct is gone — composition can hold these as
  `Arc<dyn RunStateStore>` / `Arc<dyn ApprovalRequestStore>` without
  pinning a borrow.
- Path helpers (`run_record_path`, `run_records_root`,
  `approval_record_path`, `approval_records_root`) return `ScopedPath`
  instead of `VirtualPath`. The on-disk layout under the `/run-state`
  and `/approvals` mount aliases is:

      /run-state[/agents/<agent>][/projects/<project>][/missions/<mission>][/threads/<thread>]/runs/<invocation_id>.json
      /approvals[/agents/<agent>][/projects/<project>][/missions/<mission>][/threads/<thread>]/<request_id>.json

  The leading `/engine/tenants/<tenant>/users/<user>` prefix is gone;
  the caller's `MountView` resolves the alias to a tenant/user-scoped
  target at every op. Sub-scope axes (agent/project/mission/thread)
  stay in the alias-relative path because they are within-tenant
  scoping not covered by the per-tenant `MountAlias`.
- `records_for_scope` on both stores reconstructs each child
  `ScopedPath` from the `list_dir`-returned `VirtualPath` leaf via a
  new `join_scoped` helper so the per-op ACL still runs on the
  follow-up `get` (mirrors the engine / processes / secrets / outbound
  store's `join_scoped` shape).
- `put_with_cas` now takes `&ScopedFilesystem<F>` / `&ScopedPath`. The
  `Unsupported`→`Any` fallback semantics are unchanged, so byte-only
  backends (`LocalFilesystem`) keep working through the per-path
  `FILESYSTEM_RECORD_LOCKS` map.
- `filesystem_record_lock` now keys on `ScopedPath` instead of
  `VirtualPath` so the lock is alias-relative (one process-local lock
  per logical record, not per resolved target).
- `tenant_user_root` is removed; its agent/project/mission/thread
  segments move into the new alias-relative `scope_owner_alias_string`
  helper.

Two sibling mount aliases (`/run-state` + `/approvals`) on one
`Arc<ScopedFilesystem<F>>`: composition wires both aliases on the same
shared `MountView`, so this crate can drive run-state and approvals
through one filesystem handle while keeping the on-disk subtrees
distinct.

Tests:

- `tests/run_state_contract.rs` now constructs a `ScopedFilesystem`
  over `LocalFilesystem` with `MountPermissions::read_write_list_delete()`
  grants on both `/run-state` and `/approvals` aliases pointing at
  tenant1/user1 targets. The existing cross-tenant assertions still
  pass because the post-read `same_scope_owner` check filters out
  records whose in-body scope differs from the request scope.
- `tests/approval_resolution_contract.rs` follows the same shape with
  a local `scoped_run_state_fs` helper.
- New regression test
  `filesystem_run_state_store_isolates_two_tenants_with_same_user_project_ids`
  wires two `FilesystemRunStateStore`s + `FilesystemApprovalRequestStore`s
  over one `LocalFilesystem` with different `MountView` targets but
  identical `user_id` / `project_id` / `invocation_id` request scopes.
  Writing on tenant A must not be visible from tenant B's stores —
  `get`, `records_for_scope`, `complete`, and `approve` all fail closed.
  Fails closed if the ScopedFilesystem wrapping ever regresses to raw
  `&F: RootFilesystem`.

Composition + contract changes (so the per-invocation `MountView` and
boundary tests stay aligned):

- `crates/ironclaw_host_api/src/path.rs::VIRTUAL_ROOTS` adds
  `/run-state` and `/approvals` (with the other migrated consumer-store
  roots).
- `crates/ironclaw_reborn_composition/src/lib.rs::PER_USER_ALIASES`
  adds `/run-state` and `/approvals` so both
  `default_singleton_mount_view` and `invocation_mount_view` expose
  them with full per-user-owner permissions.
- `docs/reborn/contracts/storage-placement.md` and
  `docs/reborn/contracts/filesystem.md` add matching `/run-state` and
  `/approvals` rows so the
  `reborn_virtual_roots_match_storage_placement_contract` boundary
  test stays green.
- `crates/ironclaw_host_runtime/tests/reborn_durable_restart_integration.rs`
  is rewired to construct an `Arc<ScopedFilesystem<LocalFilesystem>>`
  (with the new aliases) and pass it to both filesystem run-state
  stores. The capability-lease store keeps the legacy
  `&'static ScopedFilesystem<F>` shape until its own migration lands;
  `leaked_scoped_engine_filesystem` is preserved as the borrow source
  for now.

Trait surfaces (`RunStateStore`, `ApprovalRequestStore`,
`RunStateApprovalStore`) are unchanged. The legacy per-backend
LibSql / Postgres run-state + approval stores stay in this PR; their
deletion is the second pass tracked in
`docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md`.

Plan: docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md
…position

Completes the third leg of the legacy-store cleanup. Now that
FilesystemRunStateStore + FilesystemApprovalRequestStore route through
ScopedFilesystem (commit a96d1f6), the per-backend siblings
(LibSqlRunStateStore, PostgresRunStateStore, LibSqlApprovalRequestStore,
PostgresApprovalRequestStore, LibSqlRunStateApprovalStore,
PostgresRunStateApprovalStore) are vestigial duplication of what
RootFilesystem already abstracts.

Deleted:
- `crates/ironclaw_run_state/src/db.rs` (LibSql + Postgres + shared
  helpers; ~one large module).
- `crates/ironclaw_run_state/tests/db_run_state_store_contract.rs`
  (was the only suite exercising the SQL impls directly; the
  ScopedFilesystem-backed contract suite over InMemoryBackend +
  RootFilesystem covers the same surface).
- `pub use db::{LibSql*, Postgres*}` re-exports in `lib.rs`.
- `with_libsql_run_state_approval_store` /
  `with_postgres_run_state_approval_store` builder methods on
  `HostRuntimeServices` (composition now wires
  FilesystemRunStateStore + FilesystemApprovalRequestStore directly
  off the shared `Arc<ScopedFilesystem<F>>` it already constructs).

Composition (`crates/ironclaw_reborn_composition/src/{lib,factory}.rs`)
and the durable-restart integration test
(`crates/ironclaw_host_runtime/tests/reborn_durable_restart_integration.rs`)
now use the Arc-based construction shape for all three filesystem-backed
stores in the trio (run-state, approval-requests, capability-leases) —
same `Arc<ScopedFilesystem<LocalFilesystem>>` shared between them.

Per the design-doc entry "Legacy per-backend store cleanup —
ironclaw_run_state" in
`docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md`.
The test was deleted on origin/reborn-integration in commit 34886c3
(`refactor(crates): group workspace crates by domain`) but my recent
merge (7efef49) kept the file: independently-modified-on-one-side
+ deleted-on-the-other resolves to "keep" by default. The kept file
references `RebornLibSqlMemoryDocumentRepository` which no longer
exists (renamed to `LibSqlMemoryDocumentRepository`), breaking compile.

Delete the file to match upstream's intent — the contract is covered
elsewhere by `ironclaw_memory::tests::db_memory_repository_contract`
and the audit-projection tests inside `ironclaw_event_projections`
that don't depend on the renamed type.
The reborn-native libSQL/Postgres memory document repositories
(`RebornLibSqlMemoryDocumentRepository`,
`RebornPostgresMemoryDocumentRepository`) are vestigial duplication
of what `RootFilesystem` already abstracts — per the same universal-FS
dispatch design that drove the secrets/auth/run_state cleanup.
`FilesystemMemoryDocumentRepository` over `LibSqlRootFilesystem` /
`PostgresRootFilesystem` covers both surfaces with one impl.

Implements `MemoryDocumentIndexRepository` (chunk replace + lifecycle
hooks) on `FilesystemMemoryDocumentRepository` so the existing
`ChunkingMemoryDocumentIndexer` can drive memory indexing through the
unified repo — previously only the reborn-native libSQL impl
participated.

Deleted:
- `crates/ironclaw_memory/src/repo/native_libsql.rs`
- `crates/ironclaw_memory/src/repo/native_postgres.rs`
- All five `reborn_native_*` contract tests in
  `crates/ironclaw_memory/tests/` — covered by the
  `FilesystemMemoryDocumentRepository` contract suite over
  `InMemoryBackend` plus the `RootFilesystem` backend contract tests
  in `ironclaw_filesystem` itself.

`ChunkingMemoryDocumentIndexer` (`indexer.rs`) is unchanged; it
delegates to anything that implements `MemoryDocumentIndexRepository`
and now has a wider universe of impls to choose from.

The dangling `RebornLibSqlMemoryDocumentRepository` reference in
`crates/ironclaw_event_projections/tests/memory_significant_events_projection_contract.rs`
is updated to use `FilesystemMemoryDocumentRepository` over
`InMemoryBackend` — same contract, no DB dependency.

CLAUDE.md updated to drop the "legacy `LibSqlMemoryDocumentRepository`
/ `PostgresMemoryDocumentRepository` still own their own per-backend
behavioral coverage" sentence, since those tests are now deleted.

Per the design-doc entry "Legacy per-backend store cleanup" in
`docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md`.
@ilblackdragon ilblackdragon requested a review from serrrfirat May 17, 2026 07:56
The universal-FS dispatch design says backend choice happens at the
RootFilesystem layer, not the consumer-store layer. The reborn-native
duplicates were already deleted in 27485c5; this commit removes the
remaining *legacy* per-backend memory document repositories. They
duplicated what FilesystemMemoryDocumentRepository<F> already covers
over RootFilesystem (LibSqlRootFilesystem / PostgresRootFilesystem /
InMemoryBackend / LocalFilesystem).

No production callers — composition has been on
FilesystemMemoryDocumentRepository for a while, and no other crate
imported LibSqlMemoryDocumentRepository / PostgresMemoryDocumentRepository.

Deleted:
- `crates/ironclaw_memory/src/repo/libsql.rs`
- `crates/ironclaw_memory/src/repo/postgres.rs`
- `pub use repo::{LibSql,Postgres}MemoryDocumentRepository` re-exports
- `mod libsql` / `mod postgres` declarations in `repo/mod.rs`
- Feature-gated helpers `scoped_memory_agent_id`,
  `db_path_for_memory_document`, `memory_document_from_db_path` in
  `repo/mod.rs` — only consumed by the deleted impls
- `#[cfg(feature = "libsql")]` `encode_embedding_blob` /
  `decode_embedding_blob` / `cosine_similarity` in `src/embedding.rs`
  and `escape_fts5_query` in `src/search.rs` — same, only consumed
  by the deleted libsql repo (the filesystem repo has its own local
  `encode_embedding_blob`)
- `crates/ironclaw_memory/tests/db_memory_repository_contract.rs`
- `crates/ironclaw_memory/tests/e2e_hybrid_search.rs`
- `crates/ironclaw_memory/tests/e2e_persistence_versioning.rs`
- `crates/ironclaw_memory/tests/e2e_scope_isolation_safety.rs`
- `crates/ironclaw_memory/tests/memory_filesystem_vertical_integration.rs`
- `libsql` / `postgres` features on `ironclaw_memory` (no consumer
  in the workspace enabled them) and the corresponding optional
  deadpool-postgres / libsql / pgvector / tokio-postgres deps

Memory substrate semantics (versioning, chunk replace, metadata
cascade, hybrid search fusion, protected-path safety) retain coverage
through `FilesystemMemoryDocumentRepository` over `InMemoryBackend`
in `memory_backend_contract.rs` + `memory_filesystem_contract.rs` +
the inline `repo/filesystem.rs` `#[cfg(test)]` suite. Backend-specific
durability across reopen is now a `RootFilesystem` property and lives
in `ironclaw_filesystem`'s own backend contract tests.

CLAUDE.md drops the dangling "legacy `LibSqlMemoryDocumentRepository`
/ `PostgresMemoryDocumentRepository` still own their own per-backend
behavioral coverage until their migration is scheduled" tail — the
first half of that sentence was already removed in 27485c5; this
commit removes the second half, since the tests no longer exist.

The `tests/e2e_trace_memory_isolation.rs` ignore-message reference
to the deleted `e2e_scope_isolation_safety.rs` test is repointed at
the InMemoryBackend contract tests that own the substrate-level
SOUL.md rejection coverage now.

The CI `Run ironclaw_memory tests` step drops `--features libsql`
(no such feature exists any more).

Per the design-doc entry "Legacy per-backend store cleanup —
ironclaw_memory" in
`docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md`.
@github-actions github-actions Bot added the scope: ci CI/CD workflows label May 17, 2026
…oute through filesystem

The universal-FS dispatch design says backend choice happens at the
RootFilesystem layer, not at the durable-log impl layer.
FilesystemDurableEventLog / FilesystemDurableAuditLog already route
through ScopedFilesystem over any RootFilesystem (LibSqlRootFilesystem
/ PostgresRootFilesystem / InMemoryBackend), so the LibSql*/Postgres*
DurableEventLog / DurableAuditLog siblings that spoke SQL directly are
vestigial duplication.

The `RebornEventStoreConfig::Libsql { path_or_url, auth_token }` and
`::Postgres { url }` variants stay (production composition still
matches on the same shape) — but the dispatch now opens a backend-
specific RootFilesystem from the same input and wraps it in the
unified filesystem-backed durable log. Production policy gates
(production InMemory rejection, libSQL cleartext / ambiguous-bare URL
rejection, Postgres sslmode=disable rejection, TLS connector
construction) move into the event-store crate's `libsql_backed` /
`postgres_backed` modules.

Deleted:
- `crates/ironclaw_reborn_event_store/src/libsql_store.rs`
  (LibSqlDurableEventLog, LibSqlDurableAuditLog, LibSqlStore,
  build_libsql_event_stores)
- `crates/ironclaw_reborn_event_store/src/postgres_store.rs`
  (PostgresDurableEventLog, PostgresDurableAuditLog, PostgresStore,
  build_postgres_event_stores)
- `crates/ironclaw_reborn_event_store/src/sql_common.rs` (helpers used
  only by the deleted SQL impls)
- `crates/ironclaw_reborn_event_store/migrations/{libsql,postgres}/*`
  (replaced by RootFilesystem-layer migrations)
- SQL-table corruption fixture tests in `durable_event_store_contract`
  that punched holes in `reborn_event_entries` — the table no longer
  exists for events; the contiguity contract is enforced by the
  RootFilesystem's `append`/`tail` invariants and covered by the
  `FilesystemDurableEventLog` contract suite (backend-agnostic) plus
  the surviving public-surface rebuild tests
- `dep:url` and `dep:uuid` (only used by the deleted Postgres SQL
  body); kept `tokio-postgres-rustls` / `rustls` / etc. since the
  Postgres path still parses the URL and builds the TLS connector
  before handing the pool to `PostgresRootFilesystem`

No composition changes: `with_reborn_event_store_config` still takes
`RebornEventStoreConfig` and matches on the same variants.

Per the design-doc entry "Legacy per-backend store cleanup —
ironclaw_reborn_event_store" in
docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md.
…ilesystemSessionThreadService

Mirrors the run-state / processes / secrets / authorization migrations
(commits a96d1f6, 81664dd, 4ae5676, 5e7688d) plus the analogous
engine migration in ac8e677. The new
`FilesystemSessionThreadService<F: RootFilesystem>` accepts an
`Arc<ScopedFilesystem<F>>` and routes every read/write through the
`/threads` mount alias. The composition layer's per-invocation MountView
resolves `/threads` to a tenant/user-scoped VirtualPath subtree, so
tenant isolation is structural rather than something this crate has to
re-derive from `ThreadScope.tenant_id`.

Path layout under the alias keeps sub-scope (`agent_id`, `project_id`,
`owner_user_id`, `mission_id`) plus the thread id in the alias-relative
path so a single tenant/user can own multiple cells:

    /threads[/agents/<agent>][/projects/<project>][/owners/<owner_user>][/missions/<mission>]/threads/<thread_id>/thread.json
    /threads.../threads/<thread_id>/messages/<message_id>.json
    /threads.../threads/<thread_id>/summaries/<summary_id>.json
    /threads/idempotency/<sha256>.json

The thread record holds the per-thread monotonic `next_sequence`
counter; `accept_inbound_message` and `append_assistant_draft` do an
optimistic CAS on the thread record to reserve a sequence, then write
the new per-message record with `CasExpectation::Absent`. Every
multi-step transition retries on `FilesystemError::VersionMismatch` and
falls back to `CasExpectation::Any` on `Unsupported` so byte-only
`LocalFilesystem` backends keep working (matches the existing
run-state / authorization store fallback shape).

Idempotency records flatten under one `/threads/idempotency/` directory
keyed by SHA-256 of the (scope, source_binding_id, external_event_id)
tuple — two scopes hash to different keys, so the flat layout is safe.
`replay_accepted_inbound_message`, which has no scope input, scans that
directory and matches binding/event id against the persisted record
body.

Wiring:

- `/threads` added to `PER_USER_ALIASES` in `ironclaw_reborn_composition`
  so both `default_singleton_mount_view` and `invocation_mount_view`
  expose the alias with full r/w/l/d permissions.
- `/threads` added to `VIRTUAL_ROOTS` in `ironclaw_host_api::path` so the
  alias resolves to a recognized top-level subtree.
- `docs/reborn/contracts/{filesystem,storage-placement}.md` updated to
  list `/threads` alongside the other consumer-store mount aliases.

Cross-tenant isolation regression test
(`filesystem_session_thread_service_isolates_two_tenants_with_same_user_project_ids`)
asserts that two `FilesystemSessionThreadService`s sharing one
`InMemoryBackend` cannot see each other's threads, messages, or
idempotency records when their `MountView`s resolve `/threads` to
different tenant subtrees — identical `(agent_id, project_id,
owner_user_id, thread_id)` tuples on both sides.

Per the design-doc entry "ironclaw_threads" in
`docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md`.
…sition

Completes the fourth leg of the legacy per-backend store cleanup
(after `ironclaw_outbound`, `ironclaw_secrets`, `ironclaw_authorization`,
and `ironclaw_run_state`). Now that `FilesystemSessionThreadService`
routes through `ScopedFilesystem` (previous commit), the per-backend
siblings (`LibSqlSessionThreadService`, `PostgresSessionThreadService`)
are vestigial duplication of what `RootFilesystem` already abstracts.

Deleted:
- `crates/ironclaw_threads/src/db.rs` (LibSql + Postgres impls plus the
  shared `DurableState` snapshot/replace helpers — one ~1.8k line module).
- `crates/ironclaw_threads/tests/db_session_thread_contract.rs` (was the
  only suite exercising the SQL impls directly; the
  `FilesystemSessionThreadService` contract suite over `InMemoryBackend`
  + the in-memory `session_thread_contract.rs` semantic suite cover the
  same surface, and durability across reopen is now a `RootFilesystem`
  property tested at the `ironclaw_filesystem` layer).
- `pub use db::{LibSql*,Postgres*}` re-exports in `lib.rs`.
- `libsql` + `postgres` Cargo features on `ironclaw_threads` (no remaining
  feature-gated code paths). `deadpool-postgres`, `tokio-postgres`,
  `libsql`, `tracing`, and `tempfile` (dev) dependencies removed
  alongside.

Composition / call-site wiring:

- `crates/ironclaw_reborn/tests/loop_driver_host.rs`: the
  `turn_runner_worker_completes_after_libsql_turn_and_thread_services_reopen`
  restart regression now builds a `LibSqlRootFilesystem` + a
  `ScopedFilesystem` wrapping `/threads → /threads` and constructs the
  thread service via `FilesystemSessionThreadService::new`. The
  `libsql-restart-tests` feature now pulls
  `ironclaw_filesystem/libsql` instead of `ironclaw_threads/libsql`
  (which no longer exists). `ironclaw_turns/libsql` is still wired
  because `LibSqlTurnStateStore` has not migrated yet.

Per the design-doc entry "Legacy per-backend store cleanup —
ironclaw_threads" in
`docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md`.
@serrrfirat
Copy link
Copy Markdown
Collaborator

Correctness/security review on current head 440be242cfb1d3953de4541ee91bcd23622f8530: I don't think this is ready to merge yet.

  1. High security: production wiring collapses tenant/user namespaces for the new filesystem-backed stores. wrap_scoped still builds the singleton mount view that maps /secrets, /authorization, /run-state, /approvals, etc. to top-level paths with no tenant/user prefix (crates/ironclaw_reborn_composition/src/lib.rs:126-156, :207-215), and the production builders/factory wire that scoped filesystem into the long-lived secret, lease, and run-state stores (crates/ironclaw_reborn_composition/src/factory.rs:246-280, :313-345). The replacement stores explicitly omit tenant/user from their own paths because they expect the mount view to supply it: e.g. secret_owner_alias only appends agent/project (crates/ironclaw_secrets/src/filesystem_store.rs:1139-1152), lease paths omit tenant/user (crates/ironclaw_authorization/src/lib.rs:1483-1508), and run/approval paths do the same (crates/ironclaw_run_state/src/lib.rs:1003-1036). wrap_scoped_for_invocation exists, but it is not what the production builders use. In a multi-tenant process, same-shape records from different tenants/users now share the same backing path; at minimum this allows cross-tenant overwrite/DoS of secrets and state, and read_lease_versioned also returns the deserialized lease without a stored-scope check (crates/ironclaw_authorization/src/lib.rs:461-477). Either construct these stores from the per-invocation mount view everywhere they handle scoped data, or keep tenant/user in the store keys and add a factory-level two-tenant regression test through the actual production wiring.

  2. High correctness: version history is written before the document CAS, so losing writers leave phantom versions. write_document_with_options archives the previous body via save_document_version before the CasExpectation::Version/Absent document write (crates/ironclaw_memory/src/repo/filesystem.rs:520-556). The append path does the same before its CAS write and then maps VersionMismatch to MemoryAppendOutcome::Conflict (:591-627). If another writer wins between the initial read and the final put, the loser returns conflict but has already inserted a .versions/<n> record for an overwrite/append that never happened. That corrupts version history and can also block the real writer's later archive because save_document_version uses CasExpectation::Absent for the computed next version (:317-352). The archive needs to be part of the same transaction/CAS success path, or be written only after proving the document update succeeded with a collision-safe version id.

  3. High correctness: stale reindexers can delete the winner's fresh chunks before returning stale. replace_document_chunks_if_current verifies the hash, immediately sweeps the .chunks subtree, then re-checks the document hash (crates/ironclaw_memory/src/repo/filesystem.rs:837-870). If a writer updates the document and a fresh indexer writes chunks after the first stale read but before the stale worker's sweep, the stale worker deletes those fresh chunks and then returns SkippedStaleContentHash. The second hash check prevents stale chunks from being written, but it is too late to prevent data loss. Re-check currentness before deleting, or use a per-document generation/transaction so delete+replace is conditional on the same observed document version.

  4. High correctness: DB-backed memory full-text search no longer works because the repository never declares the FTS index it queries. FilesystemMemoryDocumentRepository::search_documents sends Filter::Fts { key: content } directly to the root filesystem (crates/ironclaw_memory/src/repo/filesystem.rs:773-796), but there is no ensure_index call in that repository. The libSQL backend only translates FTS when it discovers a cataloged FTS table (crates/ironclaw_filesystem/src/libsql.rs:1068-1138) and otherwise returns FilesystemError::Unsupported (:1444-1450). The filesystem DB contract tests explicitly call ensure_index before FTS queries, while this PR deletes the native libSQL/Postgres memory repos and DB memory contract tests and removes the memory crate's SQL features (crates/ironclaw_memory/Cargo.toml:13-40; .github/workflows/test.yml:206-209). Please either ensure the content FTS index under the memory scope before search/write startup, or provide a bounded fallback, and restore SQL-backed memory repository coverage via LibSqlRootFilesystem/PostgresRootFilesystem.

  5. High correctness: internal memory sidecar paths collide with legal user document paths. MemoryDocumentPath validation allows relative paths ending in .meta and segments ending in .chunks/.versions (crates/ironclaw_memory/src/path.rs:299-328). The repository stores a document at the literal virtual path (crates/ironclaw_memory/src/path.rs:145-150), but metadata for foo is also stored at foo.meta (crates/ironclaw_memory/src/repo/filesystem.rs:127-142, :675-700). So a legal user document foo.meta is the same backend path as metadata for foo; writing metadata for foo overwrites that document with JSON bytes. list_documents then filters these names out anyway (crates/ironclaw_memory/src/repo/filesystem.rs:721-765, :895-899), contradicting the module doc that claims literal user paths sharing sidecar names are never treated as internal (:22-25). Reserve/reject those names, or move sidecars into an escaped internal namespace that no user document can address.

  6. Medium security/capability correctness: HsmBackend exposes list despite declaring and documenting no list capability. The module says the HSM surface is only Read/Write/Stat/Delete (crates/ironclaw_filesystem/src/hsm.rs:16-22), and declared_capabilities omits Capability::List (:53-60), but list_dir delegates to the inner backend (:94-95). Mount-time capability validation explicitly ignores legacy axes including list (crates/ironclaw_filesystem/src/catalog.rs:150-170), and common read/read-write mount permissions include list (crates/ironclaw_host_api/src/mount.rs:33-66). A mount that appears not to support enumeration can still enumerate names. Either return Unsupported from HsmBackend::list_dir, or advertise/validate list consistently.

Focused checks I ran on this head:

  • cargo test -p ironclaw_memory --lib --all-features
  • cargo test -p ironclaw_filesystem hsm_ --all-features -- --nocapture
  • cargo test -p ironclaw_processes filesystem_process_store_persists_under_resource_scope_engine_processes --test process_store_contract -- --exact --nocapture
  • cargo test -p ironclaw_reborn_composition invocation_mount_view --all-features -- --nocapture

These passing checks don't cover the failing scenarios above.

…m in FilesystemConversationStateStore

Mirrors the engine / processes / secrets / outbound / authorization /
run-state migrations: the legacy `RebornLibSqlConversationStateStore` /
`RebornPostgresConversationStateStore` held the substrate handle
directly (`Arc<libsql::Database>` / `deadpool_postgres::Pool`) and the
caller had to construct per-tenant substrates to keep two tenants'
conversation state apart. The new `FilesystemConversationStateStore<F>`
takes `Arc<ScopedFilesystem<F>>`; the `MountView` resolves the
`/conversations` alias to a tenant/user-scoped `VirtualPath` before any
backend dispatch, so tenant isolation is structural rather than a
convention the caller must remember.

Pass 1 of the two-pass migration described in
`docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md`. The
legacy `RebornLibSql*` / `RebornPostgres*` stores stay in this PR;
their deletion is the second pass (mirrors the `run-state` paired
pattern in commits `a96d1f63b` + `a238050a2`).

What changes:

- New `FilesystemConversationStateStore<F: RootFilesystem>` implementing
  the existing `ConversationStateRepository` trait. Persists at
  `/conversations/state.json`. Uses `CasExpectation::Version` + bounded
  retry with `Unsupported` -> `Any` fallback (same `put_with_byte_fallback`
  shape as the other migrated stores). Writes carry a `tenant_ids`
  indexed projection — defense-in-depth scoping that mirrors the other
  migrated consumer crates; the conversation state blob is multi-tenant
  in shape, so the projection lists every tenant present in the
  snapshot (under per-invocation MountView, the rewritten path slices
  to one tenant per file).
- New `RebornFilesystemConversationServices` wrapper that wires an
  `InMemoryConversationServices` over the filesystem store — same
  shape as `RebornLibSqlConversationServices` / `RebornPostgresConversationServices`.
- `state_store` module is now unconditional (was feature-gated to
  `libsql`/`postgres`). The `state_repository` plumbing on
  `InMemoryConversationServices` and `InMemoryState::persistence_revision`
  are unconditional too so the filesystem store can plug in without a
  new feature flag.
- JSON wire envelope `StoredConversationState` flattens HashMaps keyed
  by struct values (`ActorKey`, `BindingKey`, `ThreadKey`,
  `ExternalEventRouteKey`, `MessageIdempotencyKey`,
  `AcceptedMessageReplayKey`, `AcceptedMessageRef`) to `Vec<(K, V)>`
  because JSON requires HashMap keys to be strings. Legacy libSQL/
  Postgres adapters did the same shape via individual rows; this
  envelope keeps the equivalent contract within a single document.

Composition + contracts:

- `PER_USER_ALIASES` in `ironclaw_reborn_composition` gains
  `/conversations`; both `default_singleton_mount_view` and
  `invocation_mount_view` rewrite it.
- `VIRTUAL_ROOTS` in `ironclaw_host_api::path` gains `/conversations`.
- `docs/reborn/contracts/storage-placement.md` and
  `docs/reborn/contracts/filesystem.md` add matching `/conversations`
  rows so the `reborn_virtual_roots_match_storage_placement_contract`
  boundary test stays green.

Tests:

- `tests/filesystem_store_contract.rs` adds two regressions:
  - `filesystem_conversation_services_round_trip_persisted_state_on_reopen`:
    writes pair + binding under one services instance, drops, reopens a
    fresh services instance over the same backend, replays the same
    resolve and asserts the binding rehydrated from durable storage.
  - `filesystem_conversation_state_store_isolates_two_tenants_with_same_user_project_ids`:
    two `RebornFilesystemConversationServices` instances share one
    `InMemoryBackend` with different `MountView` targets but identical
    `(user_id, project_id)` request scopes; writing on tenant A must
    not be visible from tenant B, even though the alias-relative path
    is identical. Fails closed if the `ScopedFilesystem` wrapping ever
    regresses to raw `Arc<F: RootFilesystem>`.
- All 44 pre-existing `inbound_contract` tests keep passing
  unchanged.

`cargo clippy --workspace --all-features --tests -- -D warnings` and
`cargo test -p ironclaw_conversations --all-features` both clean (44
inbound + 2 filesystem-store contract tests).

Plan: docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md
Completes the second leg of the conversations migration. Now that
FilesystemConversationStateStore routes through ScopedFilesystem
(prev commit a46edd88c), the per-backend siblings
(RebornLibSqlConversationStateStore, RebornPostgresConversationStateStore,
RebornLibSqlConversationServices, RebornPostgresConversationServices)
are vestigial duplication of what RootFilesystem already abstracts.

Deleted:

- `crates/ironclaw_conversations/src/libsql.rs` (787 lines).
- `crates/ironclaw_conversations/src/postgres.rs` (785 lines).
- `pub use libsql::{...}` / `pub use postgres::{...}` re-exports in
  `lib.rs` and the matching `mod` declarations.
- The pub(crate) `ExternalConversationIdentity::conversation_fingerprint`
  helper (gated to libsql/postgres; only used by the deleted SQL store
  body). The public `ExternalConversationRef::conversation_fingerprint`
  on the ref type stays — adapter crates still depend on it for
  composite key digesting.
- `libsql`, `postgres`, `dep:libsql`, `dep:deadpool-postgres`,
  `dep:tokio-postgres` features + manifest deps from Cargo.toml.
- The transitively-only-used `sha2` crate dependency (the digest
  helper that backed `conversation_fingerprint` for the SQL composite
  key indexes is also gone).
- Three legacy-store tests in `tests/inbound_contract.rs`:
  `libsql_conversation_services_survive_restart_for_retry_replay`,
  `libsql_stale_service_write_fails_without_overwriting_newer_rows`,
  `postgres_conversation_services_round_trip_restart_replay_when_available`,
  and the `table_count` helper they shared. Durability across reopen
  is now a `RootFilesystem` property covered by
  `filesystem_conversation_services_round_trip_persisted_state_on_reopen`
  in `tests/filesystem_store_contract.rs`.

Composition is untouched because no production composition wired
conversations through these legacy stores yet — the migration plan
listed conversations under "future work" prior to this PR, and the
top-level Reborn composition has no
`with_libsql_conversation_state_store` / equivalent builder. New
composition wires `RebornFilesystemConversationServices` directly off
the shared `Arc<ScopedFilesystem<F>>`.

Per the "Legacy per-backend store cleanup — ironclaw_conversations"
entry in
`docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md`.

`cargo clippy --workspace --all-features --tests -- -D warnings` and
`cargo test -p ironclaw_conversations --all-features` both clean
(41 inbound + 2 filesystem-store contract tests).
…esystemTurnStateStore

Mirrors the engine migration in commit ac8e677 and the parallel
processes / secrets / authorization / outbound / run-state refactors:
adds `FilesystemTurnStateStore<F>` that holds
`Arc<ScopedFilesystem<F>>` and routes every read/write through a
fixed alias-relative path (`/turns/state.json`). Any composition
layer that hands the store a `ScopedFilesystem` constructed with a
tenant/user-scoped `MountView` gets cross-tenant isolation
structurally — the type system enforces what previously had to be
re-derived from `TurnScope.tenant_id` at every call site.

Internally the store reuses the existing
`InMemoryTurnStateStore::from_persistence_snapshot_with_admission_limit_provider`
load + `persistence_snapshot` write pair, so the in-memory state
machine (idempotency, admission reservations, active locks,
checkpoint sequencing, lifecycle event projection) remains the
canonical implementation. The filesystem store wraps every mutating
trait method in a CAS-aware read-modify-write loop with bounded
retry; reads project through the in-memory store without writing
back. Trait surface implemented:

- `TurnStateStore` (submit / resume / request_cancel / get_run_state)
- `TurnRunTransitionPort` (claim/heartbeat/recover/record_model_route/
  block/complete/cancel/fail/record_recovery_required/
  apply_validated_loop_exit)
- `TurnEventProjectionSource` (read_turn_events_after)
- `LoopCheckpointStore` (put/get loop checkpoints)

The on-disk layout under the `/turns` mount alias is fixed:

    /turns/state.json

Tenant + user identity moves into the caller's `MountView` per the
per-tenant `MountAlias` rewriting, so neither prefix is encoded in
the path itself. Within-tenant axes (agent/project/thread) stay in
the persisted `TurnScope` on each record because they are not covered
by the per-tenant `MountAlias`.

Implementation details mirroring the run-state store:

- CAS-Version + retry on `VersionMismatch`, `Unsupported -> Any`
  fallback so byte-only `LocalFilesystem` keeps working through the
  per-process `FILESYSTEM_RECORD_LOCKS` map.
- `submit_turn` runs the run-profile resolver once *outside* the CAS
  retry loop and threads a pre-resolved resolver into the apply
  closure — keeps the per-path async lock from spanning the resolver
  future.
- `TurnError::Unavailable { reason }` for filesystem operation
  failures (matches the existing `db_error` mapping).

Composition + contract changes:

- `crates/ironclaw_host_api/src/path.rs::VIRTUAL_ROOTS` adds
  `/turns`.
- `crates/ironclaw_reborn_composition/src/lib.rs::PER_USER_ALIASES`
  adds `/turns` so both `default_singleton_mount_view` and
  `invocation_mount_view` expose it with per-user-owner permissions.
- `docs/reborn/contracts/{filesystem,storage-placement}.md` add the
  matching `/turns` rows so the
  `reborn_virtual_roots_match_storage_placement_contract` boundary
  test stays green.

Tests:

- New regression test `tests/filesystem_turn_state_contract.rs`:
  - `filesystem_turn_state_store_persists_submit_and_reopens` —
    asserts a submitted turn rehydrates after the store is
    re-constructed from the same scoped filesystem (durability
    contract).
  - `filesystem_turn_state_store_hides_records_from_other_tenants_via_mount_view`
    wires two `FilesystemTurnStateStore`s over one
    `LocalFilesystem` with different `MountView` targets but
    identical `(tenant_id, agent_id, project_id, thread_id,
    idempotency_key)`. Writing on tenant A must not be visible from
    tenant B's store — `get_run_state` reports `ScopeNotFound`, and
    tenant B mints its own distinct `run_id` on re-submit. Fails
    closed if the ScopedFilesystem wrapping ever regresses to a raw
    `&F: RootFilesystem`.

Trait surface (`TurnStateStore`, `TurnRunTransitionPort`,
`TurnEventProjectionSource`, `LoopCheckpointStore`) is unchanged.
The legacy per-backend `LibSqlTurnStateStore` / `PostgresTurnStateStore`
stay in this PR; their deletion is the second pass tracked in
`docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md`.

Plan: docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md
…tion

Completes the fourth leg of the legacy-store cleanup. Now that
FilesystemTurnStateStore routes through ScopedFilesystem (commit
ff88e75ca), the per-backend siblings (LibSqlTurnStateStore,
PostgresTurnStateStore) are vestigial duplication of what
RootFilesystem already abstracts. Their internal load-snapshot /
replace-snapshot machinery and their SQL-table schema/migrations are
no longer the source of truth — the snapshot blob format is, and the
backend choice happens at the RootFilesystem layer.

Deleted:
- `crates/ironclaw_turns/src/db.rs` (LibSql + Postgres + shared
  schema/migration/snapshot-load/replace helpers).
- `crates/ironclaw_turns/tests/db_turn_state_store_contract.rs` (was
  the only suite exercising the SQL impls directly; the
  ScopedFilesystem-backed contract suite added in the previous
  commit covers the same surface).
- `pub use db::{LibSqlTurnStateStore, PostgresTurnStateStore}` and
  the `mod db` declaration in `lib.rs`.
- `with_libsql_turn_state_store` / `with_postgres_turn_state_store`
  builder methods on `HostRuntimeServices`. Composition now wires a
  single `with_filesystem_turn_state_store` off the shared
  `Arc<ScopedFilesystem<F>>` it already constructs.
- The `libsql` and `postgres` features on `ironclaw_turns`
  (along with `libsql`, `tokio-postgres`, `deadpool-postgres` deps).

Wiring updates:

- `crates/ironclaw_host_runtime/Cargo.toml`: drop
  `ironclaw_turns/{libsql,postgres}` feature wires.
- `crates/ironclaw_host_runtime/src/services.rs`: replace the two
  per-backend builders with a single
  `with_filesystem_turn_state_store<F>(Arc<ScopedFilesystem<F>>)`.
- `crates/ironclaw_reborn_composition/{lib,factory}.rs`: switch both
  libSQL and PostgreSQL production wirings to
  `.with_filesystem_turn_state_store(Arc::clone(&scoped_filesystem))`
  off the same shared scoped filesystem already used for run-state.
- `crates/ironclaw_reborn_composition/Cargo.toml`: drop the
  `ironclaw_turns/{libsql,postgres}` feature wires.
- `crates/ironclaw_reborn/Cargo.toml`: replace
  `ironclaw_turns/libsql` in `libsql-restart-tests` with
  `ironclaw_filesystem/libsql` (the libSQL backend lives at the
  filesystem layer now).

Test migrations:

- `crates/ironclaw_host_runtime/tests/host_runtime_services_contract.rs`:
  the three `production_turn_*` tests are rewired around a new
  `libsql_scoped_turns_fs` helper that constructs an
  `Arc<ScopedFilesystem<LibSqlRootFilesystem>>` with the canonical
  `/turns` alias. Production readiness assertions, coordinator
  submit+reopen, and missing-resolver fail-closed coverage are all
  preserved against the filesystem-backed store.
- `crates/ironclaw_reborn/tests/loop_driver_host.rs`: the
  `turn_runner_worker_completes_after_libsql_turn_and_thread_services_reopen`
  restart test is rewired around a new `libsql_filesystem_turn_store`
  helper. Both the pre-restart and post-restart sides now build a
  `FilesystemTurnStateStore<LibSqlRootFilesystem>` instead of
  `LibSqlTurnStateStore`. The thread-side `LibSqlSessionThreadService`
  is untouched (its own migration is independent). Restart durability
  is exercised against the libSQL-backed filesystem snapshot.
- `crates/ironclaw_turns/tests/loop_checkpoint_store_contract.rs`:
  drops the libsql/postgres-specific
  `turn_loop_checkpoints`-table assertions (which were validating
  SQL-schema separation between `turn_checkpoints` and
  `turn_loop_checkpoints`) and replaces them with a
  filesystem-backed roundtrip + snapshot-reopen assertion that
  exercises the same `loop_checkpoints` vs `checkpoints` projection
  invariant against `FilesystemTurnStateStore`. InMemory coverage
  is unchanged.

Per the design-doc entry "Legacy per-backend store cleanup —
ironclaw_turns" in
`docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md`
(marked Done in this commit).
…elete legacy LibSql/Postgres stores

Mirrors the run-state, processes, secrets, authorization, and outbound
migrations to the universal-FS-dispatch pattern, completing the legacy
per-backend cleanup in `ironclaw_resources`.

Pass 1 — add `FilesystemResourceGovernorStore<F: RootFilesystem>`:

- New `crates/ironclaw_resources/src/filesystem_store.rs` defines
  `FilesystemResourceGovernorStore<F>`, an `Arc<ScopedFilesystem<F>>`-
  based `ResourceGovernorStore` impl that serializes the whole governor
  snapshot (limits + reservations + usage tallies) as a single record
  at `/resources/snapshot.json` under the new consumer-store mount
  alias. Tenant/user identity moves into the caller's `MountView`, so
  the path itself is alias-relative — same shape as the other consumer
  stores migrated in this rework.
- `ResourceGovernorStore::update` is a sync trait (in-memory + legacy
  SQL impls were sync); the new filesystem impl crosses to async via
  a lazily-spawned current-thread tokio worker thread (same shape as
  the deleted libSQL/Postgres `AsyncStorageWorker`).
- Concurrency: a process-local per-path async `Mutex` map serializes
  in-process read-modify-write cycles; the inner CAS-Version
  precondition surfaces cross-process races as `ResourceError::Storage`
  errors (the trait's `FnOnce` closure shape forbids retry, mirroring
  the libSQL/Postgres `BEGIN IMMEDIATE` / `LOCK TABLE` semantics).
- Byte-only backend fallback: `LocalFilesystem` rejects
  `CasExpectation::Version`/`Absent`, so the store falls back to
  `CasExpectation::Any` under the same per-path async lock with an
  emulated `Absent` precheck — matches the other consumer-store
  migrations.
- Adds `/resources` to `VIRTUAL_ROOTS` in
  `crates/ironclaw_host_api/src/path.rs`, `PER_USER_ALIASES` in
  `crates/ironclaw_reborn_composition/src/lib.rs`, and the
  `/resources` row in
  `docs/reborn/contracts/{filesystem,storage-placement}.md` so the
  per-invocation `MountView` and boundary tests stay aligned.
- Cross-tenant isolation regression test in
  `filesystem_store.rs::tests::isolates_two_tenants_with_same_user_project_ids`
  wires two `FilesystemResourceGovernorStore`s over one
  `InMemoryBackend` with different `MountView` targets but identical
  `user_id`/`project_id`. Writing on tenant A must not be visible from
  tenant B's governor — fails closed if the wrapping ever regresses to
  raw `Arc<F: RootFilesystem>`.

Pass 2 — delete legacy stores and switch composition:

- Deleted `LibSqlResourceGovernorStore`, `PostgresResourceGovernorStore`,
  the `AsyncStorageWorker` plumbing, the SQL schema constant, and the
  worker-cell head-of-line-block regression test from `lib.rs`.
- Replaced `with_libsql_resource_governor` /
  `with_postgres_resource_governor` on `HostRuntimeServices` with a
  single `with_filesystem_resource_governor<FsBackend>(Arc<ScopedFilesystem>)`
  builder method. Backend choice is now a `RootFilesystem` property.
- `with_resource_governor` is no longer feature-gated — filesystem
  resource governor wiring is universal.
- `crates/ironclaw_reborn_composition/src/{lib,factory}.rs` now
  construct `FilesystemResourceGovernorStore::new(Arc::clone(&scoped_filesystem))`
  over the shared `Arc<ScopedFilesystem<F>>` composition already
  builds via `wrap_scoped` — same wiring shape as the other migrated
  consumer stores.
- `LibSqlProductionHostRuntimeServices` and
  `PostgresProductionHostRuntimeServices` type aliases now
  parameterize on `FilesystemResourceGovernorStore<{LibSql,Postgres}RootFilesystem>`.
- `crates/ironclaw_resources/Cargo.toml`: `tokio` becomes a
  non-optional dep with `sync` feature added (always needed for the
  per-path async lock); `ironclaw_filesystem` added; `libsql` /
  `postgres` features drop their `tokio` dep (already non-optional).
- Tests in `crates/ironclaw_resources/tests/resource_governor_contract.rs`
  and `crates/ironclaw_host_runtime/tests/host_runtime_services_contract.rs`
  replace the SQL-specific reload + builder tests with
  filesystem-backed equivalents over `InMemoryBackend +
  ScopedFilesystem`; the on-disk snapshot round-trip via a fresh
  store handle covers the same durability contract the SQL versions
  did.

Tests: `cargo test -p ironclaw_resources --all-features` passes
(4 unit + 38 contract); `cargo test -p ironclaw_host_runtime
--all-features --test host_runtime_services_contract` passes (81);
`cargo clippy --workspace --all-features --tests -- -D warnings`
clean; `cargo check --workspace --all-features` clean.

Plan: docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md
…ixture

`reborn_durable_restart_integration::approval_resume_*` and
`process_result_and_output_*` were panicking with "no backend mount
found for virtual path /processes/..." because `mounted_engine_filesystem`
only mounted `/engine` on the underlying `LocalFilesystem`, while the
`ScopedFilesystem` exposed `/processes`, `/authorization`, `/run-state`,
and `/approvals` aliases to the filesystem-backed stores.

Add a backend mount for each of those virtual roots, pointing at a
sibling subdirectory under `engine_root` so the same on-disk tree is
reopened across the restart fixture's service graphs.
…esources

The boundary suite still forbade `ironclaw_filesystem` from
`ironclaw_threads`, `ironclaw_turns`, and `ironclaw_resources`, but each
of those crates now routes durable storage through `ScopedFilesystem`
under the universal-fs-dispatch rework (matches the same exception
already in place for `ironclaw_secrets`).

Mirror the secrets-rule comment shape so the intentional edge is
documented alongside the rule rather than just removed.
Copy link
Copy Markdown
Collaborator

@serrrfirat serrrfirat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@serrrfirat
Copy link
Copy Markdown
Collaborator

Deep architecture/crate hygiene review:

  1. High: tenant isolation is documented but production wiring still builds singleton stores.

In crates/ironclaw_reborn_composition/src/lib.rs, default_singleton_mount_view() maps every consumer alias directly to top-level roots like /processes, /secrets, /run-state, /threads, etc. The production builders then call wrap_scoped(...) once and reuse that long-lived scoped filesystem for process, secret, resource, lease, run-state, and turn stores. The newer invocation_mount_view(scope) / wrap_scoped_for_invocation(...) path exists, but the production service graph does not use it.

Impact: runtime services remain shared across tenants/users despite the PR's tenant-isolation framing. Fix by making store construction request-scoped, or by carrying ResourceScope into every store operation and enforcing it in paths/index predicates.

  1. High: production run-state/approval wiring knowingly drops an atomic contract.

crates/ironclaw_host_runtime/src/services.rs documents that with_filesystem_run_state(...) no longer carries the deleted SQL store's atomic save_pending_and_block_approval transition and falls back to save_pending followed by block_approval. The production builders in ironclaw_reborn_composition still use that path.

Impact: approval persistence can split across two records, so crash/concurrency windows can leave pending approval saved but run not blocked or related partial states. The production readiness guard also does not model combined atomic run-state/approval store as a required component. Fix with a filesystem-backed RunStateApprovalStore using a transaction/CAS bundle, or keep the legacy combined store until parity exists.

  1. Medium: composition logic is duplicated and already drifting.

crates/ironclaw_reborn_composition/src/factory.rs hand-builds libSQL/Postgres production services separately from the newer substrate builders in src/lib.rs. The two paths construct similar stores, scoped filesystems, event-store wiring, and readiness behavior independently.

Impact: this is how contract fixes like scoped mounts, egress wiring, secret validation, or atomic stores get applied to one entrypoint but not the other. Collapse to one production builder and make factory.rs delegate to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor: core 20+ merged PRs DB MIGRATION PR adds or modifies PostgreSQL or libSQL migration definitions risk: medium Business logic, config, or moderate-risk modules scope: ci CI/CD workflows scope: db/postgres PostgreSQL backend scope: db Database trait / abstraction scope: dependencies Dependency updates scope: docs Documentation size: XL 500+ changed lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants