feat(reborn): apply universal FS dispatch across consumer crates by ilblackdragon · Pull Request #3679 · nearai/ironclaw

ilblackdragon · 2026-05-15T07:56:35Z

Single PR stacking ALL the consumer-crate work that applies #3659 (universal RootFilesystem dispatch fabric) across the codebase. Supersedes the prior stacked PRs (#3666, #3670, #3671, #3672, #3678 — all closed).

What's in here

13 commits, +15,214 / -929 LOC across 61 files. Every commit is independently reviewable.

#	Commit	Scope
1	`feat(processes)`	`FilesystemProcessStore` on the unified put/get surface
2	`feat(outbound)`	`FilesystemOutboundStateStore` on the unified surface
3	`feat(authorization)`	`FilesystemCapabilityLeaseStore` through unified put/get
4	`feat(run-state)`	`FilesystemRunStateStore` + `FilesystemApprovalRequestStore`
5	`refactor(workspace)`	Dissolve `ironclaw_storage` (660 LOC, parallel BlobStore/RecordStore traits removed)
6	`feat(filesystem)`	`HsmBackend` placeholder + scope `database.md` to legacy
7	`feat(reborn)`	Native `append`/`tail` on libsql + postgres (V30 migration); `FilesystemDurableEventLog`
8	`feat(secrets)`	`FilesystemSecretStore` + `FilesystemCredentialBroker`
9	`feat(filesystem,memory)`	`IndexKind::Fts` + `IndexKind::Vector` on libsql + postgres; `FilesystemMemoryDocumentRepository`
10	`feat(db) conversations+jobs`	`FilesystemConversationStore` + `FilesystemJobStore` (~1700 LOC + 23 tests)
11	`feat(db) sandbox+routine+toolfail`	`FilesystemSandboxStore` + `FilesystemRoutineStore` + `FilesystemToolFailureStore` (~2000 LOC + 20 tests)
12	`feat(engine)`	`FilesystemStore` — full 33-method engine `Store` impl on the unified surface (~700 LOC + 24 tests; HybridStore untouched)
13	`feat(db) settings+workspace+rest`	`FilesystemSettingsStore` + `FilesystemUsersStore` + `FilesystemChannelPairingStore` + `FilesystemIdentityStore` + `FilesystemWorkspaceStore` (split across 5 sub-modules; ~4400 LOC + 47 tests)

Coverage on the universal-FS surface

Bytes plane: ✅ every consumer routes through put(Entry::bytes(...))/get
Records plane: ✅ every consumer writes Entry::record(...) with indexed projection
Query plane: ✅ all consumers use Filter::Eq/PrefixOn/And/Or/Fts/VectorNearest
Index plane: ✅ ensure_index declares Exact/Prefix at startup; FTS5 + tsvector + cosine vector advertised by libsql + postgres
Event plane: ✅ libsql + postgres natively serve append/tail; FilesystemDurableEventLog + sandbox job-events use it
CAS: ✅ every record store uses CasExpectation::Version with retry on VersionMismatch
Capability declaration + mount-time validation: ✅ BackendCapabilities bitmask; descriptor over-claims surface FilesystemError::DescriptorOverclaims

What's still pending (intentional follow-ups)

Composite Database trait dissolution: each of the 10 sub-traits now has a Filesystem*Store facade in src/db/filesystem_*.rs, but the composite trait itself in src/db/mod.rs is unchanged — both legacy SQL impls and new FS-backed impls coexist. Final removal of the composite trait is intentionally deferred behind the wiring switch (out of scope here).
Host wiring switch: src/app.rs still constructs HybridStore against the legacy Database impl. Switching to CompositeRootFilesystem-backed wiring is a single-PR follow-up.
EncryptedBackend decorator: encryption stays embedded in SecretsCrypto for the migration window; documented TODO to move to a backend decorator.
Real HSM backend: HsmBackend is a placeholder demonstrating the seam.

Verification

cargo fmt --check clean
cargo check --workspace --all-features clean
cargo clippy --workspace --all-features --tests -- -D clippy::correctness clean
cargo test per migrated crate — all green; engine: 549 tests; secrets: 60; memory: 219 (2 pre-existing e2e_hybrid_search failures unrelated); event_store: 57; architecture: 14; workspace lib db::filesystem 86 tests
Postgres-tier tests gracefully skip when no DB reachable; will run in integration CI

Migration pattern (canonical, used by every commit)

pub struct FilesystemFooStore<F: RootFilesystem> {
    fs: Arc<F>,
}

impl<F: RootFilesystem> FooStore for FilesystemFooStore<F> {
    async fn upsert(&self, foo: &Foo) -> Result<(), FooError> {
        let entry = Entry::record(RecordKind::new("foo")?, &foo.to_json()?)?
            .with_indexed(IndexKey::new("status")?, IndexValue::Text(foo.status_str().into()))
            .with_indexed(IndexKey::new("user_id")?, IndexValue::Text(foo.user_id.clone()));
        self.fs.put(&foo_path(&foo.id), entry, CasExpectation::Version(foo.version)).await?;
        Ok(())
    }
}

Path layout: /<root>/<scope>/<id> for entities, /<root>/<scope>/<id>/<child>/<child_id> for sub-entities. Indexed projections + Filter::Eq + And for queries. CAS retry on VersionMismatch.

Test plan

cargo fmt --check
cargo check --workspace --all-features
cargo clippy --workspace --all-features --tests -- -D clippy::correctness
Per-crate tests for every migrated crate (filesystem, secrets, event_store, memory, processes, authorization, run_state, outbound, engine, architecture)
Workspace lib db::filesystem tests (86 pass)
Full CI run (pending)
Postgres-tier integration CI (gracefully skipped locally)

gemini-code-assist

Code Review

This pull request migrates persistence logic across several crates to a unified RootFilesystem interface, effectively replacing the ironclaw_storage crate. The changes introduce append and tail operations for durable logging and implement Full-Text Search and Vector similarity search for the in-memory, libSQL, and PostgreSQL backends. New filesystem-backed repositories are added for memory documents, outbound state, processes, and secrets. Feedback identifies critical performance and concurrency issues, including O(N) bottlenecks in event log head probing, TOCTOU race conditions in secret and session consumption, and inefficient N+1 query patterns during directory listings that require optimization through indexing or targeted queries.

gemini-code-assist · 2026-05-15T07:59:56Z

+            let head = self
+                .fs
+                .tail(&path, SeqNo::from_backend(0))
+                .await
+                .map_err(map_filesystem_tail_error)?
+                .last()
+                .map(|record| record.seq.get())
+                .unwrap_or(0);


Using tail to probe the stream head is a significant performance risk as it fetches the entire event log into memory. To avoid O(N) performance bottlenecks in append operations for durable logs, maintain a cursor cache or perform bounded reverse scans to read only the necessary tail records.

References

Avoid O(N) performance bottlenecks in append operations for durable logs by maintaining a cursor cache or performing bounded reverse scans to read only the necessary tail records.

gemini-code-assist · 2026-05-15T07:59:56Z

+        let material = SecretMaterial::from(decrypted.expose().to_string());
+
+        lease.status = SecretLeaseStatus::Consumed;
+        self.write_lease(&lease).await?;


The consume operation lacks atomicity across multiple processes, creating a time-of-check-to-time-of-use (TOCTOU) race condition. To prevent this, re-verify the condition (lease status) after acquiring the write lock or use the record version for optimistic concurrency control (CAS) to ensure only one update succeeds.

References

To prevent a time-of-check-to-time-of-use (TOCTOU) race condition when checking a condition under a read lock before performing a write, re-verify the condition after acquiring the write lock but before executing the write.

gemini-code-assist · 2026-05-15T07:59:56Z

+        let entry = Entry::bytes(body).with_content_type(ContentType::json());
+        self.filesystem
+            .put(&path, entry, CasExpectation::Any)
+            .await


consume_session_use suffers from a TOCTOU race condition in multi-process deployments. Concurrent updates will result in lost increments. This operation must use the record version for optimistic concurrency control or re-verify the state after acquiring a write lock to ensure atomicity.

References

To prevent a time-of-check-to-time-of-use (TOCTOU) race condition when checking a condition under a read lock before performing a write, re-verify the condition after acquiring the write lock but before executing the write.

gemini-code-assist · 2026-05-15T07:59:56Z

+        for entry in entries {
+            if !entry.name.ends_with(".json") {
+                continue;
+            }
+            let Some(attempt) = self
+                .get_json::<OutboundDeliveryAttempt>(&entry.path)
+                .await?
+            else {
+                continue;
+            };
+            if scope_matches(&attempt.scope, &scope) {
+                deliveries.push(attempt);
+            }
+        }


This loop performs an N+1 query pattern by calling get for every entry. To optimize list operations that involve scanning multiple directories and files, consider maintaining an index file that directly lists the paths to the relevant items, or use targeted database queries to fetch specific records instead of loading all records and filtering in the application.

References

To optimize list operations that involve scanning multiple directories and files, maintain an index file that directly lists the paths to the relevant items.

Use targeted database queries to fetch specific records instead of loading all records and filtering in the application.

gemini-code-assist · 2026-05-15T07:59:56Z

        for entry in entries {
            if entry.name.ends_with(".json") {
-                let bytes = self.filesystem.as_ref().read_file(&entry.path).await?;
-                let record = deserialize::<ProcessRecord>(&bytes)?;
+                // Reviewer (PR #3666) flagged: a `get` returning `None` after
+                // `list_dir` enumerated the path indicates a race or backend
+                // inconsistency. Returning a partial process list silently
+                // hides this; surface it as `NotFound` so callers see the
+                // same failure shape they got with the legacy `read_file`
+                // path.
+                let Some(versioned) = self.filesystem.as_ref().get(&entry.path).await? else {
+                    return Err(ProcessError::Filesystem(format!(
+                        "process record listed but missing at {}",
+                        entry.path
+                    )));
+                };
+                let record = deserialize::<ProcessRecord>(&versioned.entry.body)?;
                if same_scope_owner(&record.scope, scope) {
                    records.push(record);
                }


The list method implements an N+1 pattern. This should be optimized by maintaining an index file that directly lists the paths to the relevant items, avoiding the overhead of multiple get calls during directory traversal.

References

To optimize list operations that involve scanning multiple directories and files, maintain an index file that directly lists the paths to the relevant items.

gemini-code-assist · 2026-05-15T07:59:56Z

+        let mut leases = Vec::new();
+        for entry in entries {
+            if !entry.name.ends_with(".json") {
+                continue;
+            }
+            let Some(versioned) = self
+                .filesystem
+                .get(&entry.path)
+                .await
+                .map_err(fs_to_secret_store_error)?
+            else {
+                continue;
+            };
+            let mut stored: StoredLease = deserialize_secret(&versioned.entry.body)?;
+            if !same_scope_for_lease(&stored.scope, scope) {
+                continue;
+            }
+            stored.status = Self::effective_status(&stored, now);
+            leases.push(Self::lease_to_public(&stored));
+        }


leases_for_scope performs an N+1 query pattern. To optimize this, use targeted database queries to fetch specific records in a single operation, or maintain an index file to avoid costly directory traversal.

References

Use targeted database queries to fetch specific records instead of loading all records and filtering in the application.

To optimize list operations that involve scanning multiple directories and files, maintain an index file that directly lists the paths to the relevant items.

Three changes that close out the remaining HIGH-severity feedback from the self-review (#1 #2 #3 #4 already addressed in 990c4f7 + e2530ad): **#2 — `parse_state` silent fallback to Pending removed.** `src/db/filesystem_jobs.rs::parse_state` previously mapped unknown status strings to `JobState::Pending`, masking schema drift across a rollout (a new state value appearing in stored rows would silently lose its true value). Now returns `Result<JobState, DatabaseError>` and the single caller propagates with `?`. Matches the wire-stable enums rule in `types.md`. **#6 — `is_engine_unsupported` no longer substring-matches.** `crates/ironclaw_engine/src/store/filesystem.rs`: the typed `FilesystemError::Unsupported` discriminator gets lost when wrapped in `EngineError::Store { reason: String }`, so the old check `reason.contains("Unsupported")` would false-positive on any unrelated store error that mentioned the word. Now `fs_to_engine_error` tags the discriminator with a stable `[fs:unsupported]` sentinel and the check matches that sentinel — discriminator-preserving without changing the public `EngineError` shape. **#7 — `FilesystemChannelPairingStore` no longer drops corrupted records.** `src/db/filesystem_pairing.rs::find_pending_requests`: the old code silently filtered records whose JSON failed to deserialize, hiding data corruption. Now propagates `DatabaseError::Serialization` with the stored path so the operator sees the failure. Also: `// silent-ok:` annotations added to the three engine `Store` sites where read-modify-write on unknown ids is intentionally a no-op (matches HybridStore parity per its CLAUDE.md). Each annotation names the legacy contract being preserved. Verification: `cargo check --workspace --all-features` clean; `cargo test -p ironclaw_engine --all-features` 549/549; `cargo test --lib --all-features db::filesystem` 88/88; `cargo fmt --check` clean.

ilblackdragon · 2026-05-15T09:56:00Z

Self-review round done. Spawned a code-reviewer agent against the integrated branch; it returned 7 HIGH / 8 MEDIUM / 7 LOW findings. The HIGH-severity fixes are now in:

#	Issue	Fix commit
H1	`hybrid_search` scanned every chunk in memory instead of using the native FTS/Vector indexes the libsql + postgres backends advertise	`990c4f73e` — rewrote `FilesystemWorkspaceStore::hybrid_search` to call `Filter::Fts` / `Filter::VectorNearest` with a graceful fallback to the scan path on `FilesystemError::Unsupported`; declared FTS+Vector indexes lazily via `OnceCell`-gated `ensure_chunk_indexes`
H2	`parse_state` mapped unknown status strings silently to `JobState::Pending`, masking schema drift	`9c292f9d0` — returns `Result<JobState, DatabaseError>`; call site propagates with `?`
H3	libsql FTS5 trigger DDL spliced path literals — single quotes allowed by `VirtualPath::new` could inject SQL	`e2530adff` — DDL emission site rejects any char outside `[A-Za-z0-9_/.-]`; regression test asserts `path-LIKE`-injection paths are refused at `ensure_index` time
H4	`update_user_status`/`update_thread_state`/`revoke_lease`/etc. used `CasExpectation::Any` — lost concurrent updates	`e2530adff` — canonical CAS-Version + retry-on-`VersionMismatch` loop applied to filesystem_users, filesystem_jobs (update_job_status, mark_job_stuck), and the engine `FilesystemStore` (`update_thread_state`, `revoke_lease`, `update_mission_status`)
H5	Engine store silently returned `Ok(())` on unknown ids without annotation	`9c292f9d0` — added explicit `// silent-ok:` annotations naming HybridStore-parity at each `let Some(versioned) = ... else { return; }` site
H6	`is_engine_unsupported` substring-matched the error message; would false-positive on unrelated errors mentioning "Unsupported"	`9c292f9d0` — `fs_to_engine_error` now tags the typed `FilesystemError::Unsupported` discriminator with a stable `[fs:unsupported]` sentinel; check matches that sentinel
H7	`FilesystemChannelPairingStore::find_pending_requests` silently dropped corrupted records	`9c292f9d0` — propagates `DatabaseError::Serialization` with the offending stored path

Also (peripheral but related): vector "nearest" in libsql + postgres no longer loads every row's body to rank — first stage selects path, indexed, version, ranks by cosine, then get() materializes only the top-k bodies.

MEDIUM/LOW findings (N-scan operations missing indexes, file-size budget violations, etc.) are tracked as follow-ups; none affect correctness.

Test counts after the fix round (local, --all-features):

ironclaw_filesystem: 101 pass
ironclaw_engine: 549 pass (525 lib + 24 FilesystemStore tests)
db::filesystem (workspace lib): 88 pass
ironclaw_architecture: 14 pass
cargo clippy --workspace --all-features --tests -- -D clippy::correctness: clean
cargo fmt --check: clean

CI now running on the integrated branch (run 25911695237).

serrrfirat · 2026-05-15T15:04:28Z

Codex review found several actionable correctness/security issues in the filesystem-backed stores:

P1: Secret lease consumption needs CAS (crates/ironclaw_secrets/src/filesystem_store.rs)
consume reads an active lease, decrypts the secret, then writes the consumed state with CasExpectation::Any. The process-local lock does not protect shared filesystem backends across processes, so two consumers can both consume a one-time lease. Preserve the lease RecordVersion from the read and commit the consumed transition with CasExpectation::Version (or fail/retry on mismatch).
P1: Credential session use increments need CAS (crates/ironclaw_secrets/src/filesystem_store.rs)
consume_session_use validates stored.uses, increments it, then writes with CasExpectation::Any. Concurrent processes can both pass max_uses checks and overwrite each other. Write against versioned.version and reject/retry on VersionMismatch.
P2: Conversation message queries need pagination (src/db/filesystem_conversations.rs)
list_messages_internal fetches only Page::MAX_LIMIT rows before sorting/paginating in memory. Conversations with more than 1024 messages are silently truncated and has_more can be computed from an incomplete set. Loop through pages until a short page, or push caller pagination into the filesystem query.
P2: Filesystem job queries need pagination (src/db/filesystem_jobs.rs)
The shared run_query helper fetches only one max page. Scopes with more than 1024 jobs/actions/estimation records can miss existing rows, affecting listings, summaries, and updates. Mirror the paged loop used by the engine store.
P2: Settings path encoding is lossy (src/db/filesystem_settings.rs)
encode_segment maps unsafe characters to _, so distinct keys like a/b and a_b collide under the same path and can overwrite each other. Use a reversible/collision-free encoding such as percent-encoding or base64-url.
P2: SQL index names need collision resistance (crates/ironclaw_filesystem/src/db.rs)
sql_index_name truncates long identifiers without adding a hash. Distinct long (prefix, name) specs can produce the same DDL object name, and CREATE ... IF NOT EXISTS can silently reuse the wrong backing index/table/triggers. Include a stable hash suffix before truncation.
P2: In-memory backend should reject writes over implicit directories (crates/ironclaw_filesystem/src/in_memory.rs)
InMemoryBackend::put allows writing /memory/a while /memory/a/b exists, producing a tree shape SQL backends reject via has_child_entry. Add the same descendant check so tests cannot pass against impossible production state.

`list_messages_internal`, `list_conversations_summary`, and `run_query` each called `filesystem.query(.., Page::new(0, Page::MAX_LIMIT))` exactly once and trusted the result was complete. Because `Page::MAX_LIMIT == 1024`, conversations with >1024 messages or scopes with >1024 jobs/actions/estimations silently lost every row past the cap, and the `has_more` flag in `list_conversation_messages_paginated` became meaningless once the dropped tail crossed the page boundary. Codex PR #3679 P2 review flagged the pattern. Extract a shared `query_all_pages` helper in `filesystem_conversations` that loops `query(..., Page::new(offset, MAX_LIMIT))` until a short page comes back, then reuse it from `filesystem_jobs::run_query` and from the inline scan in `update_estimation_actuals`. The helper preserves the existing `NotFound -> Vec::new()` short-circuit and the `fs_err_to_database` error mapping so call sites are otherwise unchanged. Regression tests: - `list_messages_drains_pages_beyond_max_limit` writes `MAX_LIMIT + 5` messages and asserts the full count round-trips through `list_conversation_messages` and that `list_conversation_messages_paginated` reports `has_more` honestly for both partial and exhaustive windows. - `get_job_actions_drains_pages_beyond_max_limit` writes `MAX_LIMIT + 3` actions on one job and asserts the full count comes back in sequence order. - `list_agent_jobs_drains_pages_beyond_max_limit` writes `MAX_LIMIT + 7` jobs and asserts both `list_agent_jobs` and `agent_job_summary` count every row.

Two HIGH-severity findings on PR #3679. Both sites read a versioned entry, validated a one-shot/use-limit condition, then wrote back with `CasExpectation::Any`. The process-local mutex only serializes writers inside one process; multi-process callers sharing the same backend root could both pass the check and overwrite each other. - `FilesystemSecretStore::consume` — two consumers could both observe an Active one-shot lease, both decrypt, and both overwrite the consumed marker. - `FilesystemCredentialBroker::consume_session_use` — two consumers could both pass the max-uses check at `uses=N-1` and overwrite each other's increment, losing a use. Both now use the canonical retry-on-`FilesystemError::VersionMismatch` pattern from `ironclaw_engine::store::filesystem::update_thread_state` (post-`e2530adff`): re-read, re-evaluate the consume/use-limit condition, write with `CasExpectation::Version(versioned.version)`. A shared `CAS_RETRY_ATTEMPTS = 3` constant bounds the loop; exhausting it surfaces a transient backend error rather than papering over pathological hot-spots. Also annotated `leases_for_scope` with a `TODO(perf)` covering the N+1 list+get fan-out — bounded today by the owner-prefix path layout and short lease TTLs; replacing it with `Filter::Eq` over `query` requires the secrets store to declare its first index, which is a follow-up. Regression coverage: two new tests wrap `InMemoryBackend` with a `VersionRacingBackend` that bumps the watched path's version out-of-band on the first versioned `put`, forcing a `VersionMismatch` and exercising the retry loop. They also assert that the retried CAS write actually persisted (the next consume hits LeaseConsumed; the next three increments exhaust the max-uses budget).

Four P2 correctness fixes from the codex/gemini review. **Settings keys use percent-encoding** (`src/db/filesystem_settings.rs`): `encode_segment` previously mapped `/`, space, control chars, and others all to `_`. Keys like `a/b` and `a_b` collided onto the same path and silently overwrote each other. Now percent-encodes every byte outside the unreserved set so distinct inputs map to distinct outputs. **SQL index names get a blake3 suffix on overflow** (`crates/ironclaw_filesystem/src/db.rs`): `sql_index_name` truncated identifiers exceeding 62 chars without disambiguating, so two distinct long `(prefix, name)` specs could collapse onto the same DDL object — `CREATE ... IF NOT EXISTS` would silently reuse the wrong index/trigger. Now appends an 8-char blake3 hash suffix before truncating. Added `blake3 = "1"` to the crate's deps (small + already used by other workspace crates). **InMemoryBackend rejects writes over implicit directories** (`crates/ironclaw_filesystem/src/in_memory.rs`): the SQL backends refuse `put(/a)` when `/a/b` exists (treating `/a` as a directory). The in-memory reference impl silently accepted those writes, letting tests pass against production-impossible state. Mirror the SQL contract. **Event-store head-probe is bounded** (`crates/ironclaw_reborn_event_store/src/filesystem_store.rs`): The replay-gap detection previously called `tail(path, 0)` to read the whole log just to look at its last seq — O(N) on every cold-path call. Now probes `tail(path, after - 1)`: a non-empty result means head == after (consumer is caught up); empty means head < after (foreign-future cursor). Returns at most one record instead of the entire log. Verification: cargo check --workspace --all-features clean; cargo test -p ironclaw_filesystem -p ironclaw_secrets -p ironclaw_reborn_event_store --all-features all pass.

ilblackdragon · 2026-05-15T16:21:43Z

Review-response summary — all HIGH-severity and P2-correctness findings from gemini-code-assist + codex review are now addressed.

Finding	Severity	Status	Commit
Event-store `tail(0)` head probe O(N) on cold path (`crates/ironclaw_reborn_event_store/src/filesystem_store.rs:135`)	HIGH (gemini)	Fixed — bounded `tail(after - 1)` probe; returns ≤1 record instead of the whole log	`0ebc22c83`
`SecretStore::consume` TOCTOU on multi-process (`crates/ironclaw_secrets/src/filesystem_store.rs:412`)	HIGH (gemini) / P1 (codex)	Fixed — `CasExpectation::Version` + 3-attempt retry loop; new `filesystem_secret_store_consume_retries_on_version_mismatch` regression test using a `VersionRacingBackend`	`df8ae8d1c`
`consume_session_use` TOCTOU (`crates/ironclaw_secrets/src/filesystem_store.rs:728`)	HIGH (gemini) / P1 (codex)	Fixed — same CAS-retry pattern; regression test added	`df8ae8d1c`
`list_messages_internal` truncates at `Page::MAX_LIMIT` (`src/db/filesystem_conversations.rs`)	P2 (codex)	Fixed — extracted `query_all_pages` drain helper; regression test writes `MAX_LIMIT + 5` messages	`e02cc8c4e`
`run_query` truncates jobs/actions/estimations at `MAX_LIMIT` (`src/db/filesystem_jobs.rs`)	P2 (codex)	Fixed — same drain helper; regression tests for actions and listings	`e02cc8c4e`
`encode_segment` lossy `_` substitution (`src/db/filesystem_settings.rs`)	P2 (codex)	Fixed — percent-encoding; `encode_segment_is_collision_free` test proves `a/b` and `a_b` no longer collide	`0ebc22c83`
`sql_index_name` truncates without hash (`crates/ironclaw_filesystem/src/db.rs`)	P2 (codex)	Fixed — appends 8-char blake3 suffix before truncating	`0ebc22c83`
`InMemoryBackend::put` allows write over implicit dir (`crates/ironclaw_filesystem/src/in_memory.rs`)	P2 (codex)	Fixed — descendant check matches SQL backends' contract	`0ebc22c83`
`leases_for_scope` N+1 list (`crates/ironclaw_secrets/src/filesystem_store.rs:473`)	MEDIUM (gemini)	Partial — kept the scan; added `TODO(perf)` documenting bound (capped by per-owner path prefix + short lease TTL) and the index path that would replace it	`df8ae8d1c`
`filesystem_outbound::list` N+1 pattern (line 224)	MEDIUM (gemini)	Deferred — same shape applies; the scan is bounded by per-tenant cardinality. Will fold into a follow-up perf pass once the host wires production traffic against the unified store
`filesystem_processes::list` N+1 pattern (line 243)	MEDIUM (gemini)	Deferred — same reasoning as above

Verification after the fix round:

cargo check --workspace --all-features clean
cargo test -p ironclaw_filesystem -p ironclaw_secrets -p ironclaw_reborn_event_store --all-features all pass
cargo test --lib --all-features db::filesystem — 91 pass (was 88, added 3 pagination + collision-free tests)
cargo clippy --workspace --all-features --tests -- -D clippy::correctness clean
cargo fmt --check clean

First consumer migration onto the new RootFilesystem surface. Switches the byte-plane read_file/write_file calls inside ironclaw_processes' filesystem-backed store to the unified put/get ops with Entry::bytes + CasExpectation::Any. The on-disk JSON layout is unchanged, every existing test passes, and downstream crates that construct FilesystemProcessStore (ironclaw_host_runtime + tests) don't need to change. Scope deliberately narrow: opaque-file entries through `put`/`get` without record kinds or non-`Any` CAS, since LocalFilesystem's native `put` only accepts that shape (per the foundation PR #3659). Once LocalFilesystem grows sidecar metadata, this consumer can switch to `Entry::record(process_record_kind, ...)` + `CasExpectation::Absent` without changing the on-disk layout. Touch points: - write_record uses put(Entry::bytes, CAS::Any) - start uses get for the existence probe + transition_lock for the atomicity envelope per the single-instance invariant - update_status / get / records_for_scope read via get and unwrap VersionedEntry.body - records_for_scope returns ProcessError::Filesystem (not silent skip) when get returns None for a path that list_dir just yielded — matches the pre-migration NotFound propagation invariant Test scaffold update: BackendErrorFilesystem now overrides `get` too, so the fault-propagation regression test continues to exercise its intended path. (Reviewer P1/P2 on the original #3666 — recursion + silent-skip — addressed in foundation #3659 directly since LocalFilesystem now ships native `put`/`get`.)

Stacked on the consolidated foundation PR #3659. Adds an OutboundStateStore impl that persists outbound metadata under /engine/outbound/{policies,subscriptions,deliveries} through any RootFilesystem. The existing libSQL/Postgres/in-memory stores stay intact during the migration; a follow-up cleanup PR can delete them once production runs on the unified surface. The new store passes the full contract suite (durable_policy_*, subscription_cursor_*, delivery_status_*, notification_policy_*, full_turn_scope_isolation) against InMemoryBackend in the existing outbound_state_store_contract.rs test file.

…fied put/get Stacked on PR #3670 (outbound). Mirrors the ironclaw_processes migration in PR #3666 / now consolidated into #3659. Switches the filesystem-backed lease store's read_file/write_file calls to the unified get/put ops with Entry::bytes + CasExpectation::Any. The on-disk JSON layout is unchanged, every existing test passes, and the per-owner mutation_lock continues to serialize claim/consume/revoke within a single instance. Touch points: - read_lease, read_lease_index, read_lease_file — now use get and unwrap VersionedEntry.body. - write_lease, write_lease_index — now use put(Entry::bytes, Any). - Imports updated. - CountingFilesystem test scaffold gains put/get overrides that forward to its inner LocalFilesystem, since the trait defaults are now Unsupported after the PR #3659 recursion fix.

Stacked on PR #3671 (authorization). Mirrors processes (#3666) and authorization (#3671) migrations. Switches all read_file/write_file calls in FilesystemRunStateStore and FilesystemApprovalRequestStore to the unified get/put ops with Entry::bytes + CasExpectation::Any. On-disk JSON layout unchanged. Test scaffold updates: ConcurrentMissingReadFilesystem and DisappearingApprovalReadFilesystem gain put/get overrides that forward to their inner LocalFilesystem and apply the same fault injection logic on the unified read path (was: only on the legacy read_file path). Required after the trait defaults moved to Unsupported in PR #3659.

The ironclaw_storage crate predates the unified RootFilesystem surface introduced by PR #3659 (universal FS dispatch). Its `BlobStore`/`RecordStore` traits, `StorageKey`/`StorageVersion`/`PutCondition` types, and `StoredBlob`/`StoredRecord` shapes parallel the new unified put/get /CasExpectation/RecordVersion machinery on `RootFilesystem` — a textbook duplicate-dispatch smell flagged by .claude/rules/architecture.md. Only `ironclaw_outbound` consumed any of the crate, and only 5 small helpers (`encode_json`, `decode_json`, `redacted_backend_error`, `StorageError::Backend`, `ABSENT_SCOPE_COMPONENT`). All other types and the entire `BlobStore`/`RecordStore` surface (660 LOC) were unused — their intended consumers already moved to `RootFilesystem` directly. Inlined the 5 helpers into `crates/ironclaw_outbound/src/db.rs`: - `encode_json`/`decode_json` → direct `serde_json::to_string`/`from_str` - `redacted_backend_error` → local log+collapse to `OutboundError::Backend` (preserves the redaction boundary required by ironclaw_outbound/CLAUDE.md) - `ABSENT_SCOPE_COMPONENT` → local const "" Removed the crate's workspace membership, the outbound dep, the forbidden-edges BoundaryRule, and the crate directory. Also updated the ironclaw_outbound BoundaryRule to permit a normal dependency on `ironclaw_filesystem` — `FilesystemOutboundStateStore` landed in the prior cascade PR and the boundary rule was stale.

…egacy Two changes that close out the demoable parts of the universal-FS-dispatch rework (tasks #18 and the demonstrable portion of #19 from the plan). **HsmBackend placeholder** (`crates/ironclaw_filesystem/src/hsm.rs`). Demonstrates that a new backend is a single-file change: implements the one `RootFilesystem` trait, declares a restricted capability surface (`Read` + `Write` + `Stat` + `Delete` + `TxnCapability::Cas` — no records, no query, no index, no events, no multi-key transactions), and routes `put`/`get`/`delete`/`stat`/`list_dir` through an in-process placeholder. Five tests prove the seam works end-to-end: - `hsm_supports_encrypted_bytes_round_trip` — bytes put/get works. - `hsm_rejects_structured_records` — `put` with `RecordKind::Some` or non-empty `indexed` returns `Unsupported`, so a consumer cannot accidentally route records through encryption-only storage. - `hsm_rejects_query_and_index_ops` — `query`/`ensure_index` return `Unsupported` consistent with the declared capabilities. - `composite_rejects_overclaimed_hsm_descriptor` — mount-time validation (`validate_mount_capabilities`) refuses a descriptor that claims `Query`/`IndexExact` over a backend that doesn't deliver, failing with `FilesystemError::DescriptorOverclaims { missing, .. }`. - `composite_routes_to_hsm_under_secrets_mount` — the acceptance gate: mounting HsmBackend at `/secrets` and routing put/get through the composite works with no consumer-visible changes. Indexed projection is still rejected because the declared capabilities advertise no index/query support. A real HSM implementation replaces the in-memory placeholder with an HSM session handle; the trait surface, capability declarations, and mount-time validation are reusable as-is. The placeholder is *not* a security boundary — it is a seam demonstration. **database.md scoped to legacy directories**. The dual-backend rule file (`.claude/rules/database.md`) is `paths`-scoped to `src/db/**`, `src/history/**`, and `migrations/**` — exactly the legacy surface that predates the universal FS dispatch. Added a "Status & Direction" preamble pointing new persistence work at `ScopedFilesystem` and the `2026-05-14-universal-fs-dispatch.md` plan, with the existing per-crate dual-backend guidance kept (and tagged "legacy") for code still inside those directories.

…ection Decorate every `FilesystemCapabilityLeaseStore::write_lease_raw` and `write_lease_index` `Entry` with a `tenant_id` indexed projection so an admin-tier query can filter explicitly by tenant and a path-rewriting bug surfaces as a query-time mismatch rather than silent cross-tenant leakage. The projection is additive — `ScopedFilesystem` path-prefix scoping remains the primary isolation boundary. Adds: - `index_key_tenant_id()` / `index_name_authorization_tenant()` helpers and `ensure_tenant_id_index` declared on the per-owner lease prefix - Tenant projection on lease and lease-index entries - Byte-only-backend fallback on both writes (strip the projection and downgrade CAS to `Any` when LocalFilesystem returns `Unsupported`), matching the sibling crates' fallback shape - Regression test that issues a lease and asserts a positive `Filter::Eq` query (matching tenant) returns rows and a negative query (different tenant) returns zero rows Refs docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md.

Clarify in the tenant-isolation plan's "What this gives us" section that the `tenant_id` defense-in-depth projection covers processes/secrets/ outbound/authorization but deliberately excludes the engine `Store`. The engine trait is single-tenant-by-construction per Open Question 2 — it never sees `tenant_id` internally and has no scope source to drive the projection at the write site, so it relies on path-prefix scoping alone.

…fs-rework-unified # Conflicts: # crates/ironclaw_reborn/src/planned_driver.rs

…ement contract The architecture boundary test `reborn_virtual_roots_match_storage_placement_contract` (`crates/ironclaw_architecture/tests/reborn_dependency_boundaries.rs`) asserts that `ironclaw_host_api::VIRTUAL_ROOTS` matches the canonical roots in `docs/reborn/contracts/storage-placement.md` and `docs/reborn/contracts/filesystem.md`. The previous commit (`b30c31eae`) added five consumer-store roots to `VIRTUAL_ROOTS` without updating the contract docs and collapsed three `/system/*` subroots into a single `/system` entry, which broke the test in CI. - Restore `/system/settings`, `/system/extensions`, `/system/skills` in `VIRTUAL_ROOTS` (they are still load-bearing — used by `ironclaw_dispatcher` for extension discovery and mount roots). - Keep the five new consumer-store roots (`/processes`, `/authorization`, `/outbound`, `/tenant-shared`, `/tenants`) and add matching rows to both contract docs explaining what each one carries. - Split the unified `/system` mount alias in `default_singleton_mount_view` / `invocation_mount_view` into one alias per canonical subroot, since each subroot is the smallest VirtualPath unit reserved by the contract. Tests updated to match. After this commit, `cargo test -p ironclaw_architecture --test reborn_dependency_boundaries` passes and the composition mount-view tests still pass.

These per-backend OutboundStateStore impls predate the universal-FS dispatch design. Now that FilesystemOutboundStateStore routes through RootFilesystem, the backend choice happens at the RootFilesystem layer (LibSqlRootFilesystem vs PostgresRootFilesystem vs InMemoryBackend) — no need for outbound-specific persistence impls. - Delete src/libsql_store.rs (LibSqlOutboundStateStore) - Delete src/postgres_store.rs (PostgresOutboundStateStore) - Delete src/db.rs (helpers used only by the above) - Delete the four contract tests that exercised them (libsql_persists*, libsql_rejects*, postgres_persists*, postgres_rejects*); durability across reopen is now a property of the RootFilesystem backend, not of outbound-specific persistence - Inline-gate the OutboundPushKind/OutboundDeliveryStatus as_str helpers as #[allow(dead_code)] — they were only consumed by the deleted SQL bodies No production callers; no composition rewiring needed (composition already uses FilesystemOutboundStateStore).

… follow-ups After deleting LibSql/PostgresOutboundStateStore (`d4bf7b3c2`), the remaining cleanup splits cleanly into three independent follow-up PRs: - secrets: needs the master-key decryptability check reproduced on FilesystemSecretStore before composition can switch off LibSqlSecretsStore / PostgresSecretsStore. - authorization: FilesystemCapabilityLeaseStore needs to move from borrowed `&'a ScopedFilesystem<F>` to `Arc<ScopedFilesystem<F>>` so it fits `Arc<dyn CapabilityLeaseStore>` for composition wiring. - run_state: not yet migrated to ScopedFilesystem; the same shape as the other migrated consumers is the prerequisite before LibSqlRunStateStore / PostgresRunStateStore can be deleted. Each is independent (200-400 lines + composition rewiring + tests) and shouldn't pile onto PR #3679. The "trait collapse" further step (rename Filesystem*Store → *Store, delete the trait) widens HostRuntimeServices type signatures and is a deeper structural change deserving its own PR after the SQL impls are gone.

Adds `FilesystemSecretStore::verify_can_decrypt_existing_secrets`, porting the fail-loud-on-master-key-mismatch contract from `LibSqlSecretsStore` / `PostgresSecretsStore` so composition can drop the SQL backends. The sentinel lives at `/secrets/_master_key_check.json` under the caller's MountView; constant-time compare against the fixed plaintext mirrors the libSQL / Postgres implementations the SQL backends carry today. Driven by the "Legacy per-backend store cleanup — ironclaw_secrets" entry in docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md.

These per-backend CapabilityLeaseStore impls predate the universal-FS dispatch design. Now that FilesystemCapabilityLeaseStore routes through RootFilesystem (via ScopedFilesystem), the backend choice happens at the RootFilesystem layer (LibSqlRootFilesystem vs PostgresRootFilesystem vs InMemoryBackend) — no need for an authorization-specific persistence impl per backend. Cites the cleanup entry in docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md ("Legacy per-backend store cleanup" → ironclaw_authorization row). - Migrate FilesystemCapabilityLeaseStore from `<'a, F>` + `&'a ScopedFilesystem<F>` to `<F>` + `Arc<ScopedFilesystem<F>>` so composition can hold it as `Arc<dyn CapabilityLeaseStore>` without erasing a lifetime parameter. Same shape as FilesystemSecretStore, FilesystemOutboundStateStore, FilesystemProcessStore. - Preserve the AUTH-CRITICAL H4 CAS-Version retry + Unsupported→Any fallback in update_lease_cas / write_lease_raw — only the borrow→Arc shape changed; serialize/deserialize, mutation_lock, tenant_id projection, and revoke/claim/consume semantics are untouched. - Delete crates/ironclaw_authorization/src/db.rs (LibSqlCapabilityLeaseStore + PostgresCapabilityLeaseStore impls + helpers used only by them). - Delete tests/db_capability_lease_store_contract.rs (the legacy contract test exercised the SQL stores directly; durability across reopen is now a property of the RootFilesystem backend, not of an authorization-specific persistence impl, and the filesystem store reload paths are exercised by capability_lease_contract.rs). - Drop the `libsql` and `postgres` cargo features on ironclaw_authorization (and their `dep:libsql` / `dep:deadpool-postgres` / `dep:tokio-postgres` / `tracing` dependencies — only the deleted SQL code used them). - Composition (lib.rs + factory.rs): replace Arc::new(LibSql/PostgresCapabilityLeaseStore::new(db|pool)) + `.run_migrations().await?` with Arc::new(FilesystemCapabilityLeaseStore::new(Arc::clone(&scoped_filesystem))). Migrations are now the RootFilesystem's responsibility, already run on `filesystem.run_migrations()` earlier in the same function. - Update ironclaw_reborn_composition's `libsql`/`postgres` features to drop their `ironclaw_authorization/libsql` and `/postgres` flags. - Update the host_runtime durable-restart integration test from the leaked `&'static ScopedFilesystem<_>` shape to a plain `Arc<ScopedFilesystem<LocalFilesystem>>`; the Box::leak is gone. Verified: - cargo clippy -p ironclaw_authorization -p ironclaw_capabilities -p ironclaw_reborn_composition --all-features --tests -- -D warnings clean - cargo test -p ironclaw_authorization --all-features: 27 passed - cargo check --workspace --all-features clean

…sition The universal-FS dispatch design says backend choice happens at the RootFilesystem layer, not at the consumer-store layer. FilesystemSecretStore + FilesystemCredentialBroker already route through ScopedFilesystem over any RootFilesystem (LibSqlRootFilesystem / PostgresRootFilesystem / InMemoryBackend / LocalFilesystem), so the LibSqlSecretsStore / PostgresSecretsStore / LibSqlCredentialStore / PostgresCredentialStore siblings are vestigial duplication. Composition (`crates/ironclaw_reborn_composition/src/{lib,factory}.rs`) now constructs FilesystemSecretStore over the existing scoped_filesystem (wrap_scoped). The master-key decryptability check that used to live on LibSqlSecretsStore was ported to FilesystemSecretStore::verify_can_decrypt_existing_secrets in commit 36d230e; composition calls it on every production startup so a master-key mismatch still fails loud. Deleted: - `crates/ironclaw_secrets/src/db.rs` (LibSql + Postgres impls + helpers) - `crates/ironclaw_secrets/tests/db_secret_store_contract.rs` - `crates/ironclaw_secrets/tests/db_credential_store_contract.rs` - `crates/ironclaw_secrets/tests/security_findings_poc.rs` (only exercised the SQL stores) - `pub use db::{LibSqlSecretsStore, …}` re-exports in `lib.rs` - `libsql` / `postgres` features on `ironclaw_secrets` (no consumer left) - `deadpool-postgres` / `libsql` / `tokio-postgres` deps from `ironclaw_secrets/Cargo.toml` - Feature-union references in `ironclaw_reborn` / `ironclaw_reborn_composition` Public `SecretStore` and `CredentialBroker` traits are unchanged; the new `FilesystemSecretStore` already implements them. The `ScopedSecretsStoreAdapter` shim that previously wrapped the legacy SQL stores is also unused now and can be removed in a follow-up (left in place this commit to keep the diff focused on the legacy deletion). Per the design-doc entry "Legacy per-backend store cleanup — ironclaw_secrets" in docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md.

… Filesystem{RunState,ApprovalRequest}Store Mirrors the engine migration in commit ac8e677 and the parallel processes / secrets / authorization / outbound refactors: `FilesystemRunStateStore` and `FilesystemApprovalRequestStore` previously held a `&'a F: RootFilesystem` borrow and manually encoded tenant/user identity in every path via `tenant_user_root` (formatted `/engine/tenants/<tenant>/users/<user>/agents/.../projects/.../runs|approvals`). Any composition layer that forgot to wrap the backend in a tenant scope would leak across tenants with the type system saying nothing — the HIGH-severity class of finding tracked in PR #3679 for `ironclaw_engine`. Migrated the run-state / approval stores so tenant isolation is structural rather than something this crate has to re-derive from `ResourceScope.tenant_id` / `user_id`: - `FilesystemRunStateStore::new` and `FilesystemApprovalRequestStore::new` now take `Arc<ScopedFilesystem<F>>`. The `'a` lifetime parameter on the struct is gone — composition can hold these as `Arc<dyn RunStateStore>` / `Arc<dyn ApprovalRequestStore>` without pinning a borrow. - Path helpers (`run_record_path`, `run_records_root`, `approval_record_path`, `approval_records_root`) return `ScopedPath` instead of `VirtualPath`. The on-disk layout under the `/run-state` and `/approvals` mount aliases is: /run-state[/agents/<agent>][/projects/<project>][/missions/<mission>][/threads/<thread>]/runs/<invocation_id>.json /approvals[/agents/<agent>][/projects/<project>][/missions/<mission>][/threads/<thread>]/<request_id>.json The leading `/engine/tenants/<tenant>/users/<user>` prefix is gone; the caller's `MountView` resolves the alias to a tenant/user-scoped target at every op. Sub-scope axes (agent/project/mission/thread) stay in the alias-relative path because they are within-tenant scoping not covered by the per-tenant `MountAlias`. - `records_for_scope` on both stores reconstructs each child `ScopedPath` from the `list_dir`-returned `VirtualPath` leaf via a new `join_scoped` helper so the per-op ACL still runs on the follow-up `get` (mirrors the engine / processes / secrets / outbound store's `join_scoped` shape). - `put_with_cas` now takes `&ScopedFilesystem<F>` / `&ScopedPath`. The `Unsupported`→`Any` fallback semantics are unchanged, so byte-only backends (`LocalFilesystem`) keep working through the per-path `FILESYSTEM_RECORD_LOCKS` map. - `filesystem_record_lock` now keys on `ScopedPath` instead of `VirtualPath` so the lock is alias-relative (one process-local lock per logical record, not per resolved target). - `tenant_user_root` is removed; its agent/project/mission/thread segments move into the new alias-relative `scope_owner_alias_string` helper. Two sibling mount aliases (`/run-state` + `/approvals`) on one `Arc<ScopedFilesystem<F>>`: composition wires both aliases on the same shared `MountView`, so this crate can drive run-state and approvals through one filesystem handle while keeping the on-disk subtrees distinct. Tests: - `tests/run_state_contract.rs` now constructs a `ScopedFilesystem` over `LocalFilesystem` with `MountPermissions::read_write_list_delete()` grants on both `/run-state` and `/approvals` aliases pointing at tenant1/user1 targets. The existing cross-tenant assertions still pass because the post-read `same_scope_owner` check filters out records whose in-body scope differs from the request scope. - `tests/approval_resolution_contract.rs` follows the same shape with a local `scoped_run_state_fs` helper. - New regression test `filesystem_run_state_store_isolates_two_tenants_with_same_user_project_ids` wires two `FilesystemRunStateStore`s + `FilesystemApprovalRequestStore`s over one `LocalFilesystem` with different `MountView` targets but identical `user_id` / `project_id` / `invocation_id` request scopes. Writing on tenant A must not be visible from tenant B's stores — `get`, `records_for_scope`, `complete`, and `approve` all fail closed. Fails closed if the ScopedFilesystem wrapping ever regresses to raw `&F: RootFilesystem`. Composition + contract changes (so the per-invocation `MountView` and boundary tests stay aligned): - `crates/ironclaw_host_api/src/path.rs::VIRTUAL_ROOTS` adds `/run-state` and `/approvals` (with the other migrated consumer-store roots). - `crates/ironclaw_reborn_composition/src/lib.rs::PER_USER_ALIASES` adds `/run-state` and `/approvals` so both `default_singleton_mount_view` and `invocation_mount_view` expose them with full per-user-owner permissions. - `docs/reborn/contracts/storage-placement.md` and `docs/reborn/contracts/filesystem.md` add matching `/run-state` and `/approvals` rows so the `reborn_virtual_roots_match_storage_placement_contract` boundary test stays green. - `crates/ironclaw_host_runtime/tests/reborn_durable_restart_integration.rs` is rewired to construct an `Arc<ScopedFilesystem<LocalFilesystem>>` (with the new aliases) and pass it to both filesystem run-state stores. The capability-lease store keeps the legacy `&'static ScopedFilesystem<F>` shape until its own migration lands; `leaked_scoped_engine_filesystem` is preserved as the borrow source for now. Trait surfaces (`RunStateStore`, `ApprovalRequestStore`, `RunStateApprovalStore`) are unchanged. The legacy per-backend LibSql / Postgres run-state + approval stores stay in this PR; their deletion is the second pass tracked in `docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md`. Plan: docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md

…position Completes the third leg of the legacy-store cleanup. Now that FilesystemRunStateStore + FilesystemApprovalRequestStore route through ScopedFilesystem (commit a96d1f6), the per-backend siblings (LibSqlRunStateStore, PostgresRunStateStore, LibSqlApprovalRequestStore, PostgresApprovalRequestStore, LibSqlRunStateApprovalStore, PostgresRunStateApprovalStore) are vestigial duplication of what RootFilesystem already abstracts. Deleted: - `crates/ironclaw_run_state/src/db.rs` (LibSql + Postgres + shared helpers; ~one large module). - `crates/ironclaw_run_state/tests/db_run_state_store_contract.rs` (was the only suite exercising the SQL impls directly; the ScopedFilesystem-backed contract suite over InMemoryBackend + RootFilesystem covers the same surface). - `pub use db::{LibSql*, Postgres*}` re-exports in `lib.rs`. - `with_libsql_run_state_approval_store` / `with_postgres_run_state_approval_store` builder methods on `HostRuntimeServices` (composition now wires FilesystemRunStateStore + FilesystemApprovalRequestStore directly off the shared `Arc<ScopedFilesystem<F>>` it already constructs). Composition (`crates/ironclaw_reborn_composition/src/{lib,factory}.rs`) and the durable-restart integration test (`crates/ironclaw_host_runtime/tests/reborn_durable_restart_integration.rs`) now use the Arc-based construction shape for all three filesystem-backed stores in the trio (run-state, approval-requests, capability-leases) — same `Arc<ScopedFilesystem<LocalFilesystem>>` shared between them. Per the design-doc entry "Legacy per-backend store cleanup — ironclaw_run_state" in `docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md`.

The test was deleted on origin/reborn-integration in commit 34886c3 (`refactor(crates): group workspace crates by domain`) but my recent merge (7efef49) kept the file: independently-modified-on-one-side + deleted-on-the-other resolves to "keep" by default. The kept file references `RebornLibSqlMemoryDocumentRepository` which no longer exists (renamed to `LibSqlMemoryDocumentRepository`), breaking compile. Delete the file to match upstream's intent — the contract is covered elsewhere by `ironclaw_memory::tests::db_memory_repository_contract` and the audit-projection tests inside `ironclaw_event_projections` that don't depend on the renamed type.

The reborn-native libSQL/Postgres memory document repositories (`RebornLibSqlMemoryDocumentRepository`, `RebornPostgresMemoryDocumentRepository`) are vestigial duplication of what `RootFilesystem` already abstracts — per the same universal-FS dispatch design that drove the secrets/auth/run_state cleanup. `FilesystemMemoryDocumentRepository` over `LibSqlRootFilesystem` / `PostgresRootFilesystem` covers both surfaces with one impl. Implements `MemoryDocumentIndexRepository` (chunk replace + lifecycle hooks) on `FilesystemMemoryDocumentRepository` so the existing `ChunkingMemoryDocumentIndexer` can drive memory indexing through the unified repo — previously only the reborn-native libSQL impl participated. Deleted: - `crates/ironclaw_memory/src/repo/native_libsql.rs` - `crates/ironclaw_memory/src/repo/native_postgres.rs` - All five `reborn_native_*` contract tests in `crates/ironclaw_memory/tests/` — covered by the `FilesystemMemoryDocumentRepository` contract suite over `InMemoryBackend` plus the `RootFilesystem` backend contract tests in `ironclaw_filesystem` itself. `ChunkingMemoryDocumentIndexer` (`indexer.rs`) is unchanged; it delegates to anything that implements `MemoryDocumentIndexRepository` and now has a wider universe of impls to choose from. The dangling `RebornLibSqlMemoryDocumentRepository` reference in `crates/ironclaw_event_projections/tests/memory_significant_events_projection_contract.rs` is updated to use `FilesystemMemoryDocumentRepository` over `InMemoryBackend` — same contract, no DB dependency. CLAUDE.md updated to drop the "legacy `LibSqlMemoryDocumentRepository` / `PostgresMemoryDocumentRepository` still own their own per-backend behavioral coverage" sentence, since those tests are now deleted. Per the design-doc entry "Legacy per-backend store cleanup" in `docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md`.

The universal-FS dispatch design says backend choice happens at the RootFilesystem layer, not the consumer-store layer. The reborn-native duplicates were already deleted in 27485c5; this commit removes the remaining *legacy* per-backend memory document repositories. They duplicated what FilesystemMemoryDocumentRepository<F> already covers over RootFilesystem (LibSqlRootFilesystem / PostgresRootFilesystem / InMemoryBackend / LocalFilesystem). No production callers — composition has been on FilesystemMemoryDocumentRepository for a while, and no other crate imported LibSqlMemoryDocumentRepository / PostgresMemoryDocumentRepository. Deleted: - `crates/ironclaw_memory/src/repo/libsql.rs` - `crates/ironclaw_memory/src/repo/postgres.rs` - `pub use repo::{LibSql,Postgres}MemoryDocumentRepository` re-exports - `mod libsql` / `mod postgres` declarations in `repo/mod.rs` - Feature-gated helpers `scoped_memory_agent_id`, `db_path_for_memory_document`, `memory_document_from_db_path` in `repo/mod.rs` — only consumed by the deleted impls - `#[cfg(feature = "libsql")]` `encode_embedding_blob` / `decode_embedding_blob` / `cosine_similarity` in `src/embedding.rs` and `escape_fts5_query` in `src/search.rs` — same, only consumed by the deleted libsql repo (the filesystem repo has its own local `encode_embedding_blob`) - `crates/ironclaw_memory/tests/db_memory_repository_contract.rs` - `crates/ironclaw_memory/tests/e2e_hybrid_search.rs` - `crates/ironclaw_memory/tests/e2e_persistence_versioning.rs` - `crates/ironclaw_memory/tests/e2e_scope_isolation_safety.rs` - `crates/ironclaw_memory/tests/memory_filesystem_vertical_integration.rs` - `libsql` / `postgres` features on `ironclaw_memory` (no consumer in the workspace enabled them) and the corresponding optional deadpool-postgres / libsql / pgvector / tokio-postgres deps Memory substrate semantics (versioning, chunk replace, metadata cascade, hybrid search fusion, protected-path safety) retain coverage through `FilesystemMemoryDocumentRepository` over `InMemoryBackend` in `memory_backend_contract.rs` + `memory_filesystem_contract.rs` + the inline `repo/filesystem.rs` `#[cfg(test)]` suite. Backend-specific durability across reopen is now a `RootFilesystem` property and lives in `ironclaw_filesystem`'s own backend contract tests. CLAUDE.md drops the dangling "legacy `LibSqlMemoryDocumentRepository` / `PostgresMemoryDocumentRepository` still own their own per-backend behavioral coverage until their migration is scheduled" tail — the first half of that sentence was already removed in 27485c5; this commit removes the second half, since the tests no longer exist. The `tests/e2e_trace_memory_isolation.rs` ignore-message reference to the deleted `e2e_scope_isolation_safety.rs` test is repointed at the InMemoryBackend contract tests that own the substrate-level SOUL.md rejection coverage now. The CI `Run ironclaw_memory tests` step drops `--features libsql` (no such feature exists any more). Per the design-doc entry "Legacy per-backend store cleanup — ironclaw_memory" in `docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md`.

…oute through filesystem The universal-FS dispatch design says backend choice happens at the RootFilesystem layer, not at the durable-log impl layer. FilesystemDurableEventLog / FilesystemDurableAuditLog already route through ScopedFilesystem over any RootFilesystem (LibSqlRootFilesystem / PostgresRootFilesystem / InMemoryBackend), so the LibSql*/Postgres* DurableEventLog / DurableAuditLog siblings that spoke SQL directly are vestigial duplication. The `RebornEventStoreConfig::Libsql { path_or_url, auth_token }` and `::Postgres { url }` variants stay (production composition still matches on the same shape) — but the dispatch now opens a backend- specific RootFilesystem from the same input and wraps it in the unified filesystem-backed durable log. Production policy gates (production InMemory rejection, libSQL cleartext / ambiguous-bare URL rejection, Postgres sslmode=disable rejection, TLS connector construction) move into the event-store crate's `libsql_backed` / `postgres_backed` modules. Deleted: - `crates/ironclaw_reborn_event_store/src/libsql_store.rs` (LibSqlDurableEventLog, LibSqlDurableAuditLog, LibSqlStore, build_libsql_event_stores) - `crates/ironclaw_reborn_event_store/src/postgres_store.rs` (PostgresDurableEventLog, PostgresDurableAuditLog, PostgresStore, build_postgres_event_stores) - `crates/ironclaw_reborn_event_store/src/sql_common.rs` (helpers used only by the deleted SQL impls) - `crates/ironclaw_reborn_event_store/migrations/{libsql,postgres}/*` (replaced by RootFilesystem-layer migrations) - SQL-table corruption fixture tests in `durable_event_store_contract` that punched holes in `reborn_event_entries` — the table no longer exists for events; the contiguity contract is enforced by the RootFilesystem's `append`/`tail` invariants and covered by the `FilesystemDurableEventLog` contract suite (backend-agnostic) plus the surviving public-surface rebuild tests - `dep:url` and `dep:uuid` (only used by the deleted Postgres SQL body); kept `tokio-postgres-rustls` / `rustls` / etc. since the Postgres path still parses the URL and builds the TLS connector before handing the pool to `PostgresRootFilesystem` No composition changes: `with_reborn_event_store_config` still takes `RebornEventStoreConfig` and matches on the same variants. Per the design-doc entry "Legacy per-backend store cleanup — ironclaw_reborn_event_store" in docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md.

…ilesystemSessionThreadService Mirrors the run-state / processes / secrets / authorization migrations (commits a96d1f6, 81664dd, 4ae5676, 5e7688d) plus the analogous engine migration in ac8e677. The new `FilesystemSessionThreadService<F: RootFilesystem>` accepts an `Arc<ScopedFilesystem<F>>` and routes every read/write through the `/threads` mount alias. The composition layer's per-invocation MountView resolves `/threads` to a tenant/user-scoped VirtualPath subtree, so tenant isolation is structural rather than something this crate has to re-derive from `ThreadScope.tenant_id`. Path layout under the alias keeps sub-scope (`agent_id`, `project_id`, `owner_user_id`, `mission_id`) plus the thread id in the alias-relative path so a single tenant/user can own multiple cells: /threads[/agents/<agent>][/projects/<project>][/owners/<owner_user>][/missions/<mission>]/threads/<thread_id>/thread.json /threads.../threads/<thread_id>/messages/<message_id>.json /threads.../threads/<thread_id>/summaries/<summary_id>.json /threads/idempotency/<sha256>.json The thread record holds the per-thread monotonic `next_sequence` counter; `accept_inbound_message` and `append_assistant_draft` do an optimistic CAS on the thread record to reserve a sequence, then write the new per-message record with `CasExpectation::Absent`. Every multi-step transition retries on `FilesystemError::VersionMismatch` and falls back to `CasExpectation::Any` on `Unsupported` so byte-only `LocalFilesystem` backends keep working (matches the existing run-state / authorization store fallback shape). Idempotency records flatten under one `/threads/idempotency/` directory keyed by SHA-256 of the (scope, source_binding_id, external_event_id) tuple — two scopes hash to different keys, so the flat layout is safe. `replay_accepted_inbound_message`, which has no scope input, scans that directory and matches binding/event id against the persisted record body. Wiring: - `/threads` added to `PER_USER_ALIASES` in `ironclaw_reborn_composition` so both `default_singleton_mount_view` and `invocation_mount_view` expose the alias with full r/w/l/d permissions. - `/threads` added to `VIRTUAL_ROOTS` in `ironclaw_host_api::path` so the alias resolves to a recognized top-level subtree. - `docs/reborn/contracts/{filesystem,storage-placement}.md` updated to list `/threads` alongside the other consumer-store mount aliases. Cross-tenant isolation regression test (`filesystem_session_thread_service_isolates_two_tenants_with_same_user_project_ids`) asserts that two `FilesystemSessionThreadService`s sharing one `InMemoryBackend` cannot see each other's threads, messages, or idempotency records when their `MountView`s resolve `/threads` to different tenant subtrees — identical `(agent_id, project_id, owner_user_id, thread_id)` tuples on both sides. Per the design-doc entry "ironclaw_threads" in `docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md`.

…sition Completes the fourth leg of the legacy per-backend store cleanup (after `ironclaw_outbound`, `ironclaw_secrets`, `ironclaw_authorization`, and `ironclaw_run_state`). Now that `FilesystemSessionThreadService` routes through `ScopedFilesystem` (previous commit), the per-backend siblings (`LibSqlSessionThreadService`, `PostgresSessionThreadService`) are vestigial duplication of what `RootFilesystem` already abstracts. Deleted: - `crates/ironclaw_threads/src/db.rs` (LibSql + Postgres impls plus the shared `DurableState` snapshot/replace helpers — one ~1.8k line module). - `crates/ironclaw_threads/tests/db_session_thread_contract.rs` (was the only suite exercising the SQL impls directly; the `FilesystemSessionThreadService` contract suite over `InMemoryBackend` + the in-memory `session_thread_contract.rs` semantic suite cover the same surface, and durability across reopen is now a `RootFilesystem` property tested at the `ironclaw_filesystem` layer). - `pub use db::{LibSql*,Postgres*}` re-exports in `lib.rs`. - `libsql` + `postgres` Cargo features on `ironclaw_threads` (no remaining feature-gated code paths). `deadpool-postgres`, `tokio-postgres`, `libsql`, `tracing`, and `tempfile` (dev) dependencies removed alongside. Composition / call-site wiring: - `crates/ironclaw_reborn/tests/loop_driver_host.rs`: the `turn_runner_worker_completes_after_libsql_turn_and_thread_services_reopen` restart regression now builds a `LibSqlRootFilesystem` + a `ScopedFilesystem` wrapping `/threads → /threads` and constructs the thread service via `FilesystemSessionThreadService::new`. The `libsql-restart-tests` feature now pulls `ironclaw_filesystem/libsql` instead of `ironclaw_threads/libsql` (which no longer exists). `ironclaw_turns/libsql` is still wired because `LibSqlTurnStateStore` has not migrated yet. Per the design-doc entry "Legacy per-backend store cleanup — ironclaw_threads" in `docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md`.

serrrfirat · 2026-05-17T09:12:32Z

Correctness/security review on current head 440be242cfb1d3953de4541ee91bcd23622f8530: I don't think this is ready to merge yet.

High security: production wiring collapses tenant/user namespaces for the new filesystem-backed stores. wrap_scoped still builds the singleton mount view that maps /secrets, /authorization, /run-state, /approvals, etc. to top-level paths with no tenant/user prefix (crates/ironclaw_reborn_composition/src/lib.rs:126-156, :207-215), and the production builders/factory wire that scoped filesystem into the long-lived secret, lease, and run-state stores (crates/ironclaw_reborn_composition/src/factory.rs:246-280, :313-345). The replacement stores explicitly omit tenant/user from their own paths because they expect the mount view to supply it: e.g. secret_owner_alias only appends agent/project (crates/ironclaw_secrets/src/filesystem_store.rs:1139-1152), lease paths omit tenant/user (crates/ironclaw_authorization/src/lib.rs:1483-1508), and run/approval paths do the same (crates/ironclaw_run_state/src/lib.rs:1003-1036). wrap_scoped_for_invocation exists, but it is not what the production builders use. In a multi-tenant process, same-shape records from different tenants/users now share the same backing path; at minimum this allows cross-tenant overwrite/DoS of secrets and state, and read_lease_versioned also returns the deserialized lease without a stored-scope check (crates/ironclaw_authorization/src/lib.rs:461-477). Either construct these stores from the per-invocation mount view everywhere they handle scoped data, or keep tenant/user in the store keys and add a factory-level two-tenant regression test through the actual production wiring.
High correctness: version history is written before the document CAS, so losing writers leave phantom versions. write_document_with_options archives the previous body via save_document_version before the CasExpectation::Version/Absent document write (crates/ironclaw_memory/src/repo/filesystem.rs:520-556). The append path does the same before its CAS write and then maps VersionMismatch to MemoryAppendOutcome::Conflict (:591-627). If another writer wins between the initial read and the final put, the loser returns conflict but has already inserted a .versions/<n> record for an overwrite/append that never happened. That corrupts version history and can also block the real writer's later archive because save_document_version uses CasExpectation::Absent for the computed next version (:317-352). The archive needs to be part of the same transaction/CAS success path, or be written only after proving the document update succeeded with a collision-safe version id.
High correctness: stale reindexers can delete the winner's fresh chunks before returning stale. replace_document_chunks_if_current verifies the hash, immediately sweeps the .chunks subtree, then re-checks the document hash (crates/ironclaw_memory/src/repo/filesystem.rs:837-870). If a writer updates the document and a fresh indexer writes chunks after the first stale read but before the stale worker's sweep, the stale worker deletes those fresh chunks and then returns SkippedStaleContentHash. The second hash check prevents stale chunks from being written, but it is too late to prevent data loss. Re-check currentness before deleting, or use a per-document generation/transaction so delete+replace is conditional on the same observed document version.
High correctness: DB-backed memory full-text search no longer works because the repository never declares the FTS index it queries. FilesystemMemoryDocumentRepository::search_documents sends Filter::Fts { key: content } directly to the root filesystem (crates/ironclaw_memory/src/repo/filesystem.rs:773-796), but there is no ensure_index call in that repository. The libSQL backend only translates FTS when it discovers a cataloged FTS table (crates/ironclaw_filesystem/src/libsql.rs:1068-1138) and otherwise returns FilesystemError::Unsupported (:1444-1450). The filesystem DB contract tests explicitly call ensure_index before FTS queries, while this PR deletes the native libSQL/Postgres memory repos and DB memory contract tests and removes the memory crate's SQL features (crates/ironclaw_memory/Cargo.toml:13-40; .github/workflows/test.yml:206-209). Please either ensure the content FTS index under the memory scope before search/write startup, or provide a bounded fallback, and restore SQL-backed memory repository coverage via LibSqlRootFilesystem/PostgresRootFilesystem.
High correctness: internal memory sidecar paths collide with legal user document paths. MemoryDocumentPath validation allows relative paths ending in .meta and segments ending in .chunks/.versions (crates/ironclaw_memory/src/path.rs:299-328). The repository stores a document at the literal virtual path (crates/ironclaw_memory/src/path.rs:145-150), but metadata for foo is also stored at foo.meta (crates/ironclaw_memory/src/repo/filesystem.rs:127-142, :675-700). So a legal user document foo.meta is the same backend path as metadata for foo; writing metadata for foo overwrites that document with JSON bytes. list_documents then filters these names out anyway (crates/ironclaw_memory/src/repo/filesystem.rs:721-765, :895-899), contradicting the module doc that claims literal user paths sharing sidecar names are never treated as internal (:22-25). Reserve/reject those names, or move sidecars into an escaped internal namespace that no user document can address.
Medium security/capability correctness: HsmBackend exposes list despite declaring and documenting no list capability. The module says the HSM surface is only Read/Write/Stat/Delete (crates/ironclaw_filesystem/src/hsm.rs:16-22), and declared_capabilities omits Capability::List (:53-60), but list_dir delegates to the inner backend (:94-95). Mount-time capability validation explicitly ignores legacy axes including list (crates/ironclaw_filesystem/src/catalog.rs:150-170), and common read/read-write mount permissions include list (crates/ironclaw_host_api/src/mount.rs:33-66). A mount that appears not to support enumeration can still enumerate names. Either return Unsupported from HsmBackend::list_dir, or advertise/validate list consistently.

Focused checks I ran on this head:

cargo test -p ironclaw_memory --lib --all-features ✅
cargo test -p ironclaw_filesystem hsm_ --all-features -- --nocapture ✅
cargo test -p ironclaw_processes filesystem_process_store_persists_under_resource_scope_engine_processes --test process_store_contract -- --exact --nocapture ✅
cargo test -p ironclaw_reborn_composition invocation_mount_view --all-features -- --nocapture ✅

These passing checks don't cover the failing scenarios above.

…m in FilesystemConversationStateStore Mirrors the engine / processes / secrets / outbound / authorization / run-state migrations: the legacy `RebornLibSqlConversationStateStore` / `RebornPostgresConversationStateStore` held the substrate handle directly (`Arc<libsql::Database>` / `deadpool_postgres::Pool`) and the caller had to construct per-tenant substrates to keep two tenants' conversation state apart. The new `FilesystemConversationStateStore<F>` takes `Arc<ScopedFilesystem<F>>`; the `MountView` resolves the `/conversations` alias to a tenant/user-scoped `VirtualPath` before any backend dispatch, so tenant isolation is structural rather than a convention the caller must remember. Pass 1 of the two-pass migration described in `docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md`. The legacy `RebornLibSql*` / `RebornPostgres*` stores stay in this PR; their deletion is the second pass (mirrors the `run-state` paired pattern in commits `a96d1f63b` + `a238050a2`). What changes: - New `FilesystemConversationStateStore<F: RootFilesystem>` implementing the existing `ConversationStateRepository` trait. Persists at `/conversations/state.json`. Uses `CasExpectation::Version` + bounded retry with `Unsupported` -> `Any` fallback (same `put_with_byte_fallback` shape as the other migrated stores). Writes carry a `tenant_ids` indexed projection — defense-in-depth scoping that mirrors the other migrated consumer crates; the conversation state blob is multi-tenant in shape, so the projection lists every tenant present in the snapshot (under per-invocation MountView, the rewritten path slices to one tenant per file). - New `RebornFilesystemConversationServices` wrapper that wires an `InMemoryConversationServices` over the filesystem store — same shape as `RebornLibSqlConversationServices` / `RebornPostgresConversationServices`. - `state_store` module is now unconditional (was feature-gated to `libsql`/`postgres`). The `state_repository` plumbing on `InMemoryConversationServices` and `InMemoryState::persistence_revision` are unconditional too so the filesystem store can plug in without a new feature flag. - JSON wire envelope `StoredConversationState` flattens HashMaps keyed by struct values (`ActorKey`, `BindingKey`, `ThreadKey`, `ExternalEventRouteKey`, `MessageIdempotencyKey`, `AcceptedMessageReplayKey`, `AcceptedMessageRef`) to `Vec<(K, V)>` because JSON requires HashMap keys to be strings. Legacy libSQL/ Postgres adapters did the same shape via individual rows; this envelope keeps the equivalent contract within a single document. Composition + contracts: - `PER_USER_ALIASES` in `ironclaw_reborn_composition` gains `/conversations`; both `default_singleton_mount_view` and `invocation_mount_view` rewrite it. - `VIRTUAL_ROOTS` in `ironclaw_host_api::path` gains `/conversations`. - `docs/reborn/contracts/storage-placement.md` and `docs/reborn/contracts/filesystem.md` add matching `/conversations` rows so the `reborn_virtual_roots_match_storage_placement_contract` boundary test stays green. Tests: - `tests/filesystem_store_contract.rs` adds two regressions: - `filesystem_conversation_services_round_trip_persisted_state_on_reopen`: writes pair + binding under one services instance, drops, reopens a fresh services instance over the same backend, replays the same resolve and asserts the binding rehydrated from durable storage. - `filesystem_conversation_state_store_isolates_two_tenants_with_same_user_project_ids`: two `RebornFilesystemConversationServices` instances share one `InMemoryBackend` with different `MountView` targets but identical `(user_id, project_id)` request scopes; writing on tenant A must not be visible from tenant B, even though the alias-relative path is identical. Fails closed if the `ScopedFilesystem` wrapping ever regresses to raw `Arc<F: RootFilesystem>`. - All 44 pre-existing `inbound_contract` tests keep passing unchanged. `cargo clippy --workspace --all-features --tests -- -D warnings` and `cargo test -p ironclaw_conversations --all-features` both clean (44 inbound + 2 filesystem-store contract tests). Plan: docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md

Completes the second leg of the conversations migration. Now that FilesystemConversationStateStore routes through ScopedFilesystem (prev commit a46edd88c), the per-backend siblings (RebornLibSqlConversationStateStore, RebornPostgresConversationStateStore, RebornLibSqlConversationServices, RebornPostgresConversationServices) are vestigial duplication of what RootFilesystem already abstracts. Deleted: - `crates/ironclaw_conversations/src/libsql.rs` (787 lines). - `crates/ironclaw_conversations/src/postgres.rs` (785 lines). - `pub use libsql::{...}` / `pub use postgres::{...}` re-exports in `lib.rs` and the matching `mod` declarations. - The pub(crate) `ExternalConversationIdentity::conversation_fingerprint` helper (gated to libsql/postgres; only used by the deleted SQL store body). The public `ExternalConversationRef::conversation_fingerprint` on the ref type stays — adapter crates still depend on it for composite key digesting. - `libsql`, `postgres`, `dep:libsql`, `dep:deadpool-postgres`, `dep:tokio-postgres` features + manifest deps from Cargo.toml. - The transitively-only-used `sha2` crate dependency (the digest helper that backed `conversation_fingerprint` for the SQL composite key indexes is also gone). - Three legacy-store tests in `tests/inbound_contract.rs`: `libsql_conversation_services_survive_restart_for_retry_replay`, `libsql_stale_service_write_fails_without_overwriting_newer_rows`, `postgres_conversation_services_round_trip_restart_replay_when_available`, and the `table_count` helper they shared. Durability across reopen is now a `RootFilesystem` property covered by `filesystem_conversation_services_round_trip_persisted_state_on_reopen` in `tests/filesystem_store_contract.rs`. Composition is untouched because no production composition wired conversations through these legacy stores yet — the migration plan listed conversations under "future work" prior to this PR, and the top-level Reborn composition has no `with_libsql_conversation_state_store` / equivalent builder. New composition wires `RebornFilesystemConversationServices` directly off the shared `Arc<ScopedFilesystem<F>>`. Per the "Legacy per-backend store cleanup — ironclaw_conversations" entry in `docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md`. `cargo clippy --workspace --all-features --tests -- -D warnings` and `cargo test -p ironclaw_conversations --all-features` both clean (41 inbound + 2 filesystem-store contract tests).

…esystemTurnStateStore Mirrors the engine migration in commit ac8e677 and the parallel processes / secrets / authorization / outbound / run-state refactors: adds `FilesystemTurnStateStore<F>` that holds `Arc<ScopedFilesystem<F>>` and routes every read/write through a fixed alias-relative path (`/turns/state.json`). Any composition layer that hands the store a `ScopedFilesystem` constructed with a tenant/user-scoped `MountView` gets cross-tenant isolation structurally — the type system enforces what previously had to be re-derived from `TurnScope.tenant_id` at every call site. Internally the store reuses the existing `InMemoryTurnStateStore::from_persistence_snapshot_with_admission_limit_provider` load + `persistence_snapshot` write pair, so the in-memory state machine (idempotency, admission reservations, active locks, checkpoint sequencing, lifecycle event projection) remains the canonical implementation. The filesystem store wraps every mutating trait method in a CAS-aware read-modify-write loop with bounded retry; reads project through the in-memory store without writing back. Trait surface implemented: - `TurnStateStore` (submit / resume / request_cancel / get_run_state) - `TurnRunTransitionPort` (claim/heartbeat/recover/record_model_route/ block/complete/cancel/fail/record_recovery_required/ apply_validated_loop_exit) - `TurnEventProjectionSource` (read_turn_events_after) - `LoopCheckpointStore` (put/get loop checkpoints) The on-disk layout under the `/turns` mount alias is fixed: /turns/state.json Tenant + user identity moves into the caller's `MountView` per the per-tenant `MountAlias` rewriting, so neither prefix is encoded in the path itself. Within-tenant axes (agent/project/thread) stay in the persisted `TurnScope` on each record because they are not covered by the per-tenant `MountAlias`. Implementation details mirroring the run-state store: - CAS-Version + retry on `VersionMismatch`, `Unsupported -> Any` fallback so byte-only `LocalFilesystem` keeps working through the per-process `FILESYSTEM_RECORD_LOCKS` map. - `submit_turn` runs the run-profile resolver once *outside* the CAS retry loop and threads a pre-resolved resolver into the apply closure — keeps the per-path async lock from spanning the resolver future. - `TurnError::Unavailable { reason }` for filesystem operation failures (matches the existing `db_error` mapping). Composition + contract changes: - `crates/ironclaw_host_api/src/path.rs::VIRTUAL_ROOTS` adds `/turns`. - `crates/ironclaw_reborn_composition/src/lib.rs::PER_USER_ALIASES` adds `/turns` so both `default_singleton_mount_view` and `invocation_mount_view` expose it with per-user-owner permissions. - `docs/reborn/contracts/{filesystem,storage-placement}.md` add the matching `/turns` rows so the `reborn_virtual_roots_match_storage_placement_contract` boundary test stays green. Tests: - New regression test `tests/filesystem_turn_state_contract.rs`: - `filesystem_turn_state_store_persists_submit_and_reopens` — asserts a submitted turn rehydrates after the store is re-constructed from the same scoped filesystem (durability contract). - `filesystem_turn_state_store_hides_records_from_other_tenants_via_mount_view` wires two `FilesystemTurnStateStore`s over one `LocalFilesystem` with different `MountView` targets but identical `(tenant_id, agent_id, project_id, thread_id, idempotency_key)`. Writing on tenant A must not be visible from tenant B's store — `get_run_state` reports `ScopeNotFound`, and tenant B mints its own distinct `run_id` on re-submit. Fails closed if the ScopedFilesystem wrapping ever regresses to a raw `&F: RootFilesystem`. Trait surface (`TurnStateStore`, `TurnRunTransitionPort`, `TurnEventProjectionSource`, `LoopCheckpointStore`) is unchanged. The legacy per-backend `LibSqlTurnStateStore` / `PostgresTurnStateStore` stay in this PR; their deletion is the second pass tracked in `docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md`. Plan: docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md

…tion Completes the fourth leg of the legacy-store cleanup. Now that FilesystemTurnStateStore routes through ScopedFilesystem (commit ff88e75ca), the per-backend siblings (LibSqlTurnStateStore, PostgresTurnStateStore) are vestigial duplication of what RootFilesystem already abstracts. Their internal load-snapshot / replace-snapshot machinery and their SQL-table schema/migrations are no longer the source of truth — the snapshot blob format is, and the backend choice happens at the RootFilesystem layer. Deleted: - `crates/ironclaw_turns/src/db.rs` (LibSql + Postgres + shared schema/migration/snapshot-load/replace helpers). - `crates/ironclaw_turns/tests/db_turn_state_store_contract.rs` (was the only suite exercising the SQL impls directly; the ScopedFilesystem-backed contract suite added in the previous commit covers the same surface). - `pub use db::{LibSqlTurnStateStore, PostgresTurnStateStore}` and the `mod db` declaration in `lib.rs`. - `with_libsql_turn_state_store` / `with_postgres_turn_state_store` builder methods on `HostRuntimeServices`. Composition now wires a single `with_filesystem_turn_state_store` off the shared `Arc<ScopedFilesystem<F>>` it already constructs. - The `libsql` and `postgres` features on `ironclaw_turns` (along with `libsql`, `tokio-postgres`, `deadpool-postgres` deps). Wiring updates: - `crates/ironclaw_host_runtime/Cargo.toml`: drop `ironclaw_turns/{libsql,postgres}` feature wires. - `crates/ironclaw_host_runtime/src/services.rs`: replace the two per-backend builders with a single `with_filesystem_turn_state_store<F>(Arc<ScopedFilesystem<F>>)`. - `crates/ironclaw_reborn_composition/{lib,factory}.rs`: switch both libSQL and PostgreSQL production wirings to `.with_filesystem_turn_state_store(Arc::clone(&scoped_filesystem))` off the same shared scoped filesystem already used for run-state. - `crates/ironclaw_reborn_composition/Cargo.toml`: drop the `ironclaw_turns/{libsql,postgres}` feature wires. - `crates/ironclaw_reborn/Cargo.toml`: replace `ironclaw_turns/libsql` in `libsql-restart-tests` with `ironclaw_filesystem/libsql` (the libSQL backend lives at the filesystem layer now). Test migrations: - `crates/ironclaw_host_runtime/tests/host_runtime_services_contract.rs`: the three `production_turn_*` tests are rewired around a new `libsql_scoped_turns_fs` helper that constructs an `Arc<ScopedFilesystem<LibSqlRootFilesystem>>` with the canonical `/turns` alias. Production readiness assertions, coordinator submit+reopen, and missing-resolver fail-closed coverage are all preserved against the filesystem-backed store. - `crates/ironclaw_reborn/tests/loop_driver_host.rs`: the `turn_runner_worker_completes_after_libsql_turn_and_thread_services_reopen` restart test is rewired around a new `libsql_filesystem_turn_store` helper. Both the pre-restart and post-restart sides now build a `FilesystemTurnStateStore<LibSqlRootFilesystem>` instead of `LibSqlTurnStateStore`. The thread-side `LibSqlSessionThreadService` is untouched (its own migration is independent). Restart durability is exercised against the libSQL-backed filesystem snapshot. - `crates/ironclaw_turns/tests/loop_checkpoint_store_contract.rs`: drops the libsql/postgres-specific `turn_loop_checkpoints`-table assertions (which were validating SQL-schema separation between `turn_checkpoints` and `turn_loop_checkpoints`) and replaces them with a filesystem-backed roundtrip + snapshot-reopen assertion that exercises the same `loop_checkpoints` vs `checkpoints` projection invariant against `FilesystemTurnStateStore`. InMemory coverage is unchanged. Per the design-doc entry "Legacy per-backend store cleanup — ironclaw_turns" in `docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md` (marked Done in this commit).

…elete legacy LibSql/Postgres stores Mirrors the run-state, processes, secrets, authorization, and outbound migrations to the universal-FS-dispatch pattern, completing the legacy per-backend cleanup in `ironclaw_resources`. Pass 1 — add `FilesystemResourceGovernorStore<F: RootFilesystem>`: - New `crates/ironclaw_resources/src/filesystem_store.rs` defines `FilesystemResourceGovernorStore<F>`, an `Arc<ScopedFilesystem<F>>`- based `ResourceGovernorStore` impl that serializes the whole governor snapshot (limits + reservations + usage tallies) as a single record at `/resources/snapshot.json` under the new consumer-store mount alias. Tenant/user identity moves into the caller's `MountView`, so the path itself is alias-relative — same shape as the other consumer stores migrated in this rework. - `ResourceGovernorStore::update` is a sync trait (in-memory + legacy SQL impls were sync); the new filesystem impl crosses to async via a lazily-spawned current-thread tokio worker thread (same shape as the deleted libSQL/Postgres `AsyncStorageWorker`). - Concurrency: a process-local per-path async `Mutex` map serializes in-process read-modify-write cycles; the inner CAS-Version precondition surfaces cross-process races as `ResourceError::Storage` errors (the trait's `FnOnce` closure shape forbids retry, mirroring the libSQL/Postgres `BEGIN IMMEDIATE` / `LOCK TABLE` semantics). - Byte-only backend fallback: `LocalFilesystem` rejects `CasExpectation::Version`/`Absent`, so the store falls back to `CasExpectation::Any` under the same per-path async lock with an emulated `Absent` precheck — matches the other consumer-store migrations. - Adds `/resources` to `VIRTUAL_ROOTS` in `crates/ironclaw_host_api/src/path.rs`, `PER_USER_ALIASES` in `crates/ironclaw_reborn_composition/src/lib.rs`, and the `/resources` row in `docs/reborn/contracts/{filesystem,storage-placement}.md` so the per-invocation `MountView` and boundary tests stay aligned. - Cross-tenant isolation regression test in `filesystem_store.rs::tests::isolates_two_tenants_with_same_user_project_ids` wires two `FilesystemResourceGovernorStore`s over one `InMemoryBackend` with different `MountView` targets but identical `user_id`/`project_id`. Writing on tenant A must not be visible from tenant B's governor — fails closed if the wrapping ever regresses to raw `Arc<F: RootFilesystem>`. Pass 2 — delete legacy stores and switch composition: - Deleted `LibSqlResourceGovernorStore`, `PostgresResourceGovernorStore`, the `AsyncStorageWorker` plumbing, the SQL schema constant, and the worker-cell head-of-line-block regression test from `lib.rs`. - Replaced `with_libsql_resource_governor` / `with_postgres_resource_governor` on `HostRuntimeServices` with a single `with_filesystem_resource_governor<FsBackend>(Arc<ScopedFilesystem>)` builder method. Backend choice is now a `RootFilesystem` property. - `with_resource_governor` is no longer feature-gated — filesystem resource governor wiring is universal. - `crates/ironclaw_reborn_composition/src/{lib,factory}.rs` now construct `FilesystemResourceGovernorStore::new(Arc::clone(&scoped_filesystem))` over the shared `Arc<ScopedFilesystem<F>>` composition already builds via `wrap_scoped` — same wiring shape as the other migrated consumer stores. - `LibSqlProductionHostRuntimeServices` and `PostgresProductionHostRuntimeServices` type aliases now parameterize on `FilesystemResourceGovernorStore<{LibSql,Postgres}RootFilesystem>`. - `crates/ironclaw_resources/Cargo.toml`: `tokio` becomes a non-optional dep with `sync` feature added (always needed for the per-path async lock); `ironclaw_filesystem` added; `libsql` / `postgres` features drop their `tokio` dep (already non-optional). - Tests in `crates/ironclaw_resources/tests/resource_governor_contract.rs` and `crates/ironclaw_host_runtime/tests/host_runtime_services_contract.rs` replace the SQL-specific reload + builder tests with filesystem-backed equivalents over `InMemoryBackend + ScopedFilesystem`; the on-disk snapshot round-trip via a fresh store handle covers the same durability contract the SQL versions did. Tests: `cargo test -p ironclaw_resources --all-features` passes (4 unit + 38 contract); `cargo test -p ironclaw_host_runtime --all-features --test host_runtime_services_contract` passes (81); `cargo clippy --workspace --all-features --tests -- -D warnings` clean; `cargo check --workspace --all-features` clean. Plan: docs/plans/2026-05-16-scoped-filesystem-tenant-isolation.md

…ixture `reborn_durable_restart_integration::approval_resume_*` and `process_result_and_output_*` were panicking with "no backend mount found for virtual path /processes/..." because `mounted_engine_filesystem` only mounted `/engine` on the underlying `LocalFilesystem`, while the `ScopedFilesystem` exposed `/processes`, `/authorization`, `/run-state`, and `/approvals` aliases to the filesystem-backed stores. Add a backend mount for each of those virtual roots, pointing at a sibling subdirectory under `engine_root` so the same on-disk tree is reopened across the restart fixture's service graphs.

…esources The boundary suite still forbade `ironclaw_filesystem` from `ironclaw_threads`, `ironclaw_turns`, and `ironclaw_resources`, but each of those crates now routes durable storage through `ScopedFilesystem` under the universal-fs-dispatch rework (matches the same exception already in place for `ironclaw_secrets`). Mirror the secrets-rule comment shape so the intentional edge is documented alongside the rule rather than just removed.

serrrfirat

LGTM

serrrfirat · 2026-05-18T09:27:46Z

Deep architecture/crate hygiene review:

High: tenant isolation is documented but production wiring still builds singleton stores.

In crates/ironclaw_reborn_composition/src/lib.rs, default_singleton_mount_view() maps every consumer alias directly to top-level roots like /processes, /secrets, /run-state, /threads, etc. The production builders then call wrap_scoped(...) once and reuse that long-lived scoped filesystem for process, secret, resource, lease, run-state, and turn stores. The newer invocation_mount_view(scope) / wrap_scoped_for_invocation(...) path exists, but the production service graph does not use it.

Impact: runtime services remain shared across tenants/users despite the PR's tenant-isolation framing. Fix by making store construction request-scoped, or by carrying ResourceScope into every store operation and enforcing it in paths/index predicates.

High: production run-state/approval wiring knowingly drops an atomic contract.

crates/ironclaw_host_runtime/src/services.rs documents that with_filesystem_run_state(...) no longer carries the deleted SQL store's atomic save_pending_and_block_approval transition and falls back to save_pending followed by block_approval. The production builders in ironclaw_reborn_composition still use that path.

Impact: approval persistence can split across two records, so crash/concurrency windows can leave pending approval saved but run not blocked or related partial states. The production readiness guard also does not model combined atomic run-state/approval store as a required component. Fix with a filesystem-backed RunStateApprovalStore using a transaction/CAS bundle, or keep the legacy combined store until parity exists.

Medium: composition logic is duplicated and already drifting.

crates/ironclaw_reborn_composition/src/factory.rs hand-builds libSQL/Postgres production services separately from the newer substrate builders in src/lib.rs. The two paths construct similar stores, scoped filesystems, event-store wiring, and readiness behavior independently.

Impact: this is how contract fixes like scoped mounts, egress wiring, secret validation, or atomic stores get applied to one entrypoint but not the other. Collapse to one production builder and make factory.rs delegate to it.

gemini-code-assist Bot reviewed May 15, 2026

View reviewed changes

ilblackdragon mentioned this pull request May 15, 2026

feat(filesystem): unified storage dispatch fabric — foundation + SQL backends + indexer #3659

Merged

5 tasks

ilblackdragon added 6 commits May 15, 2026 09:47

ilblackdragon added 14 commits May 16, 2026 21:16

Merge remote-tracking branch 'origin/reborn-integration' into reborn/…

7efef49

…fs-rework-unified # Conflicts: # crates/ironclaw_reborn/src/planned_driver.rs

chore: untrack pre-session local docs file

03b5139

ilblackdragon requested a review from serrrfirat May 17, 2026 07:56

github-actions Bot added the scope: ci CI/CD workflows label May 17, 2026

ilblackdragon added 3 commits May 17, 2026 01:58

ilblackdragon added 7 commits May 17, 2026 02:21

serrrfirat approved these changes May 18, 2026

View reviewed changes

Conversation

ilblackdragon commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What's in here

Coverage on the universal-FS surface

What's still pending (intentional follow-ups)

Verification

Migration pattern (canonical, used by every commit)

Test plan

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

ilblackdragon commented May 15, 2026

Uh oh!

serrrfirat commented May 15, 2026

Uh oh!

ilblackdragon commented May 15, 2026

Uh oh!

serrrfirat commented May 17, 2026

Uh oh!

serrrfirat left a comment

Choose a reason for hiding this comment

Uh oh!

serrrfirat commented May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ilblackdragon commented May 15, 2026 •

edited

Loading