Skip to content

Commit 1b9c47c

Browse files
ilblackdragonclaude
andcommitted
docs(plan): address review feedback on engine v2 architecture plan
Apply accuracy fixes from PR #2801 review: - Compaction threshold: describe as configurable via `compaction_threshold` (defaults to 85%), matching `compact_if_needed` in the Python orchestrator rather than claiming a fixed 85%. - Token estimation: move ownership to the Python orchestrator (which runs the chars/token heuristic); Rust no longer claims to own this. - Compaction cross-reference: drop the stale "crate-structure block above includes executor/compaction.rs" note — compaction lives entirely in Python. - Reliability injection details (`ENGINE_V2_RELIABILITY_HINTS` kill switch, `EffectBridgeAdapter` write-backs, `build_step_context` reads) are labelled as proposed PR-B follow-up work rather than described as verified reality. - Denylist phrasing: make it clear that `build_software` remains the only hard-denylisted v1 tool *after* PR-C lands, not before. - Provenance rules: document accurately that `ToolOutput` provenance only injects `RequireApproval` on `Financial` effects; `WriteExternal` taint comes only from `LlmGenerated`, per policy.rs:126-169. - Engine-side cleanup: acknowledge that `Session` / `Routine` identifiers still appear in engine docs/comments; the invariant is no runtime dependency, not zero string occurrences. No code changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 3ed19d5 commit 1b9c47c

1 file changed

Lines changed: 7 additions & 7 deletions

File tree

docs/plans/2026-03-20-engine-v2-architecture.md

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -222,15 +222,15 @@ Learning is driven by trace analysis plus learning missions (`self-improvement`,
222222

223223
Compaction is orchestrator-owned, in Python. See `crates/ironclaw_engine/orchestrator/default.py:240-310`:
224224

225-
- Triggers when token count exceeds 85% of the model limit
225+
- Triggers when token count exceeds the configured `compaction_threshold` of the model limit (defaults to 85%)
226226
- Calls `__llm_complete__()` to produce a summary
227227
- Replaces working messages with `[system message, summary, continuation prompt]`
228228
- Stores a snapshot in state history for audit
229229
- Full prior trajectory stays searchable via workspace-backed retrieval; raw history is not replayed into the attention window
230230

231-
Rust side provides token estimation, retrieval helpers, and final transcript commit points; the orchestrator owns the mutable working transcript it sends to the LLM.
231+
Rust side provides retrieval helpers and final transcript commit points; the orchestrator owns the mutable working transcript it sends to the LLM and performs token estimation via a heuristic.
232232

233-
Note: the crate-structure block above mentions `executor/compaction.rs` — that file was never created. Compaction lives entirely in Python; the Rust side only exposes the primitives the Python orchestrator calls.
233+
Note: Compaction lives entirely in Python; the Rust side only exposes the primitives the Python orchestrator calls.
234234

235235
### 4.4 `rlm_query()` — full recursive sub-agent
236236
Unlike `llm_query()` (single-shot text completion), `rlm_query(prompt)` spawns a **child thread with its own CodeAct executor**:
@@ -274,7 +274,7 @@ pub struct Mission {
274274

275275
### 4.9 Tool reliability learning
276276

277-
`ReliabilityTracker` (`crates/ironclaw_engine/src/reliability.rs`) records EMA-smoothed success rate and latency per action. Tracked in issue #2800 (PR-B): writes from `EffectBridgeAdapter` after every dispatch, reads from `build_step_context` to append a "recently unreliable actions" section to the system prompt when `call_count ≥ 10` and `success_rate < 0.7` (cap 5 entries, kill switch `ENGINE_V2_RELIABILITY_HINTS`).
277+
`ReliabilityTracker` (`crates/ironclaw_engine/src/reliability.rs`) records EMA-smoothed success rate and latency per action. Proposed follow-up work tracked in issue #2800 (PR-B): wire `EffectBridgeAdapter` to record outcomes after dispatch, have `build_step_context` optionally surface a "recently unreliable actions" prompt section, and finalize any thresholds, entry caps, and feature-flag/kill-switch behavior (including a possible `ENGINE_V2_RELIABILITY_HINTS` control) once implemented.
278278

279279
### 4.10 Tests
280280
- Learning missions produce the correct knowledge artifacts from completed threads
@@ -394,7 +394,7 @@ Approval, authentication, and post-action auth chaining all use the same pause/r
394394

395395
#### Routines / Jobs — PARTIAL
396396
- `routine_create` / `routine_update` / `routine_list` / etc. are translated to mission_* dispatches via `routine_to_mission_alias()` in `src/bridge/effect_adapter.rs` before the v1-denylist check fires. The LLM-facing routine tools go through the mission manager in v2, not the v1 routine engine.
397-
- Tracked in issue #2800 (PR-C): extend the alias to cover `create_job` / `cancel_job` as well. Only `build_software` remains hard-denylisted as v1-specific infra.
397+
- Tracked in issue #2800 (PR-C): extend the alias to cover `create_job` / `cancel_job` as well, after which only `build_software` will remain hard-denylisted as v1-specific infra.
398398
- Routines still work via `/routine` slash commands (fall through to v1 when user is on v1 engine).
399399
- Remaining work is `create_job` aliasing plus UX communication; greenfield Mission APIs are done.
400400

@@ -422,7 +422,7 @@ Approval, authentication, and post-action auth chaining all use the same pause/r
422422

423423
For `WriteExternal` + `Financial` effects, the unified gate mechanism satisfies the approval invariant:
424424

425-
- `PolicyEngine::evaluate_with_provenance` injects `RequireApproval` for `WriteExternal` and `Financial` effects when triggered by `LlmGenerated` or `ToolOutput` provenance (`crates/ironclaw_engine/src/capability/policy.rs:126-169`).
425+
- `PolicyEngine::evaluate_with_provenance` injects `RequireApproval` for `Financial` effects (via `LlmGenerated` or `ToolOutput` provenance) and `WriteExternal` effects (via `LlmGenerated` provenance) (`crates/ironclaw_engine/src/capability/policy.rs:126-169`).
426426
- The Tier 0 executor halts the batch on `RequireApproval` and emits `ThreadOutcome::GatePaused` (`crates/ironclaw_engine/src/executor/structured.rs:139-171`).
427427
- Resume flows through `POST /api/chat/gate/resolve` — same path as auth gates.
428428

@@ -434,7 +434,7 @@ A separate "simulate → preview → approve → execute" flow is intentionally
434434

435435
### 7a. Engine-side cleanup — DONE
436436

437-
The `ironclaw_engine` crate contains zero references to `JobState`, `Session`, `Routine`, or v1 delegate types. The engine was built clean from day one on the five primitives (Thread, Step, Capability, MemoryDoc, Project). No migration work is needed inside the crate.
437+
The `ironclaw_engine` crate has no runtime dependency on `JobState`, `Session`, `Routine`, or v1 delegate types; any remaining mentions are limited to documentation/comments. The engine was built clean from day one on the five primitives (Thread, Step, Capability, MemoryDoc, Project). No migration work is needed inside the crate.
438438

439439
### 7b. Host-side cleanup — BLOCKED ON DEFAULT FLIP
440440

0 commit comments

Comments
 (0)