You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
docs(plan): address review feedback on engine v2 architecture plan
Apply accuracy fixes from PR #2801 review:
- Compaction threshold: describe as configurable via `compaction_threshold`
(defaults to 85%), matching `compact_if_needed` in the Python
orchestrator rather than claiming a fixed 85%.
- Token estimation: move ownership to the Python orchestrator (which
runs the chars/token heuristic); Rust no longer claims to own this.
- Compaction cross-reference: drop the stale "crate-structure block
above includes executor/compaction.rs" note — compaction lives
entirely in Python.
- Reliability injection details (`ENGINE_V2_RELIABILITY_HINTS` kill
switch, `EffectBridgeAdapter` write-backs, `build_step_context`
reads) are labelled as proposed PR-B follow-up work rather than
described as verified reality.
- Denylist phrasing: make it clear that `build_software` remains the
only hard-denylisted v1 tool *after* PR-C lands, not before.
- Provenance rules: document accurately that `ToolOutput` provenance
only injects `RequireApproval` on `Financial` effects; `WriteExternal`
taint comes only from `LlmGenerated`, per policy.rs:126-169.
- Engine-side cleanup: acknowledge that `Session` / `Routine`
identifiers still appear in engine docs/comments; the invariant is
no runtime dependency, not zero string occurrences.
No code changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy file name to clipboardExpand all lines: docs/plans/2026-03-20-engine-v2-architecture.md
+7-7Lines changed: 7 additions & 7 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -222,15 +222,15 @@ Learning is driven by trace analysis plus learning missions (`self-improvement`,
222
222
223
223
Compaction is orchestrator-owned, in Python. See `crates/ironclaw_engine/orchestrator/default.py:240-310`:
224
224
225
-
- Triggers when token count exceeds 85% of the model limit
225
+
- Triggers when token count exceeds the configured `compaction_threshold`of the model limit (defaults to 85%)
226
226
- Calls `__llm_complete__()` to produce a summary
227
227
- Replaces working messages with `[system message, summary, continuation prompt]`
228
228
- Stores a snapshot in state history for audit
229
229
- Full prior trajectory stays searchable via workspace-backed retrieval; raw history is not replayed into the attention window
230
230
231
-
Rust side provides token estimation, retrieval helpers, and final transcript commit points; the orchestrator owns the mutable working transcript it sends to the LLM.
231
+
Rust side provides retrieval helpers and final transcript commit points; the orchestrator owns the mutable working transcript it sends to the LLM and performs token estimation via a heuristic.
232
232
233
-
Note: the crate-structure block above mentions `executor/compaction.rs` — that file was never created. Compaction lives entirely in Python; the Rust side only exposes the primitives the Python orchestrator calls.
233
+
Note: Compaction lives entirely in Python; the Rust side only exposes the primitives the Python orchestrator calls.
234
234
235
235
### 4.4 `rlm_query()` — full recursive sub-agent
236
236
Unlike `llm_query()` (single-shot text completion), `rlm_query(prompt)` spawns a **child thread with its own CodeAct executor**:
@@ -274,7 +274,7 @@ pub struct Mission {
274
274
275
275
### 4.9 Tool reliability learning
276
276
277
-
`ReliabilityTracker` (`crates/ironclaw_engine/src/reliability.rs`) records EMA-smoothed success rate and latency per action. Tracked in issue #2800 (PR-B): writes from `EffectBridgeAdapter`after every dispatch, reads from `build_step_context`to append a "recently unreliable actions" section to the system prompt when `call_count ≥ 10`and `success_rate < 0.7` (cap 5 entries, kill switch `ENGINE_V2_RELIABILITY_HINTS`).
277
+
`ReliabilityTracker` (`crates/ironclaw_engine/src/reliability.rs`) records EMA-smoothed success rate and latency per action. Proposed follow-up work tracked in issue #2800 (PR-B): wire `EffectBridgeAdapter`to record outcomes after dispatch, have `build_step_context`optionally surface a "recently unreliable actions" prompt section, and finalize any thresholds, entry caps, and feature-flag/kill-switch behavior (including a possible `ENGINE_V2_RELIABILITY_HINTS` control) once implemented.
278
278
279
279
### 4.10 Tests
280
280
- Learning missions produce the correct knowledge artifacts from completed threads
@@ -394,7 +394,7 @@ Approval, authentication, and post-action auth chaining all use the same pause/r
394
394
395
395
#### Routines / Jobs — PARTIAL
396
396
-`routine_create` / `routine_update` / `routine_list` / etc. are translated to mission_* dispatches via `routine_to_mission_alias()` in `src/bridge/effect_adapter.rs` before the v1-denylist check fires. The LLM-facing routine tools go through the mission manager in v2, not the v1 routine engine.
397
-
- Tracked in issue #2800 (PR-C): extend the alias to cover `create_job` / `cancel_job` as well. Only `build_software`remains hard-denylisted as v1-specific infra.
397
+
- Tracked in issue #2800 (PR-C): extend the alias to cover `create_job` / `cancel_job` as well, after which only `build_software`will remain hard-denylisted as v1-specific infra.
398
398
- Routines still work via `/routine` slash commands (fall through to v1 when user is on v1 engine).
399
399
- Remaining work is `create_job` aliasing plus UX communication; greenfield Mission APIs are done.
400
400
@@ -422,7 +422,7 @@ Approval, authentication, and post-action auth chaining all use the same pause/r
422
422
423
423
For `WriteExternal` + `Financial` effects, the unified gate mechanism satisfies the approval invariant:
424
424
425
-
-`PolicyEngine::evaluate_with_provenance` injects `RequireApproval` for `WriteExternal` and `Financial` effects when triggered by `LlmGenerated` or `ToolOutput` provenance (`crates/ironclaw_engine/src/capability/policy.rs:126-169`).
425
+
-`PolicyEngine::evaluate_with_provenance` injects `RequireApproval` for `Financial` effects (via `LlmGenerated` or `ToolOutput` provenance) and `WriteExternal` effects (via `LlmGenerated` provenance) (`crates/ironclaw_engine/src/capability/policy.rs:126-169`).
426
426
- The Tier 0 executor halts the batch on `RequireApproval` and emits `ThreadOutcome::GatePaused` (`crates/ironclaw_engine/src/executor/structured.rs:139-171`).
427
427
- Resume flows through `POST /api/chat/gate/resolve` — same path as auth gates.
428
428
@@ -434,7 +434,7 @@ A separate "simulate → preview → approve → execute" flow is intentionally
434
434
435
435
### 7a. Engine-side cleanup — DONE
436
436
437
-
The `ironclaw_engine` crate contains zero references to `JobState`, `Session`, `Routine`, or v1 delegate types. The engine was built clean from day one on the five primitives (Thread, Step, Capability, MemoryDoc, Project). No migration work is needed inside the crate.
437
+
The `ironclaw_engine` crate has no runtime dependency on `JobState`, `Session`, `Routine`, or v1 delegate types; any remaining mentions are limited to documentation/comments. The engine was built clean from day one on the five primitives (Thread, Step, Capability, MemoryDoc, Project). No migration work is needed inside the crate.
438
438
439
439
### 7b. Host-side cleanup — BLOCKED ON DEFAULT FLIP
0 commit comments