Skip to content

fix(bridge): mission_* tools accept name; resolves #2583#3155

Merged
ilblackdragon merged 4 commits into
mainfrom
fix/2583-live-routine-creation-test
May 1, 2026
Merged

fix(bridge): mission_* tools accept name; resolves #2583#3155
ilblackdragon merged 4 commits into
mainfrom
fix/2583-live-routine-creation-test

Conversation

@ilblackdragon
Copy link
Copy Markdown
Member

Summary

What changed

File Change
src/bridge/effect_adapter.rs New resolve_mission_id helper. Handlers mission_get, mission_fire, mission_pause, mission_resume, mission_complete, mission_update migrated. mission_update renames via new_name; legacy name-as-rename-target preserved only when a UUID id is also passed. Four new unit tests.
src/bridge/engine_actions.rs Mission tool schemas advertise name as primary, id as legacy alternative. Neither field is required. mission_update documents new_name for renames.
tests/support/test_rig.rs Adds channel_handle() accessor and send_exec_approval for the live test's approval responder.
tests/e2e_live_routine.rs New live regression. Drives the issue's exact prompt; asserts no invalid mission id or consecutive code errors; verifies **[name]** notifications on both fire and refire.
tests/fixtures/llm_traces/live/routine_btc_create_test_and_refire.json Recorded trace for deterministic replay (live run scrub-checked — no PII).

How the fix was found

Driving the issue's verbatim prompt through engine v2 with a real LLM (the new test) reproduced the failure on first run. The trace showed:

tool 'routine_fire(bitcoin_price_checker)' failed: effect execution error:
  invalid mission id: invalid length: expected length 32 for simple format, found 0
  ... (× 4)

routine_fire translates to mission_fire via routine_to_mission_alias, which passes params.clone() unchanged. The mission_fire handler then tried to parse the routine name as a UUID and failed. Five errors in a row → orchestrator killed the thread → "5 consecutive code errors" symptom.

Verification

  • cargo test --features libsql --lib bridge::effect_adapter — all 113 effect-adapter tests pass, including 4 new ones for the helper / handler / alias paths.
  • cargo test --features libsql --test e2e_live_routine -- --skip routine_btc — heuristic + structural unit tests pass.
  • IRONCLAW_LIVE_TEST=1 cargo test --features libsql --test e2e_live_routine -- --ignored routine_btc_create_test_and_refire --nocapture — passes in 49s. Tools observed: [\"routine_create(*/5 * * * *)\", \"routine_fire(bitcoin-price-check)\", \"routine_history(bitcoin-price-check)\", \"job_status(...)\"]. No invalid mission id. No consecutive code errors. Notifications delivered on both the initial fire and the second-turn refire.
  • Replay mode (no IRONCLAW_LIVE_TEST) loads the committed fixture and runs deterministically.

Out of scope (parked for separate follow-ups)

The live test surfaced two further observations that are independent of this fix and not test-failing:

  1. Administrative-tool approval gates not surfacing. Even with auto_approve_tools=false, mission_create / routine_create / mission_fire / routine_fire (all classified Administrative per crates/ironclaw_engine/src/gate/tool_tier.rs) ran without firing a single StatusUpdate::ApprovalNeeded. Logged as a warning in the test.
  2. Mission-fire child thread sometimes returns narration instead of data. build_meta_prompt (crates/ironclaw_engine/src/runtime/mission.rs:2132–2139) instructs the child to FINAL() with "1. What you accomplished / 2. Next focus / 3. Goal achieved" — meta-prose. For "fetch X and surface it" routines the user actually wants the data. Logged as a warning in the test.

Both warrant their own issues; flagging here so they're not lost.

Test plan

  • CI green on the new unit tests
  • CI green on the live test in replay mode (uses committed fixture)
  • Manual: bug-bash redo of the original report on staging — confirm no "5 consecutive code errors" surface

The bug bash report on #2583 ("Routine creation fails with '5 consecutive
code errors'") attributed the failure to the budget cap rework in #2843.
A live regression test (added here) reproduces the original prompt
verbatim and shows the actual root cause is much smaller: every
mission_* handler in the bridge required an `id` UUID, but the LLM and
the routine_* alias path naturally pass `name`. Each call collapsed to
"invalid mission id: invalid length 0", retried 4–5×, and tripped the
orchestrator's `consecutive_action_errors >= max_consecutive_errors`
guard.

Fix:
- New `resolve_mission_id` helper in `src/bridge/effect_adapter.rs`
  resolves a MissionId from `name` (preferred) → `id` UUID (legacy
  fallback) → positional arg, looking up names through
  `MissionManager::list_missions(project_id, user_id)`. UUIDs stay
  internal.
- `mission_get`, `mission_fire`, `mission_pause`, `mission_resume`,
  `mission_complete`, `mission_update` migrated to use the helper.
  `mission_update` renames now go through `new_name`; the legacy
  `name`-as-rename-target behaviour is preserved only when a UUID
  `id` is also passed.
- Schemas in `src/bridge/engine_actions.rs` advertise `name` as primary
  with `id` documented as a legacy alternative; neither is `required`.

Live regression test (`tests/e2e_live_routine.rs`):
- Drives the issue's exact prompt through engine v2.
- Asserts no "invalid mission id" / "consecutive code errors" surfaces.
- Verifies routine_fire delivers a `**[name]**` notification on both
  the initial fire and a second-turn re-fire.
- Recorded trace committed for deterministic replay.

Two related observations surfaced by the live test, parked as separate
follow-ups (warnings only — not test-failing):
- Administrative tools (mission_create / routine_create / mission_fire
  / routine_fire) do not surface `ApprovalNeeded` gates with
  `auto_approve_tools=false` even though they are in the
  `AUTONOMOUS_TOOL_DENYLIST`.
- The mission-fire child thread sometimes returns work narration in
  FINAL() instead of the requested data point — a `build_meta_prompt`
  guidance issue independent of #2583.
Copilot AI review requested due to automatic review settings May 1, 2026 11:17
@github-actions github-actions Bot added size: L 200-499 changed lines risk: low Changes to docs, tests, or low-risk modules contributor: core 20+ merged PRs labels May 1, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements a name-based resolution system for missions and routines, enabling identification via human-readable names in addition to UUIDs. A centralized resolve_mission_id helper is introduced and integrated into various mission actions, while action definitions are updated to prioritize name as the primary identifier. The PR also adds comprehensive regression tests and a live end-to-end test for routine workflows. Feedback indicates that the mission_update logic fails to preserve legacy behavior for renames when a UUID is provided, and notes a discrepancy between the implementation and documentation regarding the resolution priority of identifiers.

Comment thread src/bridge/effect_adapter.rs Outdated
Comment on lines 1039 to 1046
// Renames use `new_name`. The legacy `name` field is
// now the lookup key (consumed by `resolve_mission_id`),
// not the rename target — see the schema in
// `engine_actions.rs` and the migration note in the
// helper docs above.
if let Some(new_name) = params.get("new_name").and_then(|v| v.as_str()) {
updates.name = Some(new_name.to_string());
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The implementation of mission_update does not preserve the legacy behavior where name acts as the rename target when a UUID id is also provided, despite the PR description claiming this preservation. The current code only checks new_name for renames, which breaks backward compatibility for existing callers or recorded traces using the (id=UUID, name=NewName) pattern.

                        // Renames use new_name. The legacy name field is
                        // now the lookup key (consumed by resolve_mission_id),
                        // not the rename target — unless a UUID id is also
                        // passed, preserving legacy behavior.
                        if let Some(new_name) = params.get("new_name").and_then(|v| v.as_str()) {
                            updates.name = Some(new_name.to_string());
                        } else if let Some(id_val) = params.get("id").and_then(|v| v.as_str()) {
                            if uuid::Uuid::parse_str(id_val).is_ok() {
                                if let Some(name) = params.get("name").and_then(|v| v.as_str()) {
                                    updates.name = Some(name.to_string());
                                }
                            }
                        }
References
  1. When intercepting and forwarding a tool call to a backend implementation, ensure all relevant parameters from the original call are passed through, including legacy fields required for backward compatibility.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 0bd586d (preserves the legacy shape) + b74c42e (handler strips name from the resolver-view params for the legacy shape so the new conflict guard doesn't reject it). Two regression tests guard it.

Comment thread src/bridge/effect_adapter.rs Outdated
Comment on lines +2153 to +2159
/// Resolution order (first match wins):
/// 1. `params.id` — if present and a valid UUID, use directly.
/// 2. `params.name` — look up `(project_id, user_id, name)` via
/// `MissionManager::list_missions`. The same identifier used in
/// `mission_create` is what we resolve back here.
/// 3. `params._args[0]` — positional fallback used by some Tier-0
/// tool-call shapes; tried as UUID first, then as name.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The docstring's resolution order does not match the actual implementation. Specifically, params._args[0] as a valid UUID (checked at line 2200) has higher priority than params.name as a name (checked during the mission list iteration at line 2218), but the docstring lists it as a fallback (#3).

Suggested change
/// Resolution order (first match wins):
/// 1. `params.id` — if present and a valid UUID, use directly.
/// 2. `params.name`look up `(project_id, user_id, name)` via
/// `MissionManager::list_missions`. The same identifier used in
/// `mission_create` is what we resolve back here.
/// 3. `params._args[0]`positional fallback used by some Tier-0
/// tool-call shapes; tried as UUID first, then as name.
/// Resolution order (first match wins):
/// 1. params.id — if present and a valid UUID, use directly.
/// 2. params._args[0]if present and a valid UUID, use directly.
/// 3. params.name — look up (project_id, user_id, name) via
/// MissionManager::list_missions.
/// 4. params.idlook up by name (legacy fallback).
/// 5. params._args[0] — look up by name (positional fallback).
References
  1. When resolving identifiers from stored tool data or parameters, implement and document a clear fallback chain to check multiple possible field names or positions.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b74c42e. The docstring on resolve_mission_id now matches the new four-step resolution order including the conflict guard. _args[0] no longer short-circuits ahead of explicit name (it's now a name candidate only — see serrrfirat's related comment).

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes routine/mission tool calls in the bridge by allowing mission_* handlers (and their routine_* aliases) to resolve missions by name instead of requiring a UUID id, preventing retry loops that previously tripped the orchestrator’s “consecutive code errors” guard (issue #2583).

Changes:

  • Add resolve_mission_id and migrate mission_get/fire/pause/resume/complete/update to accept name (preferred) with legacy id fallback.
  • Update mission tool schemas to advertise name as primary and introduce new_name for renames in mission_update.
  • Add a new live/replayable end-to-end regression test plus fixture, and extend the test rig with approval helpers.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/bridge/effect_adapter.rs Adds resolve_mission_id and switches mission handlers to name-first resolution.
src/bridge/engine_actions.rs Updates mission tool schemas and guidance to prefer name and use new_name for rename.
tests/support/test_rig.rs Adds channel_handle() and send_exec_approval() for background helpers in tests.
tests/e2e_live_routine.rs Adds a live/replayable E2E regression test for #2583.
tests/fixtures/llm_traces/live/routine_btc_create_test_and_refire.json Adds recorded trace fixture for deterministic replay.
Comments suppressed due to low confidence (1)

src/bridge/effect_adapter.rs:1027

  • mission_complete resolves an explicit UUID via resolve_mission_id and then calls MissionManager::complete_mission(id) without any ownership/shared-access check. Since complete_mission itself does not enforce tenant isolation, a caller who knows/guesses another user's mission UUID could complete it. Add an authorization check here (e.g., load the mission and verify mission.is_owned_by(context.user_id) or mission.owner_id().is_shared() before completing), or change the engine API to require user_id like fire_mission/pause_mission/resume_mission/update_mission do.
            "mission_complete" => {
                let id =
                    resolve_mission_id(mgr.as_ref(), context.project_id, &context.user_id, params)
                        .await;
                match id {
                    Ok(id) => match mgr.complete_mission(id).await {
                        Ok(()) => Ok(serde_json::json!({"status": "completed"})),
                        Err(e) => Err(e),

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/bridge/engine_actions.rs Outdated
&["id"],
&[],
&[
"Identify the mission to update with `name` (preferred) or `id` — exactly one is required.",
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 720a0db. Discovery summary now matches the actual handler tolerance: "Identify the mission with name (preferred) or id (UUID). If both are provided they must identify the same mission, or use the legacy {id, name} rename shape (where name is the new name) — otherwise the resolver errors with 'identify different missions'."

Comment thread tests/e2e_live_routine.rs Outdated
self.handle.abort();
}
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b74c42e. ApprovalAutoResponder now implements Drop that aborts the handle. shutdown() is kept for explicitness; abort() is idempotent so the double-abort path is safe.

Comment thread src/bridge/effect_adapter.rs Outdated
Comment on lines +973 to +976
// We resolved the id via name lookup, so this
// should be unreachable in practice (the lookup
// already proved the mission exists). Keep the
// arm so the match is exhaustive.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b74c42e. The comment now correctly notes that Ok(None) is reachable when the caller passed an explicit UUID for a deleted/unknown mission — names that didn't match are caught earlier by resolve_mission_id with a more useful error.

Comment thread src/bridge/effect_adapter.rs Outdated

Err(EngineError::Effect {
reason: format!(
"mission not found by name: tried {name_candidates:?} for user '{user_id}'. \
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b74c42e. user_id is no longer included in the mission not found error. Same fix on mission_get's "belongs to another user" arm — the foreign mission's UUID is now scrubbed from the LLM-visible reason and preserved only via tracing::debug!.

Comment on lines 1038 to 1046
let mut updates = ironclaw_engine::MissionUpdate::default();
if let Some(name) = params.get("name").and_then(|v| v.as_str()) {
updates.name = Some(name.to_string());
// Renames use `new_name`. The legacy `name` field is
// now the lookup key (consumed by `resolve_mission_id`),
// not the rename target — see the schema in
// `engine_actions.rs` and the migration note in the
// helper docs above.
if let Some(new_name) = params.get("new_name").and_then(|v| v.as_str()) {
updates.name = Some(new_name.to_string());
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 0bd586d (initial preservation) + b74c42e (mission_update handler now strips name from the resolver-view params when the legacy shape is detected, so the new conflict guard doesn't reject it). Two regression tests pin it: mission_update_preserves_legacy_id_plus_name_rename and mission_update_renames_via_new_name_and_uses_name_as_lookup.

Comment thread src/bridge/effect_adapter.rs Outdated

// De-duplicate while preserving order.
let mut seen: std::collections::HashSet<&str> = std::collections::HashSet::new();
name_candidates.retain(|s| seen.insert(*s));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance regression on hot path (High): every name-keyed mission_* call now loads the user's full mission list. A cron routine firing every 5 min on an account with hundreds of historical missions (including completed/archived) pays an O(N) DB read per fire — the previous UUID-only path was O(1).

Suggest adding MissionManager::find_by_name(project_id, user_id, name) so the bridge does a targeted lookup. Even a small per-(project_id, user_id) name→id cache inside MissionManager (invalidated on create/rename/complete) would pay for itself fast — the same mission name is hit on every fire/get/pause cycle.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in b74c42e. Added MissionManager::find_by_name(project_id, user_id, name) -> Result<Option<Mission>, EngineError> typed API; bridge resolver switched off list_missions onto it. Today find_by_name still scans the user's missions in-memory; the doc-comment names the future store-level index drop-in (Store::find_mission_by_name) explicitly. Adding it later is non-breaking — every caller goes through this method now, so the optimization plugs in without churn at the call sites.

Comment thread src/bridge/effect_adapter.rs Outdated
// Renames use `new_name`. The legacy `name` field is
// now the lookup key (consumed by `resolve_mission_id`),
// not the rename target — see the schema in
// `engine_actions.rs` and the migration note in the
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Foot-gun (High): if a caller passes mission_update({id: <real_uuid_A>, name: "mission-B", new_name: "renamed"}), resolve_mission_id resolves id first (UUID fast-path), so mission A is renamed even though name says "mission-B". The schema text in engine_actions.rs says name is the lookup key — the helper logic contradicts that whenever id is also present.

Defense-in-depth: when both id (UUID) and name are provided and resolve to different missions, error out with "both id and name provided but they identify different missions" instead of silently preferring one. The doc comment on resolve_mission_id should also call out this precedence trap explicitly so the next caller doesn't trip it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b74c42e. Added the conflict guard you suggested. New regression test mission_resolver_errors_when_id_and_name_disagree pins it. Error reads: "both id and name provided but they identify different missions: id=X vs name=[Y] → Z. Provide only one — or fix the name to match the id." The resolver docstring also explicitly calls out the conflict-guard contract.

/// cause of the bug bash failure (not the budget-rework hypothesis in
/// the linked #2843).
#[tokio::test]
async fn mission_fire_resolves_by_name_when_id_absent() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test coverage gap (High): all six mission_* handlers (get, fire, pause, resume, complete, update) now route through resolve_mission_id, but only mission_fire got new by-name regression tests. The most foot-gun-prone handler — mission_update (see related comment on the new_name rename) — has no by-name test at all, and the existing mission_update_string_guardrails_rejected_via_execute_action still passes id directly so doesn't exercise the new path.

Per CLAUDE.md "Test Through the Caller, Not Just the Helper", each migrated handler needs a caller-level test. Suggest three more symmetric to this one: mission_get_resolves_by_name, mission_complete_resolves_by_name, and mission_update_renames_via_new_name_when_name_is_lookup_key (the last one is the regression guard for the id+name foot-gun).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b74c42e. Added by-name caller-level tests for every migrated handler: mission_get_resolves_by_name, mission_complete_resolves_by_name, mission_pause_and_resume_resolve_by_name, plus the conflict-guard regression mission_resolver_errors_when_id_and_name_disagree and the legacy-rename regression mission_update_preserves_legacy_id_plus_name_rename.

.and_then(|a| a.get(0))
.and_then(|v| v.as_str())
.filter(|s| !s.is_empty())
{
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edge case (Medium): if a caller passes {name: "actual-mission", _args: ["<uuid-of-other>"]}, this _args[0] UUID parse short-circuits with return Ok(MissionId(uuid)) — the explicit name field is silently ignored. This contradicts the docstring's stated principle that name is preferred and _args[0] is a "positional fallback used by some Tier-0 tool-call shapes".

Two ways out:

  • (a) Preserve preference: move the _args[0] UUID parse to after name_candidates lookup falls through, so explicit name wins.
  • (b) Treat positional as always-a-name: drop the _args[0] UUID-parse branch and only push to name_candidates. UUID-shaped strings would still match if any mission happens to be named with a UUID — vanishingly rare and an explicit caller decision.

Either is fine; the current ordering is the worst of both.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b74c42e — went with option (b). _args[0] is now treated as a name candidate only, never honoured as a UUID. UUID-shaped strings would only match if a mission happens to be named with one (vanishingly rare and the explicit caller decision you noted).

Comment thread src/bridge/effect_adapter.rs Outdated
}
}

Err(EngineError::Effect {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Information disclosure (Medium): echoing user_id into the error reason leaks server-side identity into the LLM context, conversation history, and any tool-error telemetry. The previous error ("mission not found: {id_str}") only echoed user input.

EngineError::Effect::reason flows back to the LLM and is persisted — depending on the channel, user_id is a UUID (low value), but for some flows it's an external identity (chat id, session-derived token). Suggest dropping the for user '{user_id}' segment entirely. The LLM can't act on it anyway (it can't list a different user's missions).

Same goes for the mission {} belongs to another user reason a few lines above (mission_get arm at the new resolve site) — there's no LLM-actionable info in the foreign mission's UUID, and emitting it confirms the existence of a mission the caller has no claim to.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b74c42e. Both error reasons are scrubbed: user_id is gone from the "mission not found" error, and mission_get's "belongs to another user" reason no longer echoes the foreign mission's UUID (preserved only via tracing::debug!).

Comment thread src/bridge/effect_adapter.rs Outdated
/// tool-call shapes; tried as UUID first, then as name.
///
/// A non-UUID `id` value is treated as a name (some callers conflate the
/// two), which keeps backward compat for `mission_complete`'s historical
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Architecture (Medium): name-or-id resolution is conceptually a MissionManager concern, not a bridge-adapter concern. Keeping this 60-line policy here means any future caller — CLI mission commands, the WS handler, a WASM channel that wants a mission lookup, integration tests outside the bridge — duplicates the resolution rules.

Suggest moving to a typed API on MissionManager:

enum MissionRef<'a> {
    Id(MissionId),
    Name(&'a str),
}

impl MissionManager {
    pub async fn resolve(&self, project_id: ProjectId, user_id: &str, r: MissionRef<'_>)
        -> Result<MissionId, EngineError> { ... }
}

The bridge then becomes a one-liner that picks the right variant from params. The two ambiguity rules (id+name simultaneous; _args[0]) collapse into one well-scoped place that has a unit test suite of its own.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partially addressed in b74c42e. The typed engine API (MissionManager::find_by_name) is in place — that's the part future callers (CLI, WS, WASM channels, integration tests) will actually need. The 60-line param-parsing in the bridge stayed put because it deals in JSON and Tier-0 calling conventions (_args[0] etc.) that aren't a MissionManager concern. Happy to lift the param-shape policy onto a MissionRef enum + MissionManager::resolve in a follow-up if you'd prefer the typed-API form — should I file that?

"HIT"
],
[
"cf-ray",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trace fixture noise (Medium): the recorded CoinGecko response carries every header from the real Cloudflare/Coingecko response — cf-ray, x-request-id, etag, age, date, cf-cache-status, strict-transport-security, content-security-policy-report-only, etc. They aren't secrets, but they:

  • bloat the diff (~941 lines for one e2e test, most of it duplicated headers across three identical price requests),
  • pin the fixture to one Cloudflare datacenter (-FUK) and a 2026-05-01 timestamp,
  • risk replay surprises if any HTTP cache layer in the stack decides to honor etag/age.

Suggest a fixture-redaction helper in tests/support/ (e.g. redact_response_headers(allow=["content-type", "content-length"])) applied at fixture-write time. Net result is a smaller, more reproducible fixture and one less drift source for replay-mode tests.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferred — agreed on the diagnosis. The fixture is committed to enable replay-mode CI runs, and a redaction helper that's careful to keep replay deterministic is its own small piece of work. Will file a follow-up issue today if you want; otherwise happy to tackle in a tiny PR right after this one merges.

…alf 1)

Issue #3133 reported a Gmail-sending mission firing every 3 minutes with
the FINAL() body "Failed to send email. Status: None Error: None / Next
focus: Debug Gmail authentication." The new live test under #2583's
infrastructure reproduced the actual mechanism: a mission's child thread
hits a `tool_activate(gmail)` OAuth gate, the engine emits
`ThreadOutcome::GatePaused`, and `process_mission_outcome_and_notify` —
which only matched `Completed` / `Failed` / `MaxIterations` — silently
swallowed it via a bare `_ => {}` arm. The cron scheduler kept ticking,
the user got no auth-tray prompt, and the LLM eventually narrated the
two-None pattern when its `http` fallback failed.

Engine side (`crates/ironclaw_engine/src/runtime/mission.rs`):
- New `MissionGateInfo { gate_name, action_name, parameters,
  request_id, resume_kind }` struct, exported from the engine crate.
- `MissionNotification` gains an `Option<MissionGateInfo> gate` field.
- New `ThreadOutcome::GatePaused` arm in
  `process_mission_outcome_and_notify` that:
  * Transitions the mission to `MissionStatus::Paused`. The cron tick
    path filters by `Active` (lines 1058 / 1131 / 1194 / 1781), so
    pausing is enough to stop the ghost re-firing pattern.
  * Records `PAUSED: gate '...' on action '...' awaiting <kind>` in
    `approach_history`.
  * Emits a `MissionNotification` with `gate: Some(...)` and a
    user-facing nudge tailored per `ResumeKind` (Authentication /
    Approval / External).
- Regression test
  `gate_paused_pauses_mission_and_emits_gate_notification` pinning the
  contract.

Bridge side (`src/bridge/router.rs::handle_mission_notification`):
- New optional args (`auth_manager`, `tools`, `extension_manager`)
  plumbed through the mission-notification dispatcher loop.
- When `notif.gate` is `Some`, additionally emit a structured
  `StatusUpdate` per `ResumeKind`:
  * `Authentication` → `StatusUpdate::AuthRequired { extension_name,
    instructions, auth_url, setup_url, request_id }`. Extension
    identity is resolved via the canonical
    `resolve_extension_for_action` helper (the same one foreground
    auth gates use), so the gateway UI's auth tray fires on the same
    path.
  * `Approval` → `StatusUpdate::ApprovalNeeded { request_id,
    tool_name, description, parameters, allow_always }`.
  * `External` → no user-actionable surface; falls through to the
    text response only.
- The friendly text response is still broadcast on `notify_channels`
  alongside the structured status update, so the user sees both an
  auth-tray entry and a chat message.

This is half 1 of the fix discussed in #3133. Half 2 (auto-resuming a
paused mission once auth completes) is a separate follow-up: the
mission stays Paused after the user resolves the gate today; they have
to explicitly call `mission_resume`.

Test infrastructure:
- New `tests/e2e_live_mission_gmail.rs` driving the issue's flow with
  Gmail OAuth seeded from the source DB. Asserts no "Status: None
  Error: None" pattern and no "consecutive code errors" surface.
- Shared mission/routine test helpers extracted to
  `tests/support/live_mission_helpers.rs` (ApprovalAutoResponder,
  looks_like_routine_notification, tool_is, wait_for_response_matching)
  so #2583's routine test and #3133's mission test reuse the same
  bug-pattern detectors without copy-paste drift.
@github-actions github-actions Bot added scope: channel/web Web gateway channel size: XL 500+ changed lines risk: medium Business logic, config, or moderate-risk modules and removed size: L 200-499 changed lines risk: low Changes to docs, tests, or low-risk modules labels May 1, 2026
@ilblackdragon
Copy link
Copy Markdown
Member Author

Half 2 follow-up filed as #3166: 'Mission auto-resume after auth/approval gate resolution'. That issue covers persisting Mission.paused_gate, hooking the credential-write and gate-resolve paths to auto-resume matching paused missions, and the immediate-fire-on-resume behaviour. Out of scope for this PR.

Addresses the high/medium-severity findings from the PR #3155 review
(serrrfirat, Gemini, Copilot). No behaviour change for the happy path
the original PR established (name-keyed mission_* dispatch keeps
working); the conflict and information-disclosure paths get safer.

Engine (`crates/ironclaw_engine/src/runtime/mission.rs`):
- New `MissionManager::find_by_name(project_id, user_id, name)` typed
  API. Today it's a thin wrapper over `list_missions_with_shared`
  plus an in-memory filter, but every name-keyed call goes through
  it so a future store-level index drops in non-breakingly. The
  doc-comment names the optimisation (Store::find_mission_by_name)
  and the perf concern from the review explicitly.
- Two new unit tests: round-trip via create+find, and the per-user
  scope (mission for bob is invisible when alice queries).

Bridge (`src/bridge/effect_adapter.rs::resolve_mission_id`):
- Switched off `list_missions` and onto `mgr.find_by_name`, so the
  hot path stops doing an O(N) scan over the user's full mission
  set per fire.
- Added the id+name conflict guard. If both are supplied AND the
  name resolves to a different mission than the id identifies, the
  resolver now errors loudly:
    "both id and name provided but they identify different missions:
     id=X vs name=[Y] → Z. Provide only one — or fix the name to
     match the id."
  Previously the UUID would silently win, which is the foot-gun
  serrrfirat flagged: a typoed `name` would rename the wrong
  mission. New regression test
  `mission_resolver_errors_when_id_and_name_disagree` pins it.
- Inverted the `_args[0]` precedence: a positional arg that *parses
  as a UUID* used to short-circuit ahead of `params.name`, which
  contradicted the docstring's "explicit name preferred". Now
  `_args[0]` is treated as a name candidate only; positional UUIDs
  aren't honoured at all (rare in practice — the Tier-0 callers
  that use `_args[0]` overwhelmingly pass names).
- Dropped `user_id` from the "mission not found" error string. The
  reason field flows to the LLM context and conversation history;
  echoing the server-side identity served no purpose. Same fix on
  the `mission_get` "belongs to another user" arm — the foreign
  mission's UUID is now scrubbed from the LLM-visible reason and
  preserved only via tracing::debug!.
- Updated the stale "unreachable Ok(None)" comment in mission_get
  per Copilot — the arm IS reachable when the caller passed an
  explicit UUID for a deleted/unknown mission.

Bridge (`mission_update` handler):
- The legacy `{id, name=NEWNAME}` rename shape (preserved by the
  earlier review-fix commit on this branch) was tripping the new
  conflict guard because the resolver saw both `id` and `name` as
  identifiers. The handler now strips `name` from the resolver-view
  params *only* when the legacy shape is detected (id is a valid
  UUID, `new_name` absent), so the resolver never sees a
  would-be-rename `name` as a lookup target. Both the legacy and
  canonical (`new_name`) paths keep their existing tests.

By-name regression tests (`src/bridge/effect_adapter.rs`):
- Closes the test gap flagged by serrrfirat — `mission_fire` was
  the only handler with a by-name caller-level test before. Adds:
  - `mission_get_resolves_by_name`
  - `mission_complete_resolves_by_name`
  - `mission_pause_and_resume_resolve_by_name`
  - `mission_resolver_errors_when_id_and_name_disagree` (the new
    conflict guard above)
- All migrated handlers now have a caller-level test that exercises
  the name path through `execute_action`.

Test infra (`tests/support/live_mission_helpers.rs`):
- `ApprovalAutoResponder` now implements `Drop` (per Copilot). A
  panicking test would otherwise leak the background polling task
  across test boundaries. `shutdown()` is kept for explicitness;
  abort() is idempotent.
Copilot AI review requested due to automatic review settings May 1, 2026 15:10
@ilblackdragon
Copy link
Copy Markdown
Member Author

Pushed b74c42e addressing the review feedback. Summary by reviewer:

serrrfirat:

  • ✅ Perf regression — MissionManager::find_by_name typed API; bridge resolver stops scanning list_missions. Doc-comment names the future Store-level index drop-in.
  • ✅ id+name foot-gun — new conflict guard errors loudly with both ids surfaced. Regression test mission_resolver_errors_when_id_and_name_disagree.
  • _args[0] UUID precedence inversion — positional UUIDs no longer short-circuit ahead of explicit name.
  • ✅ user_id info disclosure — dropped from EngineError reason strings on both mission not found and mission_get ownership arms; preserved in tracing::debug!.
  • ✅ Test gap on migrated handlers — added by-name caller-level tests for mission_get, mission_complete, mission_pause+mission_resume.

Gemini:

  • ✅ Legacy {id, name} rename preservation — landed earlier in commit 0bd586d with two regression tests; the new mission_update handler strips name from the resolver-view params when it detects the legacy shape, so the conflict guard doesn't reject it.
  • ✅ Docstring/behaviour drift on resolve_mission_id — the docstring is rewritten to match the new four-step resolution order including the conflict guard.

Copilot:

  • ApprovalAutoResponder task leak — Drop impl aborts the handle on panic.
  • ✅ Stale "unreachable Ok(None)" comment in mission_get — corrected to call out the UUID-with-no-mission case.

Deferred:

  • Trace fixture noise (Cloudflare headers) — filed for a small followup; the fixture is committed to enable replay-mode CI runs and stripping requires care to keep replay deterministic.
  • Move resolver to MissionManager (architecture) — the typed API is now there (find_by_name); the param-parsing stays bridge-side because it deals with JSON/Tier-0 conventions.

Test results on b74c42e: 448 bridge tests + 89 engine mission tests, all green. fmt + clippy -D warnings clean.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the engine-v2 bridge so mission_* (and routine_* aliases) can target missions by name (preferred) with UUID id as a legacy fallback, addressing the retry loop behind issue #2583. It also adds mission gate metadata/notification surfacing and introduces live/replay E2E regression coverage.

Changes:

  • Add name→id resolution (resolve_mission_id) and migrate mission_get/fire/pause/resume/complete/update handlers to accept name (with legacy id fallback).
  • Update mission tool schemas to advertise name as primary and introduce new_name for mission_update renames.
  • Add live/replay E2E tests + shared helpers and fixtures; extend mission notifications with gate metadata and bridge-side status emission.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/bridge/effect_adapter.rs Adds resolve_mission_id and migrates mission handlers to name-first resolution; adds unit tests around the new behavior.
src/bridge/engine_actions.rs Updates tool schemas/descriptions to prefer name, keeps id as optional legacy, adds new_name for renames.
src/bridge/router.rs Extends mission-notification handling to surface gate state via StatusUpdate::{AuthRequired,ApprovalNeeded}.
crates/ironclaw_engine/src/runtime/mission.rs Adds MissionGateInfo and emits it on ThreadOutcome::GatePaused; pauses missions on gate outcomes.
crates/ironclaw_engine/src/lib.rs Re-exports MissionGateInfo.
src/channels/web/tests/no_silent_drop.rs Adapts test notifications to include the new gate field and updated handler signature.
tests/support/test_rig.rs Adds channel_handle() and a helper to send structured exec approvals.
tests/support/mod.rs Registers shared live-mission helper module behind libsql.
tests/support/live_mission_helpers.rs Adds shared live-test helpers (notification heuristic, approval auto-responder, waiters).
tests/e2e_live_routine.rs Adds live/replay E2E regression test for #2583 prompt flow and notification assertions.
tests/e2e_live_mission_gmail.rs Adds live/replay E2E test scenario around Gmail mission behavior and gate-related regression markers.
tests/fixtures/llm_traces/live/routine_btc_create_test_and_refire.json Adds recorded trace fixture for deterministic replay of the live routine test.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/bridge/router.rs
Comment on lines +4166 to +4206
if let Some(gate) = &notif.gate
&& let Some(tools) = tools
{
for channel_name in &notif.notify_channels {
let metadata = serde_json::json!({"user_id": notif.user_id});
let status = match &gate.resume_kind {
ironclaw_engine::ResumeKind::Authentication {
credential_name,
instructions,
auth_url,
} => {
let extension_name = resolve_extension_for_action(
auth_manager,
extension_manager,
tools,
&gate.action_name,
&gate.parameters,
credential_name.as_str(),
&notif.user_id,
)
.await;
StatusUpdate::AuthRequired {
extension_name,
instructions: Some(instructions.clone()),
auth_url: auth_url.clone(),
setup_url: None,
request_id: Some(gate.request_id.clone()),
}
}
ironclaw_engine::ResumeKind::Approval { allow_always } => {
StatusUpdate::ApprovalNeeded {
request_id: gate.request_id.clone(),
tool_name: gate.action_name.clone(),
description: format!(
"Mission '{}' is waiting for approval to run '{}'.",
notif.mission_name, gate.action_name
),
parameters: gate.parameters.clone(),
allow_always: *allow_always,
}
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 720a0db. MissionGateInfo.request_id renamed to call_id (matches what it actually carries — the engine LLM tool-call id). New MissionGateInfo.gate_request_id: Uuid is freshly generated at gate-emit time, and the bridge now forwards it (not call_id) into StatusUpdate::AuthRequired.request_id / ApprovalNeeded.request_id. The gateway resolve POST (features/chat/mod.rs:183/274/303 + platform/ws.rs:236) now sees a UUID-shaped string and won't 400. Mapping the gateway-issued UUID back to the engine call_id for actual auto-resume is half-2 (#3166).

Comment on lines +139 to +140
/// Engine-side `call_id` used to resume.
pub request_id: String,
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 720a0db — see reply on the sibling comment. Field renamed to call_id; new gate_request_id: Uuid is the user-facing identity. Doc-comment on the struct now explicitly contrasts the two fields and points at #3166.

Addresses Copilot's review on PR #3155 commit b74c42e:

> StatusUpdate::ApprovalNeeded.request_id is later parsed as a UUID by
> the web approval endpoints / WS handler (they reject non-UUID
> request_id). Here the mission gate wiring forwards gate.request_id,
> but that value originates from ThreadOutcome::GatePaused.call_id
> (LLM tool-call id like call_...), so approvals from mission/routine
> notifications will be impossible to resolve via the gateway UI.

Confirmed by reading the gateway resolve handler at
`src/channels/web/features/chat/mod.rs:183`, `:274`, `:303` and the
WS handler at `src/channels/web/platform/ws.rs:236` — all four call
`Uuid::parse_str(&req.request_id)` and reject non-UUIDs with 400 Bad
Request. Mission auth-tray entries built off the LLM `call_id` would
silently 400 every resolve POST, making the tray decorative.

Engine (`crates/ironclaw_engine/src/runtime/mission.rs`):
- Renamed `MissionGateInfo.request_id` → `MissionGateInfo.call_id`
  (its actual identity — engine LLM tool-call id) so the field name
  matches what it carries.
- Added `MissionGateInfo.gate_request_id: Uuid` (a fresh UUID
  generated at gate-emit time). Doc-comment explicitly contrasts the
  two fields and points at #3166 for the half-2 wiring that maps
  back from `gate_request_id` to `call_id` for engine resume.
- Updated `gate_paused_pauses_mission_and_emits_gate_notification`
  to assert both fields are populated and that `gate_request_id` is
  a valid (non-nil) UUID.

Bridge (`src/bridge/router.rs::handle_mission_notification`):
- AuthRequired and ApprovalNeeded translations now forward
  `gate.gate_request_id.to_string()` into `StatusUpdate.request_id`,
  not `gate.call_id`. Inline comment names the gateway/WS handler
  files that parse it as UUID so the next maintainer doesn't
  re-introduce the bug.

Schema (`src/bridge/engine_actions.rs::mission_update`):
- Discovery summary previously said "exactly one is required" for
  name/id, but the handler tolerates both (with the conflict guard
  added on b74c42e + the legacy `{id, name}` rename shape). Wording
  now matches actual behaviour and explicitly calls out the
  conflict-guard outcome and the legacy rename shape so a tool-using
  model knows what to expect.
@ilblackdragon
Copy link
Copy Markdown
Member Author

Pushed 720a0db closing the last review-feedback items. Full status:

Just landed (720a0db):

  • Copilot's UUID/call_id mismatch (high) — MissionGateInfo.request_id renamed to call_id; new gate_request_id: Uuid is generated at gate-emit time and forwarded to StatusUpdate.request_id. Gateway resolve POST (Uuid::parse_str at features/chat/mod.rs:183/platform/ws.rs:236) now accepts the auth-tray entries instead of 400-ing.
  • Schema wording vs handler tolerance — mission_update discovery summary rewritten to match the conflict-guard contract.

Already landed earlier on this branch:

  • b74c42e: find_by_name typed API + bridge perf fix; id+name conflict guard; _args[0] precedence flip; user_id info-disclosure scrub on two error paths; Drop on ApprovalAutoResponder; stale Ok(None) comment; by-name regression tests for mission_get/complete/pause/resume + conflict-guard regression.
  • 0bd586d: legacy {id, name} rename preservation + two regression tests.

Deferred (each individually replied on the inline thread):

  • Trace fixture noise (Cloudflare headers) — happy to file a follow-up issue or land in a tiny PR right after this merges; awaiting your preference.
  • Full MissionRef enum + MissionManager::resolve typed API — typed engine surface is now in place via find_by_name; lifting the bridge's JSON-param parsing into a typed enum is a clean follow-up if you want it.

Test results on 720a0db: 19/19 mission tests in bridge::effect_adapter, 89/89 runtime::mission, fmt + clippy clean across both crates.

@ilblackdragon ilblackdragon merged commit b1ca9d0 into main May 1, 2026
34 checks passed
@ilblackdragon ilblackdragon deleted the fix/2583-live-routine-creation-test branch May 1, 2026 17:24
This was referenced May 7, 2026
@ironclaw-ci ironclaw-ci Bot mentioned this pull request May 8, 2026
ilblackdragon added a commit that referenced this pull request May 8, 2026
…) (#3366)

* fix(missions): auto-resume paused missions after gate resolution (#3166)

Half-2 of #3133. Half-1 (PR #3155) made the engine emit a typed
`GatePaused` outcome that pauses a mission and surfaces an
`AuthRequired` status update on the user's auth tray. Until now the
mission stayed Paused even after the user completed OAuth — they had
to manually call `mission_resume`. This patch wires the credential-
write and gate-resolve paths so paused missions auto-resume the moment
the gate they were waiting on is resolved.

Engine (`crates/ironclaw_engine/`)
- `Mission` gains a persistent `paused_gate: Option<MissionGateInfo>`.
  `MissionGateInfo` moved from `runtime/mission.rs` to `types/mission.rs`
  and made `Serialize`/`Deserialize` so it round-trips through the
  store. Existing rows decode with `paused_gate = None` via
  `#[serde(default)]`.
- `MissionManager::resume_paused_for_credential(credential_name, user_id)`
  walks paused missions visible to the user, matches by
  `Authentication.credential_name`, transitions Paused → Active,
  clears `paused_gate`, and kicks an immediate fire for non-Manual
  cadences. Returns the resumed ids.
- `MissionManager::resume_paused_for_request_id(uuid, GateResolutionOutcome,
  user_id)` resumes by `gate_request_id` for approval/external gates.
  Approved → resume + fire; Denied/Cancelled → mark Failed.
- `resume_mission` clears `paused_gate` so a manual resume doesn't
  leave stale gate state behind.
- 4 new unit tests pin the matrix:
  `oauth_completion_resumes_paused_mission`,
  `unrelated_credential_write_does_not_resume_paused_mission`,
  `gate_resolution_approved_resumes_matching_paused_mission`,
  `gate_resolution_denied_marks_paused_mission_failed`.

Bridge (`src/bridge/`)
- New pub helpers `resume_paused_missions_for_credential` and
  `resume_paused_missions_for_gate_request` in `router.rs`. Both
  best-effort: helper failures log and swallow so an OAuth flow
  cannot 5xx because of stale mission state.

Wiring
- Gateway OAuth callback (`channels/web/features/oauth/mod.rs`) and
  WASM-tool OAuth completion (`extensions/manager.rs`) call
  `resume_paused_missions_for_credential` immediately after
  `oauth::store_oauth_tokens` succeeds, before resolving the
  foreground gate.
- `/api/chat/gate/resolve` (`channels/web/features/chat/mod.rs`)
  fans the disposition out to
  `resume_paused_missions_for_gate_request` after the foreground
  gate is resolved. CredentialProvided is excluded — the credential
  write itself triggers the OAuth-side hook.

Tests
- The Rust live test `tests/e2e_live_mission_gmail.rs` is replaced
  by `tests/e2e/scenarios/test_mission_gmail_3133.py`. The Rust
  variant required a real Gmail OAuth token in the developer's
  `~/.ironclaw/` DB; the Python tier uses the deterministic mock
  LLM + the existing `/oauth/callback` flow, so it runs in CI.
- The Python file pins three checks: the OAuth callback drives the
  resume hook without erroring when no paused mission matches; the
  browser sees the extension flip to authenticated; no thread in
  the gateway carries the `Status: None`+`Error: None` dual-marker
  fingerprint or the `consecutive code errors` surface from #2583.
- Out of scope: chat-driven mission lifecycle (LLM emits
  `mission_create` + `mission_fire` and the child trips
  `tool_activate(gmail)`). The unit tests cover that state
  transition in isolation; adding canned mock-LLM responses for
  the chat path is a follow-up.
- `secrets_store::create/update` (manual API-token entry) hook is
  also out of scope for this PR — only OAuth completion paths are
  wired today. Easy to extend later.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): live-LLM record/replay infra for #3133 mission auto-resume

Replaces the stub mock-LLM Playwright test with real live-test
infrastructure that mirrors the Rust `LiveTestHarnessBuilder`
record/replay pattern (`tests/support/live_harness.rs`) for the
Python tier:

- `tests/e2e/live_llm_proxy.py` — HTTP proxy that sits between
  ironclaw and a real LLM. Two modes:
  * RECORD (`IRONCLAW_LIVE_TEST=1`): forwards `/v1/chat/completions`
    to upstream (`IRONCLAW_LIVE_LLM_BASE_URL` / `..._API_KEY` /
    `..._MODEL`) and writes (request_hash, response) pairs to a
    JSON fixture. Canonicalisation strips tool-call ids and other
    non-deterministic fields so re-recordings match.
  * REPLAY: serves recorded responses by canonical-request hash.
  Streaming requests are re-emitted as a single SSE chunk + `[DONE]`
  on replay, which is sufficient for ironclaw's chunk accumulator.

- `tests/e2e/live_harness.py` — `start_live_proxy(test_name)`,
  `is_live_mode()`, and `proxy_state()` helpers. Skips (not fails)
  in replay mode when the per-test fixture is missing so a fresh
  checkout doesn't bog down on un-recorded traces.

- `tests/e2e/conftest.py` adds `mission_gmail_live_server` /
  `mission_gmail_live_page` fixtures. The fixture wires:
  * the live proxy as `LLM_BASE_URL`
  * `IRONCLAW_TEST_HTTP_REWRITE_MAP` routing
    `gmail.googleapis.com` at `mock_llm.py`'s new
    `/gmail/v1/users/me/{drafts,messages,...}` mock endpoints
    (so the gmail WASM tool's HTTP calls land deterministically)
  * `ENGINE_V2=true` and `AGENT_AUTO_APPROVE_TOOLS=true` so the
    chat-driven mission_create + fire flow runs without an
    administrative-approval prompt while the *authentication* gate
    that triggers the #3133 path remains active.

- `tests/e2e/mock_llm.py` adds Gmail HTTP mocks (drafts/send/list/get)
  plus `/__mock/gmail/{state,reset}` so tests can assert that the
  gmail tool actually fired against this mock (rather than no-oping
  or hitting the real Gmail API).

- `tests/e2e/scenarios/test_mission_gmail_3133.py` is rewritten to
  drive the full chat flow: chat asks for the mission, the live LLM
  emits `mission_create` + `mission_fire`, the child thread tries to
  use gmail, the auth gate fires and the mission transitions to
  Paused, the test completes OAuth via `/oauth/callback`, the
  half-2 wire (`bridge::resume_paused_missions_for_credential`)
  resumes the mission, and the re-fired child thread completes the
  draft against the gmail mock.

Known limitation documented in the test docstring: when the live
LLM picks Tier 1 (CodeAct / Python via Monty) for the child thread,
`tool_activate` is NOT exposed as a CodeAct callable — only Tier 0
structured tool calls expose it. Recent Sonnet defaults to Tier 1,
so the auth-gate path that half-1 added never fires in those traces
and the child thread surfaces a tool-error narration instead. The
test skips with a clear diagnostic in record mode if that happens
and the trace recording becomes a useful artifact for
investigating. Half-2 is verified end-to-end for the Tier 0 path
by the unit tests in
`crates/ironclaw_engine/src/runtime/mission.rs`. A Tier-1-resilient
fix needs the engine to auto-promote "credential missing" tool
errors to a `GatePaused` outcome so the Tier 1 path also writes
`paused_gate` — tracked as a follow-up on the same issue thread.

The fixture file is intentionally not committed in this PR — the
tooling works (recording produced 12 entries against NearAI's
Sonnet 4.5 in local testing) but a clean fixture requires the
Tier 1 gap to be addressed first. The committed `.gitkeep` keeps
the fixtures directory in tree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(engine,bridge): inline-await for Authentication gates (Tier 0 + Tier 1)

PR #3157 wired Tier 0 + Tier 1 inline gate-await for Approval gates
but explicitly excluded Authentication (and External) — the trait
docstring on `GateController::pause` said "Restricted to Approval"
and both executor paths bailed with `RuntimeError("execution paused
by gate ...")` for non-Approval. That left the #3133 path partial:
a mission's CodeAct child thread that hit a Gmail OAuth gate
unwound to `ThreadOutcome::GatePaused`, but then the LLM in the
re-entered thread saw a stale "tool returned error" instead of a
clean retry against the now-present credential, and the mission
either re-fired on cron (the original ghost-fire) or narrated the
failure into FINAL().

This patch lets Authentication gates flow through the same inline-
await path as Approval and wires the OAuth callback to wake parked
VMs by credential name.

Engine (`crates/ironclaw_engine/`)
- `executor/scripting.rs`: `resolve_tool_future` and the
  `drive_inline_gate` retry loop now accept `ResumeKind::Authentication`
  alongside Approval. External keeps the legacy unwind path because
  its resolution payload (callback body) can't be handed back to a
  suspended call.
- `executor/structured.rs`: `execute_with_inline_gate_retry` mirrors
  the same expansion for Tier 0 batches.
- `gate/mod.rs`: drop the "Restricted to Approval" doc on
  `GateController::pause`. Document the new contract — the host
  controller is expected to deliver `GateResolution::Approved` once
  the credential lands in the secrets store, and the engine retries
  the action inline against the now-present secret.

Bridge (`src/bridge/`)
- `gate_controller.rs`: drop the non-Approval cancel in
  `BridgeGateController::pause`. Authentication is now handled the
  same way Approval is: register the pending gate, park on a
  oneshot, await resolution. Add a credential-name → request_ids
  index in `GateResolutions` and a new
  `deliver_for_credential(credential_name)` method that delivers
  `Approved` to every parked Authentication waiter for that
  credential.
- `router.rs`: new pub helper
  `resolve_inline_gates_for_credential(credential_name)` that walks
  `EngineState::gate_resolutions` and calls `deliver_for_credential`.
  Added a `gate_resolutions` field on `EngineState` (and threaded
  through every test fixture) so the OAuth-callback path can reach
  the resolutions registry without going through the controller's
  internals.

Wiring
- `channels/web/features/oauth/mod.rs` and `extensions/manager.rs`:
  the OAuth completion path now fires BOTH
  `resolve_inline_gates_for_credential` (wakes parked Tier 0 / Tier 1
  VMs for live foreground or mission child-thread executions) AND
  `resume_paused_missions_for_credential` (the half-2 mission-arm
  hook for missions whose child thread already unwound before OAuth
  completed). Best-effort dispatch — failures are logged inside the
  helpers and never block the OAuth landing page.

Tests
- All 27 existing `engine_v2_gate_integration` tests continue to pass
  unchanged. The Approval contract is unchanged and the
  Authentication contract was previously untestable end-to-end at
  the inline-await level (the old code surfaced as a `RuntimeError`).
- `runtime::mission::tests` (104 tests) all pass — the engine state
  machine is unchanged for the half-2 mission arm.

Open follow-up: a new gate_integration test that pins the
Tier 1 + Authentication + credential-write resume cycle. The
existing test harness already exercises the Tier 0 + Approval cycle;
adding the analogous Authentication test is straightforward but
out of scope for this commit (the LLM-driven trace recording
in the Playwright e2e test serves as the integration-tier
coverage once the chat path can drive it deterministically).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(engine): preserve legacy unwind for Authentication when controller cancels

Follow-up to the previous commit. The previous change made all
Authentication gates flow through `GateController::pause()`, but that
broke 4 existing tests (`gate_paused_authentication_carries_credential_name`,
`auth_resolution_retries_same_pending_action_without_second_pause`,
`approval_chains_directly_into_auth_for_install_flow`,
`install_auth_resume_followed_by_aliased_tool_call_completes_without_hanging`)
that exercise the legacy `ThreadOutcome::GatePaused` re-entry path.
Those tests use `CancellingGateController` (or a fixture variant
that cancels) which previously surfaced as the legacy unwind because
non-Approval never reached the controller; now that Authentication
does reach the controller, Cancelled was being treated as denial.

Fix: when the controller returns `Cancelled` for an Authentication
gate, fall through to the legacy unwind path (re-raise the original
`Err(GatePaused)` in Tier 0 / re-raise `RuntimeError("execution
paused by gate ...")` in Tier 1). Semantically: Cancelled-on-Auth
means "the controller can't resolve OAuth inline" — that's the
fallback condition for missions and non-inline-aware controllers,
where the engine should hand the gate up to the orchestrator so
the mission can transition to Paused and the half-2 mission-arm
auto-resume mechanism takes over after OAuth.

`Denied` and explicit user-driven `Cancelled` from a real production
controller (BridgeGateController via `/api/chat/gate/resolve`) still
fail the action. Approved retries inline as before.

Test updates
- `AutoApprovingGateController::pause` now only auto-approves
  Approval gates and explicitly returns Cancelled for Authentication
  so the 4 legacy-path tests continue to exercise the unwind.
- `approval_chains_directly_into_auth_for_install_flow` now expects
  the controller to observe BOTH the Approval pause AND the
  Authentication pause (which falls through to legacy unwind), since
  Authentication now goes through the controller before unwinding.
- New `authentication_gate_resolves_inline_via_controller` test pins
  the new inline-await contract for Authentication: a controller
  that returns Approved (mimicking the production BridgeGateController
  after a credential-write hook fires) makes the action retry inline
  and the thread complete in a single join, with both gate and
  retry events recorded.

All 28 `engine_v2_gate_integration` tests pass.
524 `ironclaw_engine` lib tests pass (unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(engine): make tool_activate redundant for installed-but-unauthed tools

User's question: "do we need tool_activate at all?" Answer: not for the
case where a tool is installed but its credential is missing. The
engine should detect missing credentials at execute time and raise an
Authentication gate (which the inline-await machinery from the
previous commits then resolves after OAuth). The model can call the
tool directly — no preceding `tool_activate(name=...)` step.

Three cooperating changes ship that contract:

1. `Tool::required_credentials()` — new trait method, default empty.
   `WasmToolWrapper` overrides it by walking
   `capabilities.http.credentials` and returning every non-optional
   `secret_name`. Other tool kinds (built-in, MCP, etc.) inherit the
   empty default and are unaffected.

2. `AuthManager::check_action_auth` — extends the existing HTTP-tool
   credential preflight to non-HTTP tools. After the existing
   `is_http` early return, looks up the tool by name in the registry
   and consults `tool.required_credentials()`. For each declared
   credential it tries `resolve_secret_for_runtime`; missing ones
   produce `AuthCheckResult::MissingCredentials`. The bridge's
   `effect_adapter::execute_action` already converts that to
   `Err(EngineError::GatePaused { resume_kind: Authentication, ... })`,
   which the inline-await loops in `executor/structured.rs` and
   `executor/scripting.rs` (extended in the previous commits) park
   on `gate_controller.pause()`. After OAuth completes,
   `bridge::resolve_inline_gates_for_credential` delivers `Approved`
   and the action retries against the now-present secret.

3. `tool_surface::is_direct_ready` — `NeedsAuth` extension tools now
   stay on the callable surface alongside `Ready`. Pre-#3133 they
   were filtered out of `available_actions` and the model had to
   call `tool_activate` to move them to Ready. Post-fix they appear
   on the schema and in the CodeAct namespace; auth resolves at
   execute time. `NeedsSetup`, `Inactive`, `Latent`, and
   `AvailableNotInstalled` still fall through to capabilities-only
   because their resolution requires user-driven onboarding work
   that a credential-write hook can't supply.

Test updates
- `bridge::tool_surface::tests::assigns_surface_matrix_rows` —
  the "needs-auth extension direct action" case flips from
  `capabilities_only()` to `actions_only()`.
- `bridge::action_projector::tests::needs_auth_provider_tools_omitted_from_available_actions`
  renamed to `..._stay_in_available_actions` and expectation flipped.
  The `NeedsSetup` companion test stays unchanged — that path still
  needs explicit setup via `tool_activate` / extension UI.
- `bridge::effect_adapter::tests::available_actions_omit_installed_needs_auth_provider_action`
  renamed to `..._keep_installed_needs_auth_provider_action` and
  expectation flipped.
- 460 bridge tests pass, 28 gate integration tests pass, 524 engine
  tests pass, 19 auth tests pass.

Live test prompt updated
- The Playwright test no longer instructs the LLM to call
  `tool_activate` first. The new prompt says "call the gmail tool
  directly — the runtime handles authentication". This matches the
  new contract and gives the LLM the simplest path to the gmail call.

Known limitation surfaced by the live recording
- The live trace against NearAI's Sonnet 4.5 still skips with the
  test's "mission never reached Paused" diagnostic. Inspection
  shows gmail is NOT registered as a callable tool with the engine's
  tool registry until the WASM extension is activated — install +
  unauthed leaves the WASM module loaded but its tool definitions
  un-registered. So even with the surface change in place, the
  CodeAct namespace doesn't see `gmail` and the LLM can't invoke
  it. Closing this last gap requires registering WASM tools at
  install time (rather than at activation time) so the engine's
  preflight has a tool to look up and gate. Tracked as a follow-up
  on the same #3166 thread; the unit-test coverage of the
  auto-resume mechanism is unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(extensions): auto-register WASM tools with the engine registry on install

User asked: "can we remove tool_activate as a tool completely? and
instead register tools on install in ToolRegistry. Let's make the
live test work end to end."

This commit takes the first step: WASM tools are now registered
with the engine's `ToolRegistry` immediately at install time, not
gated behind a separate `tool_activate(name=...)` call. Combined
with the previous commits' surface change (`is_direct_ready(NeedsAuth)`)
and per-tool credential preflight (`AuthManager::check_action_auth`
reading `Tool::required_credentials()`), the model can now call an
installed-but-unauthed tool directly — the engine raises an
`Authentication` gate at execute time and the inline-await
machinery resumes the action after OAuth completes.

`install_wasm_tool_from_url_with_caps` now invokes
`activate_wasm_tool` immediately after the download lands. Any
activation failure is logged and downgraded to a warning rather
than unwinding the install — so a transient runtime error doesn't
break install. The user can still re-run `/api/extensions/{name}/activate`
in that fallback case.

Verified by direct diagnostic: a chat against a fresh ironclaw with
a freshly-installed (but unauthed) gmail extension now shows
`gmail` registered in `/api/extensions/tools` and classified as
`Inline` in the action projector (NeedsAuth status, available_actions
= true).

132 extension manager tests pass. 460 bridge tests pass. 524 engine
tests pass. 28 gate integration tests pass.

Remaining work to make `tool_activate` fully retire and the live
test pass end-to-end (not blocking this commit):

1. Remove `tool_activate` as a builtin tool, drop it from the
   AUTONOMOUS_TOOL_DENYLIST, drop the special-cased install-approval
   logic, and update the engine v2 system prompt's "Activatable
   Integrations" guidance. Mostly mechanical.

2. Make Tier 1 (CodeAct) mission child threads surface
   `ThreadOutcome::GatePaused` when an Authentication gate fires and
   the controller cancels (no PerExecutionContext registered for
   missions today). Currently the legacy fallback raises a Python
   `RuntimeError("execution paused by gate ...")` inside the script,
   which appears in stdout but isn't propagated to
   `CodeExecutionResult::need_approval`. The Tier 0 path already
   surfaces correctly via the gate_paused result_json. Closing this
   gap likely needs either: (a) a side-channel from `drive_inline_gate`
   into `execute_code`'s `need_approval` field, or (b) registering a
   mission-thread `PerExecutionContext` so the controller parks
   indefinitely on Authentication and the mission completes the
   action inline instead of unwinding. (b) is cleaner — missions
   then behave like the foreground chat path.

The live test stays in the tree as the recording artifact. The
diagnostic confirms the architectural change is correct (gmail is
classified Inline + registered) — the remaining gap is downstream
of that.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(engine): surface Tier 1 Cancelled+Authentication gates as need_approval

Adds the missing wire from `drive_inline_gate`'s Cancelled+Authentication
fallback to `CodeExecutionResult::need_approval`, so mission child
threads running in Tier 1 (CodeAct) actually transition to Paused
instead of silently dropping the gate.

Before this commit:
  - Tier 1 mission child thread calls a tool that needs auth
  - Engine raises GatePaused{Authentication}
  - Inline-await calls `gate_controller.pause()` — no PerExecutionContext
    registered for missions, so the controller cancels
  - Cancelled+Authentication fallback in `drive_inline_gate` raises
    `RuntimeError("execution paused by gate ...")`
  - Script exits with the error in stdout
  - `CodeExecutionResult::need_approval` was always None
  - Orchestrator sees the script error but no `pending_gate` field
  - Mission stays Active, cron keeps re-firing — the original #3133
    ghost-fire shape

After this commit:
  - `drive_inline_gate` writes the original `ThreadOutcome::GatePaused`
    to a tokio `task_local!` (`PENDING_GATE_STASH`) before raising the
    RuntimeError
  - `execute_code_with_skills_inner` runs inside
    `PENDING_GATE_STASH.scope(RefCell::new(None), ...)` so the stash
    is scoped per execution
  - The runtime-error exit path reads the stash and threads it into
    `CodeExecutionResult::need_approval`
  - The orchestrator's `__run_code__` wrapper already converts
    `need_approval` into a `pending_gate` field on the script result
    (line ~1061 of orchestrator.rs)
  - The Python orchestrator reads `pending_gate`, transitions the
    thread to Waiting, and returns `outcome=gate_paused`
  - Rust converts that to `ThreadOutcome::GatePaused`
  - `process_mission_outcome_and_notify` sees GatePaused → mission
    Paused → `paused_gate` set → user sees notification → user does
    OAuth → half-2 mission arm auto-resumes the mission

This is the missing piece to make Tier 1 missions safe under #3133.
For Tier 0, the legacy unwind already produces a `gate_paused`
result_json directly. For foreground threads with a wired
BridgeGateController, inline-await parks indefinitely on Authentication
(the new contract from earlier commits) and the Cancelled fallback
never fires.

524 engine tests pass. 28 gate integration tests pass. 460 bridge
tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(bridge): tone down preflight-auth + gate-cancel logs to debug

Cleanup pass after the broader #3133 wiring audit. Both lines were
temporarily promoted to info! during debugging; for production they
should sit at debug! to avoid log noise on the happy path.

No behavior change.

* fix: complete Tier 1 mission auto-resume + WASM tool test rewrites for #3133

Five small fixes that together let the mission auto-resume (#3133 / #3166)
flow run end-to-end through Tier 1 (CodeAct/Monty) child threads:

- engine: extract `take_pending_gate_stash()` and apply it at all three
  Tier 1 script-error exit paths in `execute_code_with_skills_inner`. The
  previous fix only covered one exit; the others silently swallowed
  Cancelled+Authentication gates, leaving the mission Active when
  Monty raised the gate from a non-progress error path.

- engine: deterministic sort in the trait-default
  `list_missions_with_shared` plus the in-memory `store_adapter::list_missions`
  / `list_all_missions`. Eliminates HashMap-iteration-order leakage into
  `mission_list` tool results so the LLM and replay tests see a stable
  shape.

- bridge gate_controller: replace stale "no per-execution context →
  cancel" comment with the real reason — mission/background threads
  intentionally fall through to the legacy `ThreadOutcome::GatePaused`
  unwind so `process_mission_outcome_and_notify` (#3133 half-1) flips
  the mission to Paused, and `resume_paused_for_credential` (half-2)
  resumes after OAuth.

- WASM tool wrapper: in test/debug builds, after credential injection
  apply `IRONCLAW_TEST_HTTP_REWRITE_MAP` to outbound URLs so live-test
  fixtures can route `gmail.googleapis.com` at the mock_llm endpoint
  the same way WASM channels already do. Also implement
  `Tool::required_credentials()` from the WASM capability manifest so
  the engine can preflight credential checks without invoking the tool.

- WASM tool http_security: when `IRONCLAW_TEST_HTTP_REWRITE_MAP` is
  set in a debug build and the rewritten host resolves to loopback,
  allow it through the SSRF guard. Production paths (no env var, or
  release builds) keep rejecting private/internal IPs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(e2e): land live fixture for #3133 + harden replay canonicalization

Records the full LLM trace for the mission auto-resume regression and
makes replay deterministic across re-recordings on different machines:

- live_llm_proxy: drop skill bodies from the canonical hash. Skills
  are local-machine state (the user's `~/.ironclaw/skills` set varies
  between record and replay) so hashing the body forces a fresh
  recording for every checkout. The skill *set* still influences the
  LLM's actual response — replay just doesn't re-validate it. Also
  collapse `[SKILL] skill:NAME ...` and `### [SKILL] skill:NAME ...`
  forms via a single regex so user-message and system-prompt skill
  blocks both normalize.

- live_llm_proxy: log every chat_completions request and any
  upstream 4xx/5xx body to stderr. Diagnosable failures during a
  re-record without re-running the harness from scratch.

- conftest + live_harness: when `IRONCLAW_E2E_STDERR_LOG` /
  `IRONCLAW_LIVE_PROXY_STDERR_LOG` are set, redirect ironclaw and
  proxy stderr to a file so the LLM-call sequence and any rust-side
  retry warnings are durable across teardown. Defaults are unchanged
  (PIPE) so CI behavior matches today.

- fixtures/live/test_mission_gmail_draft_3133.json: fresh recording
  produced by `IRONCLAW_LIVE_TEST=1 IRONCLAW_LIVE_LLM_MODEL=Qwen/Qwen3.5-122B-A10B
  pytest scenarios/test_mission_gmail_3133.py`. Replay passes
  deterministically (25.87s on this box) and exercises the full
  Paused → OAuth → Active → gmail-draft loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor: remove tool_activate; rely on auth-preflight for installed-but-unauthed tools

`tool_activate` was the engine-v2 enablement entry point for installed
provider tools that needed OAuth before the model could call them.
Post-#3133 the same flow is covered by the auth preflight in
`AuthManager::check_action_auth`: any tool with declared
`required_credentials()` raises an `Authentication` gate at execute
time when the credential is missing, the inline-await machinery parks
the VM, and the OAuth callback resumes the action. The separate
`tool_activate` step is now redundant.

What's removed:

- `ToolActivateTool` struct + Tool impl (`src/tools/builtin/extension_tools.rs`)
- Registration sites: `ToolRegistry::register_extension_tools`,
  `effect_adapter` test rigs, action_projector test surface
- The `tool_activate_requires_install_approval` adapter method,
  `matching_extension_requires_install_approval` helper, and three
  approval-gate tests that only fired on the `tool_activate` lookup name
- The `tool_activate` entry from the autonomous tool denylist
  (`src/tools/autonomy.rs`, `crates/ironclaw_engine/src/gate/tool_tier.rs`),
  `seeded_default_permission`, the action-resolver alias map
  (`src/auth/extension.rs`), the schema validator, and the autonomous
  whitelist in `src/tools/registry.rs`
- The "Activatable Integrations" prompt template's
  `tool_activate(name=...)` instruction; replaced with text telling the
  model these integrations need user setup through the IronClaw UI
- `tests/e2e/scenarios/test_v2_tool_activate_surface.py` and
  `tests/e2e_auth_gate_traces.rs` plus its 5 LLM trace fixtures —
  these tested the old tool-output-detection contract that the auth
  preflight obsoletes; equivalent coverage lives in
  `bridge::effect_adapter::tests::preflight_gate_blocks_missing_credential`,
  the engine v2 mission unit tests, and the live mission gmail e2e

What changed in the prompt:

- `is_activatable_integration` now excludes `NeedsAuth` — those tools are
  direct-callable through the regular action inventory. `NeedsSetup`,
  `Inactive`, `Latent`, and `AvailableNotInstalled` still surface
  separately because they need user-driven setup the model can't do
  itself.

Live fixture re-hashed: stripped `tool_activate` from each entry's
`request_canonical.tools` and recomputed `request_hash` so replay
matches the new (1-shorter) tool list. Replay test still passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: cargo fmt across the branch

Run rustfmt on the engine and bridge files touched by the earlier
mission auto-resume + tool_activate-removal commits. No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(missions, gates): address PR #3366 review — atomic resume + per-user wakeup scoping

Five correctness fixes from the PR review on #3366:

1. **Per-user credential wakeup (Copilot)**
   `GateResolutions.by_credential` now keys by `(user_id,
   credential_name)`. Previously a credential write under one account
   could deliver `Approved` to parked Authentication gates from a
   different account that happened to share the same credential name.
   `register_credential`, `deliver_for_credential`, and the public
   `bridge::resolve_inline_gates_for_credential` helper now all carry
   `user_id`. Call sites: OAuth callback handler, WASM OAuth completion
   in extensions manager.

2. **Atomic `resume_paused_for_request_id` (serrrfirat HIGH)**
   The handler now reloads the mission and re-checks
   `paused_gate.gate_request_id` in the same load+save round-trip,
   instead of relying on the snapshot it scanned for. Prevents a race
   where a mission re-paused on a different gate between the snapshot
   and the resume could be transitioned to `Active` (silently clearing
   the new gate) because `resume_mission` accepts both `Paused` and
   `Failed`. Returns `None` when the live gate no longer matches.

3. **Shared mission auto-resume runs under requesting user
   (serrrfirat HIGH)**
   For shared missions (`mission.user_id == "__shared__"`),
   `resume_paused_for_credential` and `resume_paused_for_request_id`
   now use `user_id` (the user whose credential / resolution unblocked
   the gate) for `fire_mission`, while keeping
   `mission.user_id` for the `resume_mission` ownership check.
   Previously the post-resume fire ran under `__shared__`, which has
   no per-user secret / project scope.

4. **Mission-only gate cancellation (serrrfirat MEDIUM)**
   `chat_gate_resolve_handler` no longer requires `thread_id` on the
   `Cancelled` arm. Mission-only gates have no foreground `thread_id`
   in the resolution payload; the mission auto-resume path now
   carries the `Cancelled` outcome to the mission state machine even
   without a foreground submission. Foreground gates that do supply
   `thread_id` still dispatch the structured cancellation as before.

5. **CredentialProvided now triggers mission auto-resume
   (serrrfirat MEDIUM)**
   `chat_gate_resolve_handler` now fires `mission_outcome =
   Approved` for `CredentialProvided` submissions, not just
   `Approved`. Previously, missions paused on a non-OAuth credential
   (e.g. user submits a Telegram bot token through the gate UI) would
   only auto-resume on a subsequent OAuth callback. The new mission
   helpers' atomic gate-id check makes the double-fire (this hook +
   the OAuth-callback path) idempotent.

Plus six doc fixes: stale references to
`bridge::resume_paused_missions_for_credential` in inline-await
docstrings (`gate/mod.rs`, `executor/structured.rs`,
`executor/scripting.rs`) now correctly point at
`bridge::resolve_inline_gates_for_credential`. The router doc no
longer claims a `secrets_store::create/update` hook that doesn't
exist; updated to name the actual call sites and document the
manual-credential follow-up.

Test fixtures in `bridge::router::tests` (4 EngineState constructors)
now share a single `Arc<GateResolutions>` between `gate_controller`
and `gate_resolutions` so test paths exercising
`resolve_inline_gates_for_credential` actually reach the parked
waiters that `BridgeGateController::pause` registered.

All 5623 lib tests pass; replay e2e still green at ~28s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(e2e): correct return type on live_llm_proxy SSE helpers

Copilot review: `_emit_streamed_response` was annotated `web.StreamResponse`
but the underlying `_send_sse` helper buffers the payload and returns a
`web.Response`. Aligned the chain (`_emit_streamed_response`,
`_send_sse_payload`, `_send_sse_lines`, `_send_sse`) on the actual return
type and removed the unreachable `_start`/`started` block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: bump to retrigger workflows on PR #3366

* fix(missions, wasm, e2e): address remaining PR #3366 review feedback

- chat_gate_resolve_handler: validate gate request_id once up front
  so every arm (including Approved/Denied) returns a uniform 400 on
  malformed UUIDs and the mission auto-resume hook isn't silently
  skipped.
- extensions::manager::install_wasm_tool_from_url_with_caps: reflect
  activation outcome in InstallResult.message — no longer claims
  "installed and ready" when activate_wasm_tool fails.
- tools::wasm::http_security: drop the is_unspecified() branch from
  the IRONCLAW_TEST_HTTP_REWRITE_MAP escape hatch; the rewrite helper
  only emits loopback IPs, so allowing 0.0.0.0/:: just weakened the
  SSRF guard in debug/test builds.
- tests/e2e/live_harness.py: remove unused subprocess import and
  unused _find_free_port helper.
- tests/e2e/live_llm_proxy.py: docstring on _normalize_skills_block
  said "sorted name list" while the implementation drops the entire
  block to a `[SKILLS]` placeholder — corrected to match behavior.

[skip-regression-check] — review-feedback fixes, no behavioral
regression to add a test for; existing chat-gate resolve coverage
exercises the validated path, and the SSRF guard fix tightens an
unreachable branch.

* fix(gates, wasm, e2e): address final PR #3366 review and unblock CI

- chat/gate-resolve cancel: when the client omits `thread_id` for a
  foreground inline-await gate, look up the owning thread from
  `PendingGateStore` instead of skipping dispatch — otherwise the
  parked VM is stranded even though the API reports success.
  Adds `PendingGateStore::peek_by_request_id` (user-scoped, read-only)
  and `bridge::get_pending_gate_by_request_id`.
- tools/wasm rewrite validator: replace the hardcoded
  `http://...`-prefix match with URL parsing that mirrors the
  channel-side `is_loopback_test_rewrite_base`, accepting any
  http/https loopback target. Also strip IPv6 brackets in both
  helpers so `[::1]` actually validates.
- tests/e2e/pyproject.toml: scope setuptools auto-discovery to
  `scenarios*` so the new `fixtures/` (live-LLM record/replay JSON)
  no longer trips the "multiple top-level packages" error and
  `pip install -e .` works in CI.

Regression coverage: store-level peek_by_request_id ownership +
expiry tests; loopback-rewrite parity + https-loopback acceptance
tests in tools/wasm/wrapper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor: core 20+ merged PRs risk: medium Business logic, config, or moderate-risk modules scope: channel/web Web gateway channel size: XL 500+ changed lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug Bash 4/17] Routine creation fails with "5 consecutive code errors"

3 participants