fix(bridge): coerce engine action params per schema (#3132) #3197
Conversation
LLMs routinely send numeric tool params as JSON strings (`"120"` instead of `120`). Host tools handle this — `ToolDispatcher::dispatch` runs `prepare_tool_params`, which coerces against the tool's JSON Schema. But engine actions (`mission_*`, `routine_*` aliases, `tool_info`) are intercepted in `effect_adapter::execute_action_internal` *before* that coercion ever runs, so `mission_create` rejected `cooldown_secs="120"` with `'cooldown_secs' must be an integer, got "120"`.

Lift the schema-guided coercion to the engine-action dispatch boundary so both pipelines share the same input pre-amble. Schema sources, in order: orchestrator-populated `available_actions_snapshot`, bridge-known mission action defs, host tool registry. Idempotent for already-typed inputs, so the existing host-tool path sees unchanged shape. The `extract_guardrails` / `strict_u64` helper stays strict as defense in depth — if a future code path bypasses the new pre-amble, it still fails loudly rather than silently dropping the value.

Tests:
- mission_create_string_guardrails_coerced_via_execute_action — drives execute_action with all four guardrail params as strings; asserts values persist correctly to the mission row.
- mission_update_string_guardrails_coerced_via_execute_action.
- mission_create_non_coercible_string_guardrail_returns_error — `"abc"` still surfaces a clean error.
- extract_guardrails_rejects_string_typed_integers (existing, doc comment updated to reflect defense-in-depth role).
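The coercion described above — string-typed scalars normalized toward the schema-declared type, idempotent for already-typed inputs, pass-through for non-coercible strings so downstream validation still fails loudly — can be sketched with a toy value type. The real `prepare_params_for_schema` operates on serde_json values against a full JSON Schema; everything below is illustrative:

```rust
// Toy stand-in for a JSON scalar; the real code uses serde_json::Value.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    String(String),
    Integer(u64),
}

// Coerce a value toward the schema-declared type "integer".
// Idempotent: already-typed integers pass through unchanged, and
// non-coercible strings are left as-is so the strict downstream
// validator (extract_guardrails / strict_u64) still rejects them.
fn coerce_to_integer(v: Value) -> Value {
    match v {
        Value::String(s) => match s.parse::<u64>() {
            Ok(n) => Value::Integer(n),
            Err(_) => Value::String(s),
        },
        other => other,
    }
}

fn main() {
    // LLM sent "120" — coerced to an integer per the schema.
    assert_eq!(coerce_to_integer(Value::String("120".into())), Value::Integer(120));
    // Already typed — unchanged, so a second pass downstream is a no-op.
    assert_eq!(coerce_to_integer(Value::Integer(120)), Value::Integer(120));
    // Non-coercible — left alone; the strict validator surfaces the error.
    assert_eq!(coerce_to_integer(Value::String("abc".into())), Value::String("abc".into()));
    println!("ok");
}
```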
Code Review
This pull request implements schema-guided parameter coercion for engine actions and host tools, allowing the bridge to correctly type stringified scalars from LLMs. The changes involve resolving schemas from multiple sources and applying coercion via a newly exported utility. Review feedback suggests refactoring the coercion logic in effect_adapter.rs to improve readability and avoid variable shadowing by using an if/else if structure instead of nested matches.
```rust
let action_schema = context
    .available_actions_snapshot
    .as_ref()
    .and_then(|snapshot| {
        ActionDiscovery::resolve(snapshot.as_ref(), canonical_action_name)
            .map(|action| action.parameters_schema.clone())
    })
    .or_else(|| engine_action_schema(canonical_action_name));
let action_schema = match action_schema {
    Some(schema) => Some(schema),
    None => self
        .tools
        .get(&lookup_name)
        .await
        .map(|tool| tool.parameters_schema()),
};
let parameters = match action_schema {
    Some(schema) => crate::tools::prepare_params_for_schema(&parameters, &schema),
    None => parameters,
};
```
The logic for finding the action schema and coercing parameters is a bit complex and could be simplified for better readability. The current implementation uses shadowing and a match statement that could be expressed more clearly. Consider refactoring this block to use a single if/else if chain. This would make the fallback logic for finding the schema more explicit and easier to follow, separating the checks for distinct conditions. This avoids shadowing action_schema and consolidates the coercion logic.
```rust
let action_schema = context
    .available_actions_snapshot
    .as_ref()
    .and_then(|snapshot| {
        ActionDiscovery::resolve(snapshot.as_ref(), canonical_action_name)
            .map(|action| action.parameters_schema.clone())
    })
    .or_else(|| engine_action_schema(canonical_action_name));
let parameters = if let Some(schema) = action_schema {
    crate::tools::prepare_params_for_schema(&parameters, &schema)
} else if let Some(tool) = self.tools.get(&lookup_name).await {
    let schema = tool.parameters_schema();
    crate::tools::prepare_params_for_schema(&parameters, &schema)
} else {
    parameters
};
```

References
- Separate checks for distinct conditions to improve code clarity and robustness.
Applied in 023a4c5 — collapsed the double-match into the if/else if chain you suggested. Used discovery_schema() on both branches to match prepare_tool_params (also addresses the sibling Copilot comment).
Pull request overview
This PR fixes engine-v2 action dispatch so schema-guided parameter coercion happens before bridge-handled engine actions execute, aligning mission/routine action handling more closely with the existing host-tool normalization pipeline. In the broader codebase, this targets the bridge layer where engine-native actions bypass the standard tool dispatcher.
Changes:
- Exported shared schema-coercion helper so bridge code can normalize params directly from a JSON Schema.
- Added pre-dispatch schema-based coercion in `EffectBridgeAdapter::execute_action_internal`, sourcing schemas from action snapshots, engine mission action defs, or the tool registry.
- Reworked regression tests to assert stringified mission guardrail integers are now persisted correctly and that non-coercible strings still fail cleanly.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/tools/mod.rs | Re-exports the lower-level schema coercion helper for bridge use. |
| src/bridge/effect_adapter.rs | Adds early action parameter coercion, helper schema lookup, updated comments, and mission guardrail regression tests. |
```rust
.tools
    .get(&lookup_name)
    .await
    .map(|tool| tool.parameters_schema()),
```
Already addressed in commit e492f59 — both the snapshot/engine_action and host-tool branches now use discovery_schema() (the structural follow-up in 023a4c5 keeps that). The intent matches prepare_tool_params's historical behavior, including for tools where discovery_schema() is richer than parameters_schema() (compacted WASM schemas, routine compat aliases).
```rust
/// Engine actions (`mission_*`, `routine_*` aliases, `tool_info`) are not
/// in the host `ToolRegistry` — they are handled directly inside the
/// bridge. This helper provides the canonical schema for those actions so
/// they get the same schema-guided parameter coercion that registered
/// tools get via `prepare_tool_params`. The orchestrator-populated
/// `available_actions_snapshot` takes precedence; this is the fallback
/// for callers (and tests) that do not populate it.
```
Doc rewritten in 023a4c5 — only mission_* is actually absent from the host registry. routine_* (legacy v1 host Tools, intercepted by the alias path before they execute) and tool_info (a v1/v2 host tool) are present and reach execute_action_internal's registry branch directly. The new doc explains the asymmetry honestly and clarifies that this helper is a defense-in-depth fallback for the only category that doesn't have a host Tool.
… lift

Follow-up to the schema-guided coercion lift in `execute_action_internal`. Two cleanups now that the upstream pre-amble runs for every action:

1. Use `discovery_schema()` consistently for both the ActionDef and host-tool paths (matches what `prepare_tool_params` was doing all along — for tools whose `parameters_schema()` is permissive but `discovery_schema()` is strict, this is the schema you actually want for coercion).
2. Drop the explicit `prepare_tool_params` call in the sandbox branch (was lines 1572-1581). It existed to keep the sandbox-path validator and `maybe_intercept` in sync with the host path's coercion; now that `parameters` is coerced once at the top of `execute_action_internal`, both downstream callees see the same shape without the duplicate pass.

The remaining `prepare_tool_params` call inside `execute_tool_with_safety` (`tools/execute.rs:40`) stays as-is — it serves both engine v2 (now idempotent) and v1 callers like `execute_chat_tool_standalone` that don't pre-coerce. Lifting that one out is a v1-side refactor outside #3132's scope.

Coverage from the prior commit's tests is unchanged: the engine v2 mission_create / mission_update coercion tests exercise the same upstream code path, and engine_v2_sandbox_integration tests exercise the sandbox branch with already-coerced params.

[skip-regression-check] No new behavior — this is a code-deletion refactor of the duplicate coercion call site. The behavior is covered by the prior commit's tests; this commit just removes the now-dead parallel path.
Two review-driven cleanups (PR #3197):

1. Schema lookup chain: collapse the double-`match`-with-shadowing into the if/else if chain Gemini suggested, with `discovery_schema()` on both branches (Copilot's earlier point that the host-tool fallback needed `discovery_schema`, not `parameters_schema`, was already addressed in commit e492f59). Three branches, one `prepare_params_for_schema` call per branch, no shadowing.
2. Doc comment on `engine_action_schema`: the previous wording claimed `mission_*`, `routine_*`, and `tool_info` are "not in the host ToolRegistry" — only `mission_*` is actually absent. `routine_*` tools are registered as legacy v1 host Tools (intercepted by the alias path in v2 before they execute) and `tool_info` is a v1/v2 host tool. Both reach `execute_action_internal`'s registry branch directly. Reworded the doc to call out the asymmetry honestly.

[skip-regression-check] No behavior change — readability refactor and doc correction. Coverage from existing tests unchanged.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
```rust
        .map(|action| action.discovery_schema().clone())
    })
    .or_else(|| engine_action_schema(canonical_action_name));
let parameters = if let Some(schema) = action_schema {
    crate::tools::prepare_params_for_schema(&parameters, &schema)
} else if let Some(tool) = self.tools.get(&lookup_name).await {
    crate::tools::prepare_params_for_schema(&parameters, &tool.discovery_schema())
```
```rust
async fn mission_create_string_guardrails_coerced_via_execute_action() {
    let (adapter, _store, dyn_store) =
        make_adapter_with_missions_and_store(Arc::new(ToolRegistry::new())).await;
```
…env (#3205)

`test_refresh_access_token_direct_includes_stored_client_secret` did not hold `lock_env()` and did not clear `IRONCLAW_OAUTH_EXCHANGE_URL` / `IRONCLAW_OAUTH_PROXY_AUTH_TOKEN`. When `test_refresh_access_token_uses_proxy_when_configured` ran in parallel and left those vars set inside its lock-protected window, the direct test routed through the proxy URL to the proxy test's mock server, double-counting requests on one mock and starving the other. Both tests then failed their `requests.len() == 1` assertion (one saw zero, one saw two), surfacing as flake on the merge queue head, where test scheduling differs from PR CI.

The pattern matches the already-correct `test_refresh_access_token_serializes_concurrent_refreshes`, which acquires the lock and clears the same two env vars.

Blocks #3197 from landing through the merge queue.

[skip-regression-check] test-only change to fix a flaky test
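The lock-and-clear pattern this fix applies can be sketched as follows. `lock_env` and the `IRONCLAW_*` variable names follow the commit message; the `OnceLock`-backed mutex is an illustrative stand-in for whatever the real helper does, not the project's actual implementation:

```rust
use std::env;
use std::sync::{Mutex, MutexGuard, OnceLock};

// Serialize env-mutating tests: every test that reads or writes these
// process-global vars must hold this guard for its whole body, so a
// parallel test can't leave stale values inside another test's window.
fn lock_env() -> MutexGuard<'static, ()> {
    static ENV_LOCK: OnceLock<Mutex<()>> = OnceLock::new();
    ENV_LOCK.get_or_init(|| Mutex::new(())).lock().unwrap()
}

fn main() {
    let _guard = lock_env(); // held for the duration of the "test"

    // Clear proxy routing vars so the direct-path test exercises the
    // direct path even if a sibling test set them earlier. remove_var
    // mutates process-global state, hence the unsafe block (required
    // as of Rust 2024).
    unsafe {
        env::remove_var("IRONCLAW_OAUTH_EXCHANGE_URL");
        env::remove_var("IRONCLAW_OAUTH_PROXY_AUTH_TOKEN");
    }

    assert!(env::var("IRONCLAW_OAUTH_EXCHANGE_URL").is_err());
    assert!(env::var("IRONCLAW_OAUTH_PROXY_AUTH_TOKEN").is_err());
    println!("ok");
}
```

With both the lock held and the vars cleared, the direct test can no longer route through the proxy test's mock server, which is exactly the double-count/starvation flake described above.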
…3234)

The v2-engine E2E group still listed test_v2_kernel_auth_preflight.py, but that file was removed in #2868 (engine-v2: callable-only available actions) and replaced with test_v2_tool_activate_surface.py for the new tool_activate / Activatable Integrations contract. The Web E2E Full job is skipped on PR-level CI but runs in the merge queue, so the bad path filter dequeued #3197 and #3203 with "file or directory not found: test_v2_kernel_auth_preflight.py".
`test_auth_required_sse_without_duplicate_response` was failing in the merge queue on every PR (most recently bouncing #3197 and #3203 from the queue) because the `auth_sse_server` fixture never set `LLM_API_KEY` in the spawned ironclaw env. After #2572 added a missing-API-key check to the openai_compatible config validator (Apr 22), ironclaw rejected the env-supplied openai_compatible config, fell back to the NearAI default, hit "missing session token", and failed the turn before the github skill could even fire its 401. The chat thus reached `state: Failed` with no tool calls and no `onboarding_state/auth_required` event — which is exactly what the test asserted on, hence the consistent failure.

Adding `LLM_API_KEY=mock-api-key` matches the value already used in every other e2e fixture (auth_matrix, conftest's ironclaw_server, v2_engine, etc.) and unblocks the assertion. Local run: PASSED in 8s.
…ange (#3235)

* ci(e2e): replace deleted preflight test with tool_activate surface

The v2-engine E2E group still listed test_v2_kernel_auth_preflight.py, but that file was removed in #2868 (engine-v2: callable-only available actions) and replaced with test_v2_tool_activate_surface.py for the new tool_activate / Activatable Integrations contract. The Web E2E Full job is skipped on PR-level CI but runs in the merge queue, so the bad path filter dequeued #3197 and #3203 with "file or directory not found: test_v2_kernel_auth_preflight.py".

* test(e2e): unblock Live Canary auth lanes after engine-v2 contract change

The Live Canary "Auth Smoke", "Auth Full", and "Auth Live Seeded" jobs have failed every scheduled run since 2026-05-01 (when the canary cut over to main). Three tests in test_v2_auth_oauth_matrix.py drive the failures, all rooted in the engine-v2 callable-only contract from #2868 that didn't exist when these tests were written.

## What was broken

`test_mcp_same_server_multi_user_via_browser` — After OAuth completes, sending "check mock mcp search" through each user's browser opens an `approval` pending_gate on the first MCP tool call (engine v2 default). The browser fixture has no auto-approve UI, so the chat sat in `pending_gate` for the full 5-min Playwright timeout — `expected_text_contains="Mock MCP search result"` could never match because the assistant bubble never received any text.

`test_wasm_tool_oauth_refresh_on_demand` — Same shape: the gmail call gates on `approval` before reaching the http credential-injection layer that performs the OAuth refresh. Without approving, refresh_count never went above 0, so the test failed with "Timed out waiting for OAuth refresh request".

`test_wasm_tool_first_chat_auth_attempt_emits_auth_url` — Tested OLD engine-v2 behavior: that an LLM-emitted call to a not-yet-authed extension would surface a `gate_required` Authentication event with an auth URL. After #2868, the engine returns "action 'gmail' is not callable in this execution context" instead, and `tool_activate` became the model-facing enablement path. The mock LLM is canned to emit tool calls directly, so this scenario can't be reproduced from a scripted LLM until the canned response is updated.

## Fixes

- `_wait_for_tool_call`: accept a `token` kwarg so multi-user tests can poll/approve through a per-user identity. Backwards-compatible.
- `test_mcp_same_server_multi_user_via_browser`: drive approval through the per-user API while waiting for the tool to land. Drop the broken `expected_text_contains` predicate and the tied "Mock MCP search result" text assertions; the bearer-token isolation assertion (what this test actually exists to prove) is retained and unaffected.
- `test_wasm_tool_oauth_refresh_on_demand`: insert a `_wait_for_tool_call` approval step between `_send_chat` and `_wait_for_refresh_request` so the http credential layer actually runs.
- `test_wasm_tool_first_chat_auth_attempt_emits_auth_url`: marked xfail with an inline reason pointing at #2868 and the replacement coverage (`test_v2_tool_activate_surface.py`, `test_settings_first_gmail_auth_then_chat_runs`).
- Drop the now-unused `send_chat_and_wait_for_terminal_message` import.

## conftest fix

`ironclaw_server` now sets `SECRETS_MASTER_KEY` in the spawned env. On macOS without it, `auto_generate_and_persist` blocks on a Keychain authorization prompt that no one's home to click, so `wait_for_ready` times out at 60s and the fixture kills the process with SIGKILL — making any session-scoped browser test impossible to run locally. On Linux, the keychain backend errors fast and the auto-generate fallback writes to `.env`, so CI was unaffected. Setting the key explicitly matches the pattern already used in `auth_matrix_server`, `test_v2_engine_auth_cancel`, `test_v2_tool_activate_surface`, etc.

## Verification

Local repro confirmed each failure mode (HTTP-only repro for the non-browser tests, server-side log inspection for the multi-user test). Reproduced the exact pending_gate=approval pattern, fixed it, verified the assertion semantics still hold:

```
$ pytest tests/e2e/scenarios/test_v2_auth_oauth_matrix.py::test_wasm_tool_oauth_refresh_on_demand
PASSED in 6.74s
$ pytest tests/e2e/scenarios/test_v2_auth_oauth_matrix.py::test_wasm_tool_first_chat_auth_attempt_emits_auth_url
XFAIL in 93s
$ pytest tests/e2e/scenarios/test_v2_auth_oauth_matrix.py -v --timeout=120
12 passed, 2 skipped, 3 xfailed
```

(browser tests errored locally; they'll run cleanly in CI)

The browser-driven `test_mcp_same_server_multi_user_via_browser` couldn't be exercised locally (chromium can't launch under this shell sandbox), but the API + auto-approve flow it now relies on is exercised by an HTTP-equivalent repro and matches the pattern used in `test_settings_first_gmail_auth_then_chat_runs`.

* test(e2e): set LLM_API_KEY in auth_sse_server fixture

`test_auth_required_sse_without_duplicate_response` was failing in the merge queue on every PR (most recently bouncing #3197 and #3203 from the queue) because the `auth_sse_server` fixture never set `LLM_API_KEY` in the spawned ironclaw env. After #2572 added a missing-API-key check to the openai_compatible config validator (Apr 22), ironclaw rejected the env-supplied openai_compatible config, fell back to the NearAI default, hit "missing session token", and failed the turn before the github skill could even fire its 401. The chat thus reached `state: Failed` with no tool calls and no `onboarding_state/auth_required` event — which is exactly what the test asserted on, hence the consistent failure.

Adding `LLM_API_KEY=mock-api-key` matches the value already used in every other e2e fixture (auth_matrix, conftest's ironclaw_server, v2_engine, etc.) and unblocks the assertion. Local run: PASSED in 8s.
* fix(gateway): suppress duplicate assistant bubble after streamed response

[skip-regression-check]

The SSE `response` handler unconditionally called addMessage('assistant', data.content) even when stream_chunks had already populated and finalized a bubble for the same response. This stayed invisible in the common case but surfaced as a hard test failure under the path test_switching_back_preserves_in_progress_turn:

1. Send "What is 2+2?" on thread A — stream chunks start filling an assistant bubble with "data-streaming".
2. Switch to thread B mid-stream — container clears (history reload).
3. Switch back to thread A — history rehydration shows the in-progress turn with no response yet, so 0 assistant bubbles in DOM.
4. Stream chunks continue to fire for A — appendToLastAssistant creates a new bubble and accumulates the response into it.
5. response event fires — flushes any remaining buffer, removes the data-streaming flag (good) — then addMessage('assistant', content) creates a SECOND identical bubble.

Result: locator(".message.assistant").filter(has_text="4") matches two elements, Playwright strict mode rejects the wait_for, and the test fails. Outside the test, two identical bubbles render to the user.

Fix: only call addMessage in the response handler when there was no in-flight streaming bubble. If one existed, the streamed content is already correct (chunks accumulate `data.content` verbatim) and the data-streaming flag has just been cleared. Non-streaming responses (no chunks fired) still take the addMessage branch.

Regression coverage: tests/e2e/scenarios/test_message_persistence.py::test_switching_back_preserves_in_progress_turn already reproduces this exact scenario and was failing in the merge queue. With this fix it passes; skip-regression-check is used because the existing E2E test is the regression test, and the gateway doesn't have a JS unit test harness for SSE handler state.
* fix(gateway): dedupe history-rendered SSE responses
* test(e2e): set mock LLM API key in standalone fixtures
* fix(e2e): make v2 approval tests deterministic
* test(e2e): stabilize duplicate skill install assertion
* test(e2e): assert duplicate install stays ungated
* fix(skills): skip approval for disk-installed duplicates
* fix(v2): honor no-op skill installs without approval
* test(e2e): wait for pending send marker to clear

---------

Co-authored-by: Firat Sertgoz <f@nuff.tech>
Summary
- `mission_create` and `mission_update` rejected `cooldown_secs="120"` with `'cooldown_secs' must be an integer, got "120"` when the LLM passed integer guardrails as JSON strings (issue #3132, "Mission creation fails - error': '\'cooldown_secs\' must be an integer, got "120"'").
- Host tools coerce via `ToolDispatcher::dispatch` → `prepare_tool_params`, but engine actions (`mission_*`, `routine_*` aliases, `tool_info`) are intercepted in `effect_adapter::execute_action_internal` before that coercion runs.
- Schema sources, in order: `available_actions_snapshot`, bridge-known mission action defs (`mission_capability_actions()`), host tool registry. Idempotent — the host-tool path sees an unchanged shape since `prepare_tool_params` runs again downstream.
- `extract_guardrails` / `strict_u64` stay strict as defense in depth: if a future code path bypasses the new pre-amble, it still fails loudly rather than silently dropping the value (the bug shape from before #2630, "Unlock REPL stdin when auth is required").

Why this approach over making engine actions into Tools
Engine actions are engine-runtime operations on `MissionManager`/`Store`/`LeaseManager`. They take `ThreadExecutionContext` (step_id, current_call_id, available_actions_snapshot, user_timezone, thread_goal), not `JobContext`; they are gated by `CapabilityLease` + `EffectType` rather than `requires_approval`/rate-limit; and they belong to the engine crate, which has no host dependency. Engine v2's design direction is the opposite — `Capability`/`ActionDef` replacing `Tool`. Forcing engine actions through `Tool` would re-introduce a circular dependency, force the wrong context type, and double up on gating semantics. The right fix is to share the parts of the pipeline that should be shared (coercion, validation, redaction, sanitization), not to merge the abstractions.

Test plan
- `cargo fmt --check`
- `cargo clippy --all --benches --tests --examples --all-features` — zero warnings
- `cargo test --lib` — 5598 passed, 0 failed
- `mission_create_string_guardrails_coerced_via_execute_action` — drives `execute_action` with all four guardrail params (`cooldown_secs`, `max_concurrent`, `dedup_window_secs`, `max_threads_per_day`) as strings; asserts the values persist correctly to the mission row in the backing store.
- `mission_update_string_guardrails_coerced_via_execute_action` — same coverage for the update path.
- `mission_create_non_coercible_string_guardrail_returns_error` — `cooldown_secs="abc"` still surfaces a clean error.
- `extract_guardrails_rejects_string_typed_integers` continues to pass; doc comment updated to reflect its defense-in-depth role.

Resolves #3132.
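As a rough illustration of what the coercion tests in the plan above assert: all four guardrail params arrive as JSON strings and must end up typed, while a non-numeric string still surfaces an error. The real tests drive `execute_action` against a mission store; the `coerce` helper below is a hypothetical stand-in for the schema-guided pass, not the project's code:

```rust
// Hypothetical stand-in for the schema-guided integer coercion; mirrors
// the error wording from #3132 ("must be an integer, got ...").
fn coerce(param: &str, raw: &str) -> Result<u64, String> {
    raw.parse::<u64>()
        .map_err(|_| format!("'{param}' must be an integer, got \"{raw}\""))
}

fn main() {
    // All four guardrails as strings, as an LLM would send them.
    for (name, raw, want) in [
        ("cooldown_secs", "120", 120u64),
        ("max_concurrent", "2", 2),
        ("dedup_window_secs", "3600", 3600),
        ("max_threads_per_day", "5", 5),
    ] {
        assert_eq!(coerce(name, raw).unwrap(), want, "{name} should coerce");
    }

    // Non-coercible input still produces a clean, descriptive error.
    let err = coerce("cooldown_secs", "abc").unwrap_err();
    assert!(err.contains("must be an integer"));
    println!("ok");
}
```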