feat(gateway/ws): tool-approval back-channel via WsApprovalChannel by singlerider · Pull Request #6387 · zeroclaw-labs/zeroclaw

singlerider · 2026-05-05T04:14:50Z

Summary

The agent's tool loop already iterates channel_handles.ask_user looking for a Channel that can answer ChannelApprovalRequest. The gateway WS path constructed an Agent but never registered itself as such a channel, so any supervised tool call routed through /ws/chat would auto-deny with no operator surface.

Two-part fix:

New TurnEvent::ApprovalRequest { request_id, tool_name, arguments_summary, timeout_secs } variant on the runtime side.
WsApprovalChannel in the gateway. When the agent calls request_approval, the channel mints a request_id, emits the event over a connection-level mpsc, and parks on a oneshot waiter. The WS forward task drains the mpsc onto the wire; inbound approval_response frames pop the waiter from the shared pending map. Timeouts and disconnects auto-deny so the agent never sees Ok(None) and never falls back to the implicit no-op path.

Wire format:

Server -> Client: { "type": "approval_request",
                    "request_id": "<uuid>", "tool": ...,
                    "arguments_summary": ...,
                    "timeout_secs": 120 }
Client -> Server: { "type": "approval_response",
                    "request_id": "<uuid>",
                    "decision": "approve" | "deny" | "always" }

arguments_summary is the redacted summary the runtime synthesises via zeroclaw_runtime::approval::summarize_args; raw args (including secret-bearing fields) never reach the wire.

The ACP channel keeps its own request_choice path; ApprovalRequest events are dropped on the ACP transport since approvals there flow through session/request_permission.

Closes #6207

Validation Evidence

cargo fmt --all -- --check
cargo clippy --workspace --exclude zeroclaw-desktop --all-targets --features ci-all -- -D warnings
cargo test --package zeroclaw-channels --features channel-acp-server --lib turn_tool_events

All three pass locally.

Security & Privacy Impact

New permissions, capabilities, or filesystem access scope? No. The change is the missing-back-channel fix; supervised tool gating works the same way every other channel does.
New external network calls? No.
Secrets / tokens / credentials handling changed? arguments_summary is generated upstream by summarize_args, which already redacts #[secret] and #[derived_from_secret] fields. Raw args are not passed across the WS boundary.
PII or real identities in diff, tests, fixtures, or docs? No.

Compatibility

Backward compatible? Yes. Clients that do not understand approval_request frames simply do not respond, in which case the channel times out and auto-denies after 120s, matching the existing channel default.
Config / env / CLI surface changed? No. The 120s timeout is hardcoded for now and matches TelegramConfig::approval_timeout_secs's default.

Rollback

git revert <merge-sha>. The runtime variant and the WS channel revert in lockstep; no on-disk state to clean up.

…se WS protocol Step 1 of the gateway-approval-bypass fix. Document the WS frame contract for tool approvals so a maintainer reviewer can sign off on the wire shape before the deeper plumbing lands. This commit only adds documentation: - Module-level rustdoc gains a "Tool approvals (in progress, see zeroclaw-labs#6207)" section spelling out the JSON shape both directions, the timeout + auto-deny semantics, and the parity goal with process_channel_message + ApprovalManager. - A TODO comment in the TurnEvent forwarder names the exact code locations that still need wiring: TurnEvent::ApprovalRequest in zeroclaw-runtime, the approval gate in crates/zeroclaw-runtime/src/agent/loop_.rs:1572-1619, and the ApprovalManager construction site that does not yet exist on the gateway path. Not in this commit (deliberately separate to keep the protocol contract reviewable in isolation): - TurnEvent::ApprovalRequest variant on the runtime side. - Threading Option<&ApprovalManager> through Agent::turn_streamed (or switching the gateway to the same channel-message path the actual channels use, which gives approvals for free). - Web UI: rendering the approval card with Approve / Deny / Always buttons in web/src/. - Auto-deny on client disconnect / timeout. Refs zeroclaw-labs#6207.

CI flagged the standalone comment block between match arms; rustfmt prefers it hung off the closing brace of the previous arm. Apply `cargo fmt` to satisfy the lint. Refs zeroclaw-labs#6207.

The agent's tool loop already iterates `channel_handles.ask_user` looking for a Channel that can answer ChannelApprovalRequest. The gateway WS path constructed an Agent but never registered itself as such a channel, so any supervised tool call routed through /ws/chat would auto-deny with no operator surface. Add a TurnEvent::ApprovalRequest variant on the runtime side and a WsApprovalChannel that, when the agent calls request_approval, mints a request_id, emits the event over a connection-level mpsc, and parks on a oneshot waiter. The WS forward task drains the mpsc onto the wire; inbound `approval_response` frames pop the waiter from the shared pending map. Timeouts and disconnects auto-deny so the agent never sees None and never falls back to the implicit no-op path. Wire format: Server -> Client: { "type": "approval_request", "request_id": "<uuid>", "tool": ..., "arguments_summary": ..., "timeout_secs": 120 } Client -> Server: { "type": "approval_response", "request_id": "<uuid>", "decision": "approve" | "deny" | "always" } `arguments_summary` is the redacted summary the runtime synthesises via `zeroclaw_runtime::approval::summarize_args`; raw args (including secret-bearing fields) never reach the wire. The ACP channel keeps its own `request_choice` path; ApprovalRequest events are dropped on the ACP transport since approvals there flow through session/request_permission. Closes zeroclaw-labs#6207

Audacity88

I checked the current PR snapshot at head 58a98d4, linked issue #6207, current labels and status checks, the absence of prior comments/reviews, the five-file PR diff, and the adjacent runtime/channel approval contracts. Line references below include current PR-head source and adjacent source where the behavior depends on existing contracts. The request-id/oneshot shape is a good direction, but I think this needs changes because the new path does not actually round-trip an approval during a WebSocket turn yet, and the new WS contract relies on a redaction guarantee the current helper does not provide.

🟢 What looks good — reusing the channel approval contract

Using ChannelApprovalRequest / ChannelApprovalResponse for the gateway path is the right abstraction. Once the transport pump is wired into the active turn, this should let WS approvals share the same approve/deny/always policy and audit path as other channel-backed approvals instead of inventing a separate gateway-only policy.

🔴 Blocking — approval requests are queued in a loop that is not running during the turn

handle_socket awaits process_chat_message(...) when it receives a message frame (crates/zeroclaw-gateway/src/ws.rs:536; the first-message fallback has the same shape around :393). While that await is in progress, the outer socket select! is suspended. That matters because the new approval pump also lives in that outer loop: outbound approval_request frames are drained from approval_event_rx at ws.rs:552-575, and inbound approval_response frames are read at ws.rs:466-490.

So a supervised tool call can insert the pending oneshot and send the TurnEvent::ApprovalRequest in crates/zeroclaw-gateway/src/ws_approval.rs:88-108, but the client never sees that request because the only task that forwards it is waiting for the turn to finish. The same blocked loop also cannot read the user's approval response. The practical result is timeout/deny, and then the outer loop can resume and send a stale approval prompt after the pending entry was already removed.

That means this does not satisfy #6207's core requirement yet: the web client still cannot approve the supervised tool call before timeout. I would move approval-event forwarding and approval_response handling into the same concurrent turn driver, or route the approval request through the per-turn event stream that process_chat_message already drains while also keeping response reads active during the turn. A regression test should prove that /ws/chat sends an approval_request, accepts an approve response, and executes the tool before the timeout.

🔴 Blocking — `arguments_summary` is documented as redacted but the current helper only stringifies values

The new API/WS comments say arguments_summary is secret-redacted (crates/zeroclaw-api/src/agent.rs:36-39, crates/zeroclaw-gateway/src/ws.rs:42-51), and the PR body says raw args with secret-bearing fields never reach the wire. But zeroclaw_runtime::approval::summarize_args currently iterates every key/value and truncates the stringified value (crates/zeroclaw-runtime/src/approval/mod.rs:224-247). WsApprovalChannel copies that string into the new event (crates/zeroclaw-gateway/src/ws_approval.rs:92-96), and the gateway forwards it directly to the browser (crates/zeroclaw-gateway/src/ws.rs:560-566, :705-718).

That means an approval request with an argument such as api_key, token, or another secret-shaped field can still put the secret value into the approval_request frame, just truncated. The helper gap may be pre-existing for other approval transports, but this PR makes that string a documented WebSocket protocol field and uses the claimed redaction as the security/privacy mitigation for the new gateway boundary. I would either route the summary through a real redaction/scrubbing helper before creating or forwarding the approval event, with coverage for secret-shaped fields, or remove the redaction claim until that guarantee actually exists. Given the gateway/security boundary and the PR's explicit privacy claim, I would block this until the wire frame cannot expose secret-bearing tool arguments.

WareWolf-MoonWall

Review — PR #6387 · feat(gateway/ws): tool-approval back-channel via WsApprovalChannel

Verdict: --request-changes
Head reviewed: 58a98d49071ab03f891d3cb7c9dc9943c7ce64d3

Take-stock before writing

Active block holders:

Audacity88: CHANGES_REQUESTED — two 🔴 blockers (turn-loop concurrency; redaction claim vs. implementation)
JordanTheJet: Requested, no review posted
WareWolf-MoonWall: Requested (this review)
theonlyhennygod: Requested, no review posted

Risk classification: This PR touches crates/zeroclaw-gateway/src/** and crates/zeroclaw-api/src/agent.rs, both of which are explicitly listed as High risk in AGENTS.md. Per FND-006 §4.5, code at a security or trust boundary requires explicit, demonstrable correctness — not just a passing CI run.

I independently verified both of Audacity88's blockers against the live source. Both are real. I have additional analysis below, including one finding about a duplicated serialization path that sharpens the picture on the concurrency issue.

🔴 [blocking] — Approval responses can never arrive during a turn: the receive path is not concurrent with `process_chat_message`

Audacity88 identified this correctly. I want to be precise about the mechanism because the fix path matters.

How the current code works (before and after this PR):

handle_socket runs an outer tokio::select! loop with two arms: client_msg = receiver.next() and event = broadcast_rx.recv(). When a message frame arrives, the client_msg arm takes control and calls:

process_chat_message(&state, &mut agent, &mut sender, &content, &session_key).await;

While this .await runs, the outer select! is fully suspended. receiver.next() is not being polled. No new WebSocket frames can be read from the client.

What the PR adds:

An approval_event = approval_event_rx.recv() arm in the outer select! — unreachable during a turn for the same reason.
An approval_response handler inside the client_msg arm — also unreachable during a turn, because it's gated on receiver.next() firing, which the suspended select! cannot do.
Inside process_chat_message's internal forward_fut, a TurnEvent::ApprovalRequest match arm that serializes the frame and sends it to the client.

Item (3) is actually correct: process_chat_message uses tokio::join!(turn_fut, forward_fut) internally, so TurnEvent::ApprovalRequest events emitted by the agent can reach the client during a turn — the outbound path works.

The problem is exclusively the inbound path: the client's approval_response frame arrives at the WebSocket as a new message, and there is nothing polling receiver while process_chat_message is running. The WsApprovalChannel's oneshot receiver parks indefinitely until the timeout, then the agent receives Deny and the turn completes. Only then does the outer select! resume and drain the now-stale approval_response frame — but the pending entry was already removed on timeout.

Concretely: the operator sees a prompt, clicks Approve, and the tool gets denied anyway. This doesn't satisfy #6207.

What a resolution looks like:

The process_chat_message signature already holds &mut sender (for streaming output) and &mut agent. A clean fix would thread a mutable reference to the receiver through process_chat_message as well, and drive it concurrently inside the function — alongside turn_fut and forward_fut — routing any approval_response frames to the shared pending_approvals map while the turn runs. The internal tokio::join! (or a tokio::select! inside forward_fut) could handle the three concurrent streams: turn events → wire, broadcast events → wire, and client messages → pending map.

A regression test should demonstrate the complete round-trip: /ws/chat sends an approval_request, the client sends an approve response, and the tool executes successfully before the 120s timeout.

🔴 [blocking] — The `arguments_summary` redaction claim in the wire-protocol comments and PR body is false

Audacity88 identified this too. I verified it against the source.

The PR introduces arguments_summary as a documented wire-protocol field with an explicit security property. The module doc at crates/zeroclaw-gateway/src/ws.rs states:

"The runtime must not echo any #[secret] or #[derived_from_secret] field (auth tokens, API keys, OAuth secrets) into the summary."

The PR body says:

"arguments_summary is generated upstream by summarize_args, which already redacts #[secret] and #[derived_from_secret] fields."

I read summarize_args in crates/zeroclaw-runtime/src/approval/mod.rs. The function iterates every key/value in the arguments object, formats each as key: truncated_value, and joins them. It does not check for any attribute annotations, does not consult a secret-field registry, and does not skip or redact any key. A tool argument named api_key or oauth_token appears in the summary with its value truncated to 80 characters but otherwise intact.

This means the documented security guarantee does not exist. A browser client receiving an approval_request frame for a tool call that includes a credential-bearing argument will see that credential in the summary. Per FND-006 §4.5:

"An error in an authorization path cannot [be recovered from gracefully]. Know which kind of function you are writing, and let that determination drive how aggressively you surface failures from it."

The PR makes arguments_summary a gateway wire-protocol boundary — exactly the location where this guarantee must be enforced. There are two valid resolutions:

Option A: Implement actual secret-field scrubbing in summarize_args (or in a new scrub_args_for_summary function called before the event is created). This means checking the argument keys against a registry of known secret-shaped field names and replacing their values with [redacted]. A test with api_key, token, password, and oauth_secret arguments must prove redaction. This is the right long-term fix.

Option B: Remove the redaction claim from the wire-protocol docs and the PR body until Option A lands. Change the security impact statement to accurately describe what summarize_args actually does (truncates to 80 chars). Accept that the current behavior has a known gap and track it explicitly in a follow-up issue.

Given the gateway/security boundary classification and the explicit privacy claim in the PR description, I would require Option A before this merges, or a clear acknowledgement of the gap via Option B with a tracking issue. Leaving a false security claim in the wire-protocol documentation is worse than acknowledging the known limitation.

🟢 [praise] — Using `ChannelApprovalRequest`/`ChannelApprovalResponse` is the right abstraction

Reusing the existing channel approval contract for the gateway path is the architecturally correct choice. Once the transport concurrency is fixed, WS approvals will share the same approve/deny/always policy and audit path as Telegram, CLI, and ACP approvals — no gateway-specific policy divergence. The request-id/oneshot shape is clean and testable. This is worth preserving as the foundation of the fix.

🟢 [praise] — ACP `TurnEvent::ApprovalRequest` drop is correctly handled

The notification_for_turn_event change in acp_server.rs returns Option<JsonRpcNotification> and the ApprovalRequest arm returns None. The comment is accurate: ACP sessions use session/request_permission for approval, not the turn-event stream. The PR correctly avoids routing ApprovalRequest events over ACP — and the tests are updated to expect("...") on the Some(_) variants. This is the right shape.

🟡 [warning] — `process_chat_message` has a duplicate `ApprovalRequest` serialization path that should be removed

The PR adds TurnEvent::ApprovalRequest handling in two places:

Inside process_chat_message's forward_fut (ws.rs around line 705) — this path is reachable and will forward the frame to the client during a turn. This is the path that should survive.
In the outer select!'s approval_event_rx arm (ws.rs around line 552) — this arm drains a separate mpsc channel and is unreachable during a turn (blocked by the process_chat_message await). It exists because WsApprovalChannel::request_approval sends to approval_event_tx, which feeds approval_event_rx. But if (1) is already forwarding the frame, this arm is either redundant or represents a design conflict about which event stream carries the approval events.

If the fix for the concurrency blocker threads receiver through process_chat_message (as described above), the approval_event_rx arm in the outer select! may no longer be needed at all — the turn's event channel already carries TurnEvent::ApprovalRequest. Clarifying which path is canonical before adding more code to either will make the fix cleaner.

🔵 [suggestion] — The 120s hardcoded timeout should be a per-connection or per-session config knob

The PR body notes: "The 120s timeout is hardcoded for now and matches TelegramConfig::approval_timeout_secs's default." This is fine for an initial implementation. When this lands, the follow-up should plumb GatewayConfig::ws_approval_timeout_secs (or reuse an existing approval-timeout field) through handle_socket so operators on low-latency setups or batch automation contexts can tune it. Filing a tracking issue for this before merge would prevent it from becoming invisible drift.

🔵 [suggestion] — `register_channel("ws", ...)` unconditionally registers the approval channel

WsApprovalChannel is registered before the main loop regardless of whether the agent has any supervised tools. In sessions where no tool supervision is active, this burns a parking_lot::Mutex and two Arc allocations per connection and adds a no-op channel to the ask_user iteration. Consider registering only when the agent's ApprovalManager is configured — or document that the cost is acceptable for the connection lifecycle. Non-blocking.

Summary

Both of Audacity88's 🔴 blockers are real and independently verified:

The approval response round-trip doesn't work because receiver is not polled during the turn. The outbound frame reaches the client; the inbound response never does.
summarize_args doesn't redact secret-shaped fields, making the documented wire-protocol security guarantee false.

The abstraction choice (channel approval contract, request-id/oneshot, ACP drop) is sound and worth building on. Fix the transport concurrency and either implement real secret-field redaction or honestly document the current behavior, and this will be in good shape.

WareWolf-MoonWall · 2026-05-05T23:36:13Z

@JordanTheJet — milestone alignment needed: this PR does not clearly fit within the scope boundary of any open milestone. Please advise on placement or deferral.

singlerider added bug Something isn't working gateway Auto scope: src/gateway/** changed. labels May 5, 2026

singlerider requested a review from Audacity88 May 5, 2026 04:14

singlerider added runtime Auto scope: src/runtime/** changed. security Auto scope: src/security/** changed. risk: high Auto risk: security/runtime/gateway/tools/workflows. labels May 5, 2026

singlerider requested a review from WareWolf-MoonWall May 5, 2026 04:14

singlerider added the size: XS Auto size: <=80 non-doc changed lines. label May 5, 2026

singlerider self-assigned this May 5, 2026

github-actions Bot removed gateway Auto scope: src/gateway/** changed. runtime Auto scope: src/runtime/** changed. security Auto scope: src/security/** changed. labels May 5, 2026

singlerider added 2 commits May 5, 2026 15:01

fix(gateway/ws): rustfmt the zeroclaw-labs#6207 TODO comment block

fbf90b0

CI flagged the standalone comment block between match arms; rustfmt prefers it hung off the closing brace of the previous arm. Apply `cargo fmt` to satisfy the lint. Refs zeroclaw-labs#6207.

singlerider added this to the v0.7.5 milestone May 5, 2026

singlerider changed the title ~~feat(gateway/ws): scaffold #6207 approval-request/response WS protocol~~ feat(gateway/ws): tool-approval back-channel via WsApprovalChannel May 5, 2026

singlerider marked this pull request as ready for review May 5, 2026 06:29

singlerider requested review from JordanTheJet and theonlyhennygod as code owners May 5, 2026 06:29

Audacity88 added size: M Auto size: 251-500 non-doc changed lines. and removed size: XS Auto size: <=80 non-doc changed lines. labels May 5, 2026

Audacity88 requested changes May 5, 2026

View reviewed changes

WareWolf-MoonWall requested changes May 5, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(gateway/ws): tool-approval back-channel via WsApprovalChannel#6387

feat(gateway/ws): tool-approval back-channel via WsApprovalChannel#6387
singlerider wants to merge 3 commits intozeroclaw-labs:masterfrom
singlerider:feat/6207-gateway-approval-manager

singlerider commented May 5, 2026 •

edited

Loading

Uh oh!

Audacity88 left a comment

Uh oh!

WareWolf-MoonWall left a comment

Uh oh!

WareWolf-MoonWall commented May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

singlerider commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Validation Evidence

Security & Privacy Impact

Compatibility

Rollback

Uh oh!

Audacity88 left a comment

Choose a reason for hiding this comment

🟢 What looks good — reusing the channel approval contract

🔴 Blocking — approval requests are queued in a loop that is not running during the turn

🔴 Blocking — arguments_summary is documented as redacted but the current helper only stringifies values

Uh oh!

WareWolf-MoonWall left a comment

Choose a reason for hiding this comment

Review — PR #6387 · feat(gateway/ws): tool-approval back-channel via WsApprovalChannel

Take-stock before writing

🔴 [blocking] — Approval responses can never arrive during a turn: the receive path is not concurrent with process_chat_message

🔴 [blocking] — The arguments_summary redaction claim in the wire-protocol comments and PR body is false

🟢 [praise] — Using ChannelApprovalRequest/ChannelApprovalResponse is the right abstraction

🟢 [praise] — ACP TurnEvent::ApprovalRequest drop is correctly handled

🟡 [warning] — process_chat_message has a duplicate ApprovalRequest serialization path that should be removed

🔵 [suggestion] — The 120s hardcoded timeout should be a per-connection or per-session config knob

🔵 [suggestion] — register_channel("ws", ...) unconditionally registers the approval channel

Summary

Uh oh!

WareWolf-MoonWall commented May 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

singlerider commented May 5, 2026 •

edited

Loading

🔴 Blocking — `arguments_summary` is documented as redacted but the current helper only stringifies values

🔴 [blocking] — Approval responses can never arrive during a turn: the receive path is not concurrent with `process_chat_message`

🔴 [blocking] — The `arguments_summary` redaction claim in the wire-protocol comments and PR body is false

🟢 [praise] — Using `ChannelApprovalRequest`/`ChannelApprovalResponse` is the right abstraction

🟢 [praise] — ACP `TurnEvent::ApprovalRequest` drop is correctly handled

🟡 [warning] — `process_chat_message` has a duplicate `ApprovalRequest` serialization path that should be removed

🔵 [suggestion] — `register_channel("ws", ...)` unconditionally registers the approval channel