fix(agent): treat tool_use/tool_result as atomic groups in history pruning #4825

Merged
SimianAstronaut7 merged 2 commits into zeroclaw-labs:master from singlerider:fix/pruner-tool-group-atomic
Mar 28, 2026
Collaborator

@singlerider singlerider commented Mar 27, 2026

Summary

  • Base branch target: master
  • Problem: History pruner severs tool_use/tool_result message pairs when trimming conversation history, causing Anthropic 400 errors during tool-heavy conversations
  • Why it matters: At max_context_tokens = 32000 with 15+ tool iterations, the pruner fires and can drop a tool result message while leaving its assistant(tool_use) parent — Anthropic rejects the payload as malformed
  • What changed: Phase 1 (collapse) handles multi-tool groups atomically. Phase 2 (drop) removes an assistant message and its N tool messages as a unit. emergency_history_trim gets the same awareness.
  • What did not change: Token estimation, keep_recent protection, system message preservation. Workaround (max_context_tokens = 200000) still valid.
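
A minimal sketch of the Phase 1 idea, assuming a simplified message model. The `Role`, `Msg`, and `collapse_tool_groups` names are hypothetical and not taken from history_pruner.rs; the summary role (assistant), the one-result-per-call counting, and the message-count form of keep_recent are also assumptions. The summary string follows the collapse format this PR describes.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Role {
    User,
    Assistant,
    Tool,
}

#[derive(Debug, Clone, PartialEq)]
struct Msg {
    role: Role,
    content: String,
}

/// Collapse each assistant + N tool group into one summary message.
/// Groups that reach into the last `keep_recent` messages are left intact,
/// so a group is either collapsed whole or kept whole, never split.
fn collapse_tool_groups(history: &[Msg], keep_recent: usize) -> Vec<Msg> {
    let protected_from = history.len().saturating_sub(keep_recent);
    let mut out = Vec::with_capacity(history.len());
    let mut i = 0;
    while i < history.len() {
        // Find the end of the group that starts at `i`.
        let mut end = i + 1;
        if history[i].role == Role::Assistant {
            while end < history.len() && history[end].role == Role::Tool {
                end += 1;
            }
        }
        let tool_count = end - i - 1;
        if tool_count > 0 && end <= protected_from {
            out.push(Msg {
                role: Role::Assistant, // assumption: the summary is an assistant message
                content: format!(
                    "[Tool exchange: {} tool call(s) — results collapsed]",
                    tool_count
                ),
            });
        } else {
            out.extend_from_slice(&history[i..end]);
        }
        i = end;
    }
    out
}
```

Because the output never contains a tool message whose assistant parent was removed, the collapsed history keeps the pairing the provider expects.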

Label Snapshot (required)

  • Risk label: risk: medium
  • Size label: size: S
  • Scope labels: agent
  • Module labels: agent: history_pruner
  • Contributor tier label: N/A

Change Metadata

  • Change type: bug
  • Primary scope: agent

Linked Issue

  • Closes #4810

Validation Evidence (required)

cargo fmt --all -- --check     # clean
cargo clippy --all-targets -- -D warnings  # clean
cargo test --lib               # 5799 pass, 0 fail
  • Evidence provided: 11 pruner tests (5 new) including realistic 15-iteration token pressure test
  • If any command is intentionally skipped: N/A

Security Impact (required)

  • New permissions/capabilities? No
  • New external network calls? No
  • Secrets/tokens handling changed? No
  • File system access scope changed? No

Privacy and Data Hygiene (required)

  • Data-hygiene status: pass
  • Redaction/anonymization notes: N/A
  • Neutral wording confirmation: Yes

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

i18n Follow-Through (required when docs or user-facing wording changes)

  • i18n follow-through triggered? No

Human Verification (required)

  • Verified scenarios: Confirmed workaround (max_context_tokens = 200000) eliminates the error on deployed bot
  • Edge cases checked: 15-iteration tool loop under token pressure; multi-tool-call responses (assistant with 2+ tool_use blocks)
  • What was not verified: Active reproduction of the pruner-severed pair (requires low max_context_tokens and many tool iterations)

Side Effects / Blast Radius (required)

  • Affected subsystems/workflows: history_pruner.rs (prune_history), history.rs (emergency_history_trim)
  • Potential unintended effects: Collapse summary format changed from [Tool result: ...] to [Tool exchange: N tool call(s) — results collapsed]
  • Guardrails/monitoring: Existing pruner log lines. Invariant enforced by tests: no tool message without preceding assistant.
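
The invariant named above (no tool message without a preceding assistant) can be checked with a small helper. This is an illustrative sketch, not the project's test code; `Role` and `first_orphaned_tool` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    System,
    User,
    Assistant,
    Tool,
}

/// Return the index of the first tool message that is not part of a run
/// of tool messages directly following an assistant message.
fn first_orphaned_tool(roles: &[Role]) -> Option<usize> {
    let mut in_group = false;
    for (i, role) in roles.iter().enumerate() {
        match role {
            Role::Assistant => in_group = true,
            Role::Tool => {
                if !in_group {
                    return Some(i);
                }
            }
            // Any other role ends the current assistant + tool group.
            _ => in_group = false,
        }
    }
    None
}
```

A pruner test could run this over the pruned history and assert that the result is `None` for every budget it exercises.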

Agent Collaboration Notes (recommended)

  • Agent tools used: Claude Code (Opus 4.6)
  • Workflow/plan summary: Debug logs from production identified pruner firing at iteration 15; git blame traced to interaction between commits 4fa47646 and 74a29ec0
  • Verification focus: No orphaned tool messages after pruning under any budget pressure
  • Confirmation: naming + architecture boundaries followed

Rollback Plan (required)

  • Fast rollback command/path: Revert commits, redeploy
  • Feature flags or config toggles: Workaround: agent.max_context_tokens = 200000 prevents pruner from firing
  • Observable failure symptoms: Anthropic 400 error "tool_use ids were found without tool_result blocks" during tool-heavy conversations

Risks and Mitigations

  • Risk: Atomic group dropping removes more messages per drop operation (assistant + N tools vs 1 message)
    • Mitigation: This is correct behavior — dropping partial groups was the bug. Budget enforcement still converges.

🤖 Generated with Claude Code

github-actions bot added the agent label (Auto scope: src/agent/** changed.) on Mar 27, 2026
…uning

The history pruner and emergency trim could sever tool_use/tool_result
pairs when trimming conversation history to fit the token budget. This
caused Anthropic 400 errors ("tool_use ids were found without
tool_result blocks") during tool-heavy conversations that exceeded
max_context_tokens.

Phase 1 (collapse) now handles multi-tool groups: an assistant message
followed by N consecutive tool messages is collapsed into a single
summary, not one-at-a-time. Phase 2 (drop) drops assistant+tool
groups atomically instead of message-by-message.
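
The atomic drop can be sketched as follows, under stated assumptions: `Role`, `Msg`, `group_ranges`, and `drop_oldest_groups` are hypothetical names, content length in bytes stands in for the real token estimate, and keep_recent is counted in groups here, which may differ from the actual pruner.

```rust
use std::ops::Range;

#[derive(Debug, Clone, PartialEq)]
enum Role {
    System,
    User,
    Assistant,
    Tool,
}

#[derive(Debug, Clone, PartialEq)]
struct Msg {
    role: Role,
    content: String,
}

/// Partition the history into index ranges. An assistant message and the
/// tool messages that directly follow it form a single range.
fn group_ranges(history: &[Msg]) -> Vec<Range<usize>> {
    let mut ranges = Vec::new();
    let mut i = 0;
    while i < history.len() {
        let start = i;
        i += 1;
        if history[start].role == Role::Assistant {
            while i < history.len() && history[i].role == Role::Tool {
                i += 1;
            }
        }
        ranges.push(start..i);
    }
    ranges
}

/// Drop the oldest non-system group, whole, until the size estimate fits
/// the budget. The newest `keep_recent` groups are never dropped.
fn drop_oldest_groups(history: &mut Vec<Msg>, budget: usize, keep_recent: usize) {
    loop {
        let total: usize = history.iter().map(|m| m.content.len()).sum();
        if total <= budget {
            return;
        }
        let ranges = group_ranges(history);
        let droppable = ranges.len().saturating_sub(keep_recent);
        let victim = ranges
            .iter()
            .take(droppable)
            .find(|r| history[r.start].role != Role::System)
            .cloned();
        match victim {
            Some(range) => {
                history.drain(range);
            }
            None => return, // nothing left that may be dropped
        }
    }
}
```

Each pass removes a complete group, so the loop may overshoot the budget by a few messages but cannot leave a tool message behind without its assistant parent.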

emergency_history_trim gets the same atomic-group awareness.

Workaround: increasing agent.max_context_tokens to match the model's
actual context window (e.g., 200000 for Claude Sonnet) prevents the
pruner from firing in most cases.

Closes zeroclaw-labs#4810
singlerider force-pushed the fix/pruner-tool-group-atomic branch from 030aa37 to 3d252f3 on March 27, 2026 23:18
SimianAstronaut7 merged commit 87698ad into zeroclaw-labs:master on Mar 28, 2026 (20 checks passed)
singlerider mentioned this pull request on Mar 28, 2026 (20 tasks)
5queezer added a commit to 5queezer/hrafn that referenced this pull request Apr 2, 2026
…uning

Prevents orphaned tool_result messages from causing Anthropic 400
errors during tool-heavy conversations with low context budgets.

Ported from zeroclaw-labs/zeroclaw#4825 by @singlerider.
Original PR was closed without review.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
5queezer added a commit to 5queezer/hrafn that referenced this pull request Apr 2, 2026
…uning (#74)

Prevents orphaned tool_result messages from causing Anthropic 400
errors during tool-heavy conversations with low context budgets.

Ported from zeroclaw-labs/zeroclaw#4825 by @singlerider.
Original PR was closed without review.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Kairotan added a commit to Kairotan/zeroclaw that referenced this pull request Apr 3, 2026
* fix(matrix): preserve thread context on first follow-up message (#4805)

* fix(matrix): preserve thread context on first follow-up message

Always root a thread at the incoming message event ID instead of
leaving thread_ts as None for non-threaded messages. This prevents
a session key mismatch where the first exchange is stored under a
room-level key but follow-up messages use a thread-scoped key,
causing the bot to lose context from the initial question and
response.

The bot now explicitly threads its response back to the user's
original message rather than relying on Matrix implicit threading.
Thread root is the user's question, not the bot's answer.

Documents threading behavior in the E2EE guide. In encrypted rooms,
the SDK decrypts events transparently before thread context is
evaluated, so threading works identically.

Closes #4804

* fix(lint): apply cargo fmt to context_compressor.rs

* fix(gateway): respect path_prefix in get-paircode command (#4632)

The fetch_paircode function was constructing URLs without considering
the gateway.path_prefix configuration option. This caused the CLI
command "zeroclaw gateway get-paircode" to fail when path_prefix
was configured (e.g., for reverse proxy deployments).

Changes:
- Add path_prefix parameter to fetch_paircode function
- Include path_prefix in admin API URLs (/admin/paircode and /admin/paircode/new)
- Extract path_prefix from config in GetPaircode command handler

Fixes #4456

* fix(config): resolve temp directory canonicalization for non-existent paths on macOS (#4529)

Co-authored-by: wangyingtao.10 <wangyingtao.10@jd.com>

* fix(gateway): improve web dashboard unavailable message for Homebrew users (#4438)

* fix(gateway): improve web dashboard unavailable message for Homebrew users (#3655)

The error shown when the web dashboard is not bundled now includes
context-specific guidance for Homebrew users (brew reinstall zeroclaw),
manual build-from-source instructions, and a Docker alternative.

Also harden the pub-homebrew-core workflow's Node.js dependency injection
so it falls back gracefully when the 'rust' depends_on line is absent,
ensuring node is always declared as a build dependency in the formula.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(wati): add missing attachments field to ChannelMessage initializer

Pre-existing build error from feat(channels): add automatic media understanding pipeline (#4402).

---------

Co-authored-by: SpaceLobster <spacelobster@SpaceLobsters-Mac-mini.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(security): add missing bind paths for bubblewrap sandbox (#4341)

Add /usr/local, /bin, and /sbin as read-only bind mounts in the
bubblewrap sandbox configuration. This fixes a regression where
Python and other system tools installed in these directories
were not accessible within the sandbox.

Fixes #4338

* fix(security): make command allowlist matching case-insensitive on Unix (#4552)

The executable basename was lowercased by the caller, but the allowlist
entry was compared in its original case. This caused mixed-case entries
like "icalBuddy" to fail matching on Unix, while working on Windows
(which had its own lowercase fallback).

Lowercase the allowlist entry before comparison so "icalBuddy" in config
matches the "icalbuddy" executable.
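
As a rough illustration of the comparison described above, assuming a hypothetical `is_allowed` helper rather than the project's actual function:

```rust
/// Compare the executable basename against allowlist entries with both
/// sides lowercased, so a mixed-case entry still matches.
fn is_allowed(executable_basename: &str, allowlist: &[&str]) -> bool {
    let needle = executable_basename.to_lowercase();
    allowlist.iter().any(|entry| entry.to_lowercase() == needle)
}
```

With this shape, `is_allowed("icalbuddy", &["icalBuddy"])` is true, while an executable that is not on the list is still rejected.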

Closes #4446

Co-authored-by: rareba <rareba@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(cron): add Feishu/Lark channel support for cron job delivery (#4378)

Enable cron jobs to deliver messages to Feishu/Lark channels using
the same pattern as existing channel implementations.

Changes:
- Add "lark" and "feishu" to validated delivery channels in cron/mod.rs
- Add delivery logic in cron/scheduler.rs with channel-lark feature gating
- Update agent loop default injection to support lark/feishu
- Update tool schemas and help text for lark/feishu

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(channels): prevent duplicate tool call notifications in agent chat (#4448)

Remove the separate channel message notifications for tool calls since
they are already displayed via the draft updater progress messages.
This fixes the issue where agent chat was sending multiple messages
instead of a single consolidated response.

The tool call progress (🔧 tool name, ⏳/✅ status) is already sent through
the on_delta channel and displayed in the draft message. Sending additional
separate channel messages was causing spammy/duplicate output.

Fixes #3513

* feat(config): add provider_env for injecting API keys from config (#4322)

* feat(config): add provider_env for injecting API keys from config

New [provider_env] section allows storing provider API keys directly
in config.toml instead of relying on shell environment. Keys are
injected as process env vars at startup (only if not already set).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(onboard): add missing provider_env field to wizard Config constructors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(agent): consolidate multiple messages into single response (#4436)

* fix(agent): consolidate multiple messages into single response (#3513)

Suppress intermediate tool_call and tool_result WebSocket events from
being rendered as separate chat message bubbles in AgentChat. Internal
tool invocations are processing steps — only the final 'done' event
with full_response should appear as an agent message.

Also removes dead i18n keys (agent.tool_call_prefix,
agent.tool_result_prefix) that were only used by the removed handlers.

Fixes #3513

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(wati): add missing attachments field to ChannelMessage initializer

Pre-existing build error from feat(channels): add automatic media understanding pipeline (#4402).
The attachments field was added to ChannelMessage but not all initializers were updated.

---------

Co-authored-by: SpaceLobster <spacelobster@SpaceLobsters-Mac-mini.local>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): serialize Bedrock env-var tests to prevent parallel race (#4883)

* fix(tests): serialize Bedrock env-var tests to prevent parallel race

Closes #4809

* style: fix pre-existing fmt and clippy warnings

- cargo fmt: wrap long assert! macros in uf2.rs and uno_q_bridge.rs
- clippy: use underscored hex literal 0x0200_0000 in schema.rs
- clippy: gate unix-only TempDir import behind #[cfg(unix)]

---------

Co-authored-by: rareba <rareba@users.noreply.github.com>

* refactor: consolidate Dockerfiles from 4 to 2

- Dockerfile: add `debian` target alongside existing `dev` and `release`
  targets, replacing the separate Dockerfile.debian
- Dockerfile.ci: use VARIANT build-arg (distroless|debian) to replace
  the separate Dockerfile.debian.ci
- Update release workflows to use `build-args: VARIANT=debian`
- Update docker-compose.yml to use `target: debian`
- Remove Dockerfile.debian and Dockerfile.debian.ci

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(debug): add --log-llm flag to dump LLM provider message payloads (#4213)

* feat(debug): add --log-llm flag to dump LLM provider message payloads

Adds a global --log-llm flag that logs the exact messages sent to the
LLM provider on each turn: full system prompt + history on turn one,
growing history on subsequent turns.

Usage:
  zeroclaw agent --log-llm
  zeroclaw agent --log-llm -m "hello"
  zeroclaw daemon --log-llm

Implementation:
- Global `--log-llm` flag on `Cli` (available to all subcommands)
- When set, adds a `zeroclaw::providers::reliable=trace` directive to
  the tracing subscriber filter so only LLM message traces surface,
  without flooding other TRACE-level noise
- `ReliableProvider::chat()` emits one TRACE log per message (role,
  char count, full content) on the first attempt of each call; retries
  and failover attempts are not re-logged
- Gated on `tracing::enabled!(TRACE)` so the iteration over messages
  is a no-op at runtime when the flag is not set

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: apply rustfmt to --log-llm subscriber setup

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(debug): pretty-print LLM messages as JSON array under --log-llm

Replaces the per-message trace loop with a single serde_json::to_string_pretty
of the full messages slice, producing a clean JSON array that mirrors the
actual wire payload sent to the provider.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(channels): skip tools summary for native tools (#4499)

* fix(channels): skip tools summary for native tools

* test(channels): adapt native-tools prompt respin to autonomy arg

* fix(providers): disable responses API fallback for custom OpenAI-compatible providers (#4333)

Custom OpenAI-compatible providers (configured via "custom:<url>") were
incorrectly falling back to the /v1/responses API when chat completions
returned 404. This caused errors because most custom providers only support
the standard /v1/chat/completions endpoint.

Changes:
- Add new constructor new_with_vision_no_responses_fallback() to
  OpenAiCompatibleProvider
- Update custom provider factory to use the new constructor

Fixes #4296

* feat(ollama): allow configurable context size via ZEROCLAW_OLLAMA_NUM_CTX (#3518)

* Ignore JetBrains .idea folder

* fix(ollama): support stringified JSON tool call arguments

* providers: allow ZEROCLAW_PROVIDER_URL env var to override Ollama base URL

Supports container deployments where Ollama runs on a Docker network host
(e.g. http://ollama:11434) without requiring config.toml changes.

Includes regression test ensuring the environment override works.

* fix(clippy): replace Default::default() with ProviderRuntimeOptions::default()

* feat(ollama): allow configurable context size via ZEROCLAW_OLLAMA_NUM_CTX

---------

Co-authored-by: Argenis <theonlyhennygod@gmail.com>

* fix(ollama): preserve :cloud model tag for private remote Ollama servers (#4173)

The Ollama provider previously stripped the :cloud suffix from model names
for any non-localhost endpoint, assuming all remote endpoints were Ollama
Cloud (ollama.com). This caused a 404 when a private Ollama server (e.g.
a LAN server at 192.168.x.x) was configured as api_url: the local server
stores cloud-proxy models under their full name (e.g. "glm-5:cloud"), so
stripping the tag results in "model not found" instead of the correct
"quota exhausted" (429) error.

Fix: introduce is_ollama_cloud_endpoint() which matches only ollama.com.
The :cloud suffix is now stripped (and auth required) only when targeting
Ollama Cloud. Private remote servers receive the model name as-is, letting
them serve the cloud-proxy model correctly and return the proper 429 when
cloud quota is exhausted — which triggers fallback to a local model.
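
A hedged sketch of the endpoint check. Only the name is_ollama_cloud_endpoint comes from the commit message; `host_of` and `model_name_for` are hypothetical helpers, the host parsing is deliberately naive (no IPv6 literals), and matching the exact host ollama.com is an assumption about what "matches only ollama.com" means.

```rust
/// Extract the host part of a URL with plain string handling.
/// Naive by design: bracketed IPv6 hosts are not supported.
fn host_of(url: &str) -> &str {
    let rest = url.split_once("://").map_or(url, |(_, r)| r);
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()
        .unwrap_or("");
    let host_port = authority.rsplit_once('@').map_or(authority, |(_, h)| h);
    host_port.split(':').next().unwrap_or("")
}

/// True only when the endpoint host is ollama.com.
fn is_ollama_cloud_endpoint(api_url: &str) -> bool {
    host_of(api_url).eq_ignore_ascii_case("ollama.com")
}

/// Strip the :cloud tag only for Ollama Cloud; private servers get the
/// model name unchanged.
fn model_name_for<'a>(api_url: &str, model: &'a str) -> &'a str {
    if is_ollama_cloud_endpoint(api_url) {
        model.strip_suffix(":cloud").unwrap_or(model)
    } else {
        model
    }
}
```

A LAN server therefore receives "glm-5:cloud" as written, which lets it answer with its own status code instead of "model not found".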

Two new tests cover the private-server behavior.

Co-authored-by: ZeroClaw <zeroclaw@zeroclaw.bot>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(provider): fallback to non-streaming when OpenAI Codex streaming fails (#4538)

Cherry-picked from PR #4411:
- Add Clone derive to ResponsesRequest and related structs for retry
- Retry with stream=false when streaming decode fails
- Add missing field initializers for ChannelMessage (already present in master)

Co-authored-by: OpenClaw Assistant <assistant@openclaw.ai>

* fix(channels): suppress unused Result warning in matrix channel (#4358)

Explicitly discard the Result from `client.encryption().backups().disable().await`
with `let _ = ...` to silence the unused Result compiler warning when building
with `--features channel-matrix`.

Closes #4339

Co-authored-by: rareba <rareba@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* fix: prevent panic on UTF-8 multi-byte char boundary in slug truncation (#4148)

* fix: prevent panic on UTF-8 multi-byte char boundary in slug truncation

Use char_indices to find a valid UTF-8 boundary when truncating slugs
longer than 64 bytes, preventing panics with CJK input.
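
One way to truncate on a character boundary with char_indices is sketched below. `truncate_slug` is a hypothetical name and the actual fix may be structured differently.

```rust
/// Return the longest prefix of `slug` that fits in `max_bytes` and ends
/// on a UTF-8 character boundary. Slicing at an arbitrary byte offset
/// would panic when that offset falls inside a multi-byte character.
fn truncate_slug(slug: &str, max_bytes: usize) -> &str {
    if slug.len() <= max_bytes {
        return slug;
    }
    let mut end = 0;
    for (idx, ch) in slug.char_indices() {
        let next = idx + ch.len_utf8();
        if next > max_bytes {
            break;
        }
        end = next;
    }
    &slug[..end]
}
```

For CJK input, where each character commonly takes three bytes, a byte limit that is not a multiple of three simply yields a slightly shorter slug.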

Closes #4139

Signed-off-by: majiayu000 <1835304752@qq.com>

* chore: bump version to 0.5.6

Signed-off-by: majiayu000 <1835304752@qq.com>

* chore: sync Cargo.lock with version 0.5.6

Signed-off-by: majiayu000 <1835304752@qq.com>

---------

Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* feat(providers): enable vision support for kimi-code provider (#4108)

Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* fix(channels): respect max_history_messages config in channel mode (#4835)

Channel mode was using a hardcoded constant (50) for history message
limit instead of respecting the user-configured max_history_messages
value from the agent configuration.

Changes:
- Remove hardcoded MAX_CHANNEL_HISTORY constant from src/channels/mod.rs
- Use ctx.prompt_config.agent.max_history_messages in append_sender_turn()
- Maintains backward compatibility via default value of 50

Fixes #4740

Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* fix(channels): prevent draft streaming hang after tool loop completion (#4359)

Drop the original `delta_tx` sender before awaiting the draft-updater
task.  Without this, the mpsc channel never closes because the original
sender stays alive on the stack even after `run_tool_call_loop` drops
its clone, causing `draft_updater.await` to block indefinitely.
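
The same channel-closing rule can be shown with the standard library. This sketch uses std::sync::mpsc and a thread as stand-ins for the async channel and task in the fix; `run_with_draft_updater` is a hypothetical name.

```rust
use std::sync::mpsc;
use std::thread;

/// Send deltas through a cloned sender, then drop every sender before
/// waiting for the consumer. The consumer loop ends only when all
/// senders, including the original, have been dropped.
fn run_with_draft_updater(deltas: &[&str]) -> String {
    let (delta_tx, delta_rx) = mpsc::channel::<String>();

    // Consumer: accumulates deltas until the channel closes.
    let draft_updater = thread::spawn(move || {
        let mut draft = String::new();
        for delta in delta_rx {
            draft.push_str(&delta);
        }
        draft
    });

    // Producer side: the tool loop works on a clone and drops it when done.
    let loop_tx = delta_tx.clone();
    for delta in deltas {
        loop_tx.send(delta.to_string()).expect("receiver is alive");
    }
    drop(loop_tx);

    // Without this drop the original sender would keep the channel open
    // and the join below would block forever.
    drop(delta_tx);

    draft_updater.join().expect("draft updater panicked")
}
```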

Added tracing::debug! at the start and end of the post-loop draft
shutdown path for observability.

Fixes #4300

Co-authored-by: rareba <rareba@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* fix(gateway): clarify public bind error message (#4763)

Change "exposed to the internet" to "exposed on all network interfaces"
which is more accurate — 0.0.0.0 binds to all interfaces, not
necessarily the internet (e.g. VM/container environments).  Also
remove "(NOT recommended)" qualifier since containers/VMs are a
valid use case, and broaden Docker guidance to cover VMs too.

Closes #4762

Co-authored-by: rareba <rareba@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* fix(tests): wrap LocalWhisperConfig.bearer_token test values in Some(...) (#4734)

Field type changed to Option<String> (#serde default) but test fixtures
were not updated, causing E0308 compile errors on master.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* fix(onboard): restore --interactive flag for TTY override (#4410)

Restores the --interactive flag to the onboard command to allow users
to force interactive wizard mode regardless of TTY auto-detection.

- Add --interactive argument to onboard command
- Modify auto-detection logic to respect --interactive flag
- Update test to verify flag acceptance instead of rejection

Fixes #3658

Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* fix(tools): make shell timeout configurable via config.toml (#4334)

* fix(tools): make shell timeout configurable via config.toml

Add `shell_timeout_secs` field to `[autonomy]` config section, wired
through `SecurityPolicy` to `ShellTool`.  The hardcoded 60-second
constant is kept as the default fallback so behaviour is unchanged for
existing configurations.

Closes #4331

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add shell_timeout_secs to all AutonomyConfig construction sites

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: rareba <rareba@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(agent): add web_search_tool to default_param_for_tool mapping (#4544)

web_search_tool expects "query" as its parameter but was missing from
the default_param_for_tool match, falling through to the catch-all
"input". This caused GLM-style shortened tool calls to produce
{"input": "..."} instead of {"query": "..."}, silently failing.

Also moved web_search from the URL group to the query group (search
tools use "query", not "url") and added web_fetch to the URL group.
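
A reduced sketch of the mapping after this change. Only the arms mentioned in the commit message are shown; the real default_param_for_tool match presumably contains more entries, which are omitted here rather than guessed.

```rust
/// Default parameter name used when a shortened tool call carries a bare
/// value. Tools without an explicit arm fall through to "input".
fn default_param_for_tool(tool: &str) -> &'static str {
    match tool {
        // Search tools take a query.
        "web_search" | "web_search_tool" => "query",
        // Fetch tools take a URL.
        "web_fetch" => "url",
        _ => "input",
    }
}
```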

Closes #4542

Co-authored-by: rareba <rareba@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* fix(agent): cap oversized tool results to prevent context overflow (#4319)

Add MAX_TOOL_RESULT_CHARS (100k) guard that truncates individual tool
outputs before they enter conversation history. Prevents a single large
tool result (e.g. binary file read) from permanently blowing up the
context window.
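
A sketch of such a guard, assuming the limit counts characters rather than bytes. `cap_tool_result` and the truncation marker text are hypothetical; only the MAX_TOOL_RESULT_CHARS name and its 100k value come from the commit message.

```rust
const MAX_TOOL_RESULT_CHARS: usize = 100_000;

/// Keep at most `max_chars` characters of a tool output and append a
/// short marker saying how much was cut. The cut is made on a character
/// boundary, so multi-byte text cannot cause a slicing panic.
fn cap_tool_result(output: &str, max_chars: usize) -> String {
    match output.char_indices().nth(max_chars) {
        // Fewer than or exactly `max_chars` characters: keep everything.
        None => output.to_string(),
        Some((byte_idx, _)) => {
            let omitted = output[byte_idx..].chars().count();
            format!(
                "{}\n[truncated: {} characters omitted]",
                &output[..byte_idx],
                omitted
            )
        }
    }
}
```

A caller would apply it as `cap_tool_result(&raw_output, MAX_TOOL_RESULT_CHARS)` before the result is appended to history.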

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* fix(channels): ensure newline between narration and draft status lines (#4360)

When native tool-call providers return assistant narration text that
doesn't end with a newline, the subsequent draft status line (e.g.
"⏳ tool_name") was concatenated directly onto it. This produced
garbled output like "Task started.⏳ count_to" in draft-capable
channels such as Telegram.

Ensure the narration delta always ends with '\n' before it is sent
to the draft updater.
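
In code, the guard could look like this. `ensure_trailing_newline` is a hypothetical helper, and leaving empty narration untouched is an assumption.

```rust
/// Append '\n' to non-empty narration that does not already end with one,
/// so a following status line starts on its own line.
fn ensure_trailing_newline(narration: &str) -> String {
    if narration.is_empty() || narration.ends_with('\n') {
        narration.to_string()
    } else {
        format!("{}\n", narration)
    }
}
```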

Fixes #4348

Co-authored-by: rareba <rareba@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* test(config): add regression test for WhatsApp config without [cli] field (#3456) (#4451)

The ChannelsConfig::cli field was required at deserialization time (missing
#[serde(default)]), causing any config with [channels_config.whatsapp] but no
explicit cli field to fail with "missing field `cli`". The fix landed in
7a9e8159 (#3720); this commit adds a regression test that reproduces the exact
TOML snippet from issue #3456 to prevent recurrence.

Closes #3456

Co-authored-by: SpaceLobster <spacelobster@SpaceLobsters-Mac-mini.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* fix(tools): resolve agent-browser npm shim on Windows (#4557)

On Windows, npm global installs create .cmd shim scripts that Rust's
Command::new("agent-browser") cannot resolve. Wrap the command with
cmd.exe /C on Windows to use the shell's PATH resolution which
handles .cmd and .bat extensions automatically.
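
A minimal sketch of the platform switch, assuming a hypothetical `agent_browser_command` helper. The real code may pass arguments differently.

```rust
use std::process::Command;

/// Build the command used to launch agent-browser. On Windows the call
/// goes through cmd.exe /C so that the shell resolves the .cmd shim that
/// npm creates; elsewhere the binary is invoked directly.
fn agent_browser_command(args: &[&str]) -> Command {
    if cfg!(windows) {
        let mut cmd = Command::new("cmd.exe");
        cmd.arg("/C").arg("agent-browser").args(args);
        cmd
    } else {
        let mut cmd = Command::new("agent-browser");
        cmd.args(args);
        cmd
    }
}
```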

Closes #4494

Co-authored-by: rareba <rareba@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* fix(config): add regression test for WhatsApp config without [cli] field (#4432)

* test(config): add regression test for WhatsApp config without [cli] field (#3456)

The ChannelsConfig::cli field was required at deserialization time (missing
#[serde(default)]), causing any config with [channels_config.whatsapp] but no
explicit cli field to fail with "missing field `cli`". The fix landed in
7a9e8159 (#3720); this commit adds a regression test that reproduces the exact
TOML snippet from issue #3456 to prevent recurrence.

Closes #3456

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(wati): add missing attachments field to ChannelMessage initializer

Pre-existing build error from feat(channels): add automatic media understanding pipeline (#4402).
The attachments field was added to ChannelMessage but not all initializers were updated.

---------

Co-authored-by: SpaceLobster <spacelobster@SpaceLobsters-Mac-mini.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* fix(gateway): respect path_prefix in get-paircode and shutdown commands (#4555)

* fix(gateway): respect path_prefix in get-paircode and shutdown commands

fetch_paircode() and shutdown_gateway() constructed admin endpoint URLs
without the configured gateway.path_prefix, returning 404 when a prefix
like "/zeroclaw" was set for reverse-proxy deployments.

Pass path_prefix from config to both functions and prepend it to the
admin endpoint paths.
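
The URL construction can be sketched as below. `admin_url` is a hypothetical helper, and the slash normalization is an assumption about how the prefix might be written in config.

```rust
/// Join the gateway base URL, the optional path prefix, and an admin
/// endpoint path, tolerating leading and trailing slashes on each part.
fn admin_url(base: &str, path_prefix: &str, endpoint: &str) -> String {
    let base = base.trim_end_matches('/');
    let prefix = path_prefix.trim_matches('/');
    let endpoint = endpoint.trim_start_matches('/');
    if prefix.is_empty() {
        format!("{}/{}", base, endpoint)
    } else {
        format!("{}/{}/{}", base, prefix, endpoint)
    }
}
```

With a prefix of "/zeroclaw", the pair-code endpoint resolves under /zeroclaw/admin/paircode instead of the bare /admin/paircode that returned 404.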

Closes #4456

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: fix rustfmt formatting for shutdown_gateway call

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: rareba <rareba@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* fix(cron): add WhatsApp Web delivery channel for cron job announcements (#4258)

The deliver_announcement function was missing a match arm for WhatsApp,
causing cron jobs with delivery.channel = "whatsapp" to fail with
"unsupported delivery channel: whatsapp".

Add "whatsapp" | "whatsapp-web" | "whatsapp_web" match arm that creates
a WhatsAppWebChannel from config and sends the announcement. Feature-
gated behind `whatsapp-web`, matching the pattern used by the matrix
channel.

Co-authored-by: rareba <rareba@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* fix: resolve claude-code test flakiness and update security policy (#4181)

* fix: resolve claude-code test flakiness and update security policy

* fix(security): restrict free command to Linux only in allowed commands

* refactor(tests): further improve echo_provider with tempfile and cleanup

* fix(security): resolve syntax error in default_allowed_commands

---------

Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(observability): use local time for runtime_trace timestamps (#4819)

* fix(agent): fix chat not returning data

Fix issue where chat requests do not return data properly.

Fixes #3681

* fix(observability): use local time for runtime_trace timestamps

Fixes issue where runtime_trace timestamps were recorded in UTC instead
of local time, making it difficult to correlate trace logs with system
logs. Changed from chrono::Utc to chrono::Local for all timestamp
generation in runtime_trace events.

Closes #4816

* Delete CHAT_DATA_FIX.md

* style: fix formatting issues

---------

Co-authored-by: darrenzeng2025 <darrenzeng2025@gmail.com>
Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* fix(channels): add channel-lark to default features to support long connection integration (#4444)

Add channel-lark feature to default features in Cargo.toml to enable
Lark/Feishu WebSocket long-connection integration methods by default.

This fixes the issue where Lark/Feishu channel support was disabled
in default builds, preventing users from using the WebSocket long-
connection integration method.

Changes:
- Add channel-lark to default features
- Fix missing attachments field in ChannelMessage initialization
- Fix clippy warnings in lark.rs
- Fix tests that were not running before (feature-gated)

Fixes #3538

Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(agent): separate draft tool narration lines (#4349)

Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* fix(security): respect wildcard opt-out for subshell/redirect guards (#4803)

When allowed_commands = ["*"] and block_high_risk_commands = false,
the operator has explicitly opted out of all command restrictions.
Previously, subshell expansions ($(...), backticks), redirections,
and other shell operators were still blocked regardless, causing
commands like `aws ... --start-time $(date ...)` to fail silently.

Now skips all shell operator guards when full wildcard opt-out is
detected, consistent with the existing risk-level bypass at line 861.
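
A simplified sketch of the opt-out check. The `Policy` struct and its methods are hypothetical, the operator detection is a crude substring scan used only for illustration, and treating any "*" entry as the wildcard is an assumption.

```rust
struct Policy {
    allowed_commands: Vec<String>,
    block_high_risk_commands: bool,
}

impl Policy {
    /// Full opt-out: wildcard allowlist and high-risk blocking disabled.
    fn full_opt_out(&self) -> bool {
        !self.block_high_risk_commands && self.allowed_commands.iter().any(|c| c == "*")
    }

    /// Shell operator guards apply unless the operator has fully opted out.
    fn allows_shell_operators(&self, command: &str) -> bool {
        let has_operator = command.contains("$(")
            || command.contains('`')
            || command.contains('>')
            || command.contains('<');
        self.full_opt_out() || !has_operator
    }
}
```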

Also adds WARN-level logging when a command is rejected by the
security policy, so rejections are visible in container logs.

Cron improvements:
- Skip delivery for empty/whitespace/"NONE" agent output (prevents
  posting "no incidents" messages when cron job has nothing to report)
- Pass use_markdown_blocks config to Slack channel in cron delivery
- Remove "agent job executed" fallback text for empty agent responses
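
The skip rule in the first cron item might be expressed as follows. `should_deliver` is a hypothetical name, and matching "NONE" case-sensitively after trimming is an assumption.

```rust
/// Decide whether a cron job's agent output should be posted to the
/// delivery channel. Empty, whitespace-only, and "NONE" outputs are skipped.
fn should_deliver(agent_output: &str) -> bool {
    let trimmed = agent_output.trim();
    !(trimmed.is_empty() || trimmed == "NONE")
}
```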

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* fix(homebrew): publish a release source archive with built dashboard assets (#3934)

* fix(homebrew): publish release source archive with dashboard

* fix(config): detect temp subpaths before canonicalization

---------

Co-authored-by: Apple <Apple@MacBook-Air.local>
Co-authored-by: Apple <Apple@mba.lan>
Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(provider): fallback to non-streaming request when streaming fails (#3811)

When an OpenAI Codex streaming response fails to decode, automatically
retry with a non-streaming request. This addresses the intermittent
'error decoding response body' errors from the Codex API.

Added Clone derive to required structs for request cloning.

Fixes #3786

Co-authored-by: OpenClaw Assistant <assistant@openclaw.ai>
Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(provider): add Alibaba Coding Plan support (#3889)

- Add Alibaba Coding Plan provider with OpenAI-compatible endpoint
- Add Alibaba Coding Plan Anthropic-compatible endpoint support
- Support provider aliases: coding-plan, alibaba-coding, qwen-coding
- Support credential resolution from CODING_PLAN_API_KEY and DASHSCOPE_API_KEY

Endpoints:
| Protocol | Provider Name | Endpoint |
|----------|---------------|----------|
| OpenAI-compatible | coding-plan, alibaba-coding, qwen-coding | https://coding-intl.dashscope.aliyuncs.com/v1 |
| Anthropic-compatible | coding-plan-anthropic, alibaba-coding-anthropic | https://coding-intl.dashscope.aliyuncs.com/apps/anthropic |

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* fix(gateway): replay session history on WebSocket reconnect (#4696)

* fix(gateway): replay session history on WebSocket reconnect

When a WebSocket client reconnects to a resumed session, the server
now sends persisted messages as `{"type":"history","role":"...","content":"..."}`
frames followed by a `{"type":"history_end"}` sentinel.  The web
client handles these frames to restore the chat display, fixing the
"amnesia" where previous messages were invisible after reconnect.

Server changes:
- ws.rs: after session_start, replay persisted messages as history frames

Client changes:
- AgentChat.tsx: handle history/history_end/session_start message types
- api.ts: extend WsMessage type with new frame types

Closes #4644

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve formatting and clippy warnings after rebase

- Fix unreadable hex literal (0x02000000 -> 0x0200_0000) in schema.rs
- Gate tempfile::TempDir import behind #[cfg(unix)] to match its usage
- Fix formatting in uf2.rs and uno_q_bridge.rs
- Resolve rebase conflicts in AgentChat.tsx and api.ts

---------

Co-authored-by: rareba <rareba@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(provider): restrict reasoning_effort to known openai models (#4296) (#4310)

* test(pushover): add unit tests and improve documentation (#3830)

Co-authored-by: Claw <claw@openclaw.ai>

* fix: sessions (#4858)

* fix(service): detect missing loginctl enable-linger and warn users (#4336)

After systemd user service installation, check whether loginctl linger
is enabled for the current user. If not, print a clear warning explaining
that the service will stop when the SSH session ends, along with the
command to fix it. Silently skipped when loginctl is unavailable.

Closes #4284

Co-authored-by: rareba <rareba@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(install): add Rust version check to prevent build failures on outdated toolchains (#4380)

The install.sh script previously only checked if Rust was installed, but
did not verify if the version met the minimum requirement specified in
Cargo.toml (rust-version = "1.87").

This caused build failures on systems like Raspberry Pi where users often
have older Rust versions installed via apt (e.g., 1.63).

Changes:
- Add get_minimum_rust_version() to parse rust-version from Cargo.toml
- Add version_ge() for version string comparison
- Add check_rust_version() to verify installed version meets requirements
- Modify install_rust_toolchain() to check version and auto-update via
  rustup if available, or provide clear error message with upgrade
  instructions
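
The real helpers live in install.sh (shell). The comparison logic behind `version_ge()` is sketched here in Rust purely for illustration. Treating missing or non-numeric components as 0 is an assumption.

```rust
/// Parses "1.87" or "1.87.0" into numeric components.
/// Non-numeric components are treated as 0 (assumption).
fn parse_version(v: &str) -> Vec<u64> {
    v.trim()
        .split('.')
        .map(|part| part.parse::<u64>().unwrap_or(0))
        .collect()
}

/// True when `installed` is greater than or equal to `required`,
/// comparing component by component as integers rather than as strings.
fn version_ge(installed: &str, required: &str) -> bool {
    let mut a = parse_version(installed);
    let mut b = parse_version(required);
    // Pad the shorter version with zeros so "1.87" equals "1.87.0".
    let len = a.len().max(b.len());
    a.resize(len, 0);
    b.resize(len, 0);
    a >= b
}
```

Comparing integers avoids the string-comparison trap where "1.100" would sort before "1.87".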

Fixes #3677

* feat(channel): add interrupt_on_new_message support for WhatsApp (#4371)

* feat(channel): add interrupt_on_new_message support for WhatsApp

Signed-off-by: Tomás Migone <tomasmigone@gmail.com>

* docs: update whatsapp docs to add interrupt_on_new_message

Signed-off-by: Tomás Migone <tomasmigone@gmail.com>

* ci: format code

Signed-off-by: Tomás Migone <tomasmigone@gmail.com>

* ci: make clippy happy

Signed-off-by: Tomás Migone <tomasmigone@gmail.com>

---------

Signed-off-by: Tomás Migone <tomasmigone@gmail.com>

* feat(service): detect missing loginctl linger and prompt user (#4285)

* feat(service): detect missing loginctl linger and prompt user

When installing or starting a systemd user service, check if linger is
enabled. If not, warn the user and offer to enable it interactively
(requires sudo). Also adds a linger diagnostic to `zeroclaw doctor`.

Fixes the silent daemon death on SSH disconnect for headless VMs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(service): guard linger prompt behind TTY check

Prevent auto-triggering sudo in non-interactive contexts (CI, scripts)
by checking stdin.is_terminal() before prompting. Also extract shared
current_username() helper and use println for consistency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Raul Rodriguez <dev@Rauls-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(security): add 1Password secret resolution via op:// references (#4796)

* feat(security): add 1Password secret resolution via op:// references

Config values starting with op:// are now resolved at startup by
invoking the 1Password CLI (`op read`). This integrates into the
existing SecretStore::decrypt() pipeline so all secret fields
automatically gain support with no changes to config loading code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(security): skip encryption for 1Password op:// secret references

Preserve op:// URIs as-is during config encryption so they can be
resolved at runtime by 1Password CLI rather than being double-wrapped.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(qq): add audio attachment transcription (#4315)

* fix(gemini): propagate token usage from Gemini provider to cost tracker (#4573)

The Gemini provider parsed usageMetadata from API responses but discarded
it in chat_with_history() (which returns String). Since GeminiProvider
did not override the trait's default chat() method, ChatResponse.usage
was always None, so cost tracking was a silent no-op for all Gemini users.

Fix:
- Extract message conversion into chat_with_history_full() that
  returns (String, Option<TokenUsage>)
- Override Provider::chat() to call the new helper and propagate usage
  into ChatResponse
- Preserve tool-instruction injection logic from the trait default

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(agent): fallback to non-streaming when stream errors (#4670) (#4675)

Track stream errors instead of silently breaking. When streaming
fails after emitting partial text, fall back to non-streaming chat
to avoid returning error text as the final assistant response.

If both streaming and fallback fail but streaming produced partial
text, preserve the partial answer rather than failing hard.

Includes regression test reproducing #4670 scenario.

* fix(tool): unify path resolution for file tools (closes #3774) (#3837)

Use SecurityPolicy::resolve_tool_path() in file_write/file_edit/pdf_read/image_info so absolute
paths under workspace are not double-prefixed.

Add a regression test for absolute paths under workspace.

Made-with: Cursor

* feat(matrix): add mention_only config for group room filtering (#4680)

* feat(matrix): add mention_only config for group room filtering

When mention_only = true, the bot only responds to messages that
@-mention its user ID in group rooms. DMs (rooms with ≤2 members)
bypass the gate. The mention is stripped from the message body before
processing. Consistent with Discord, Telegram, Slack, and other
channels that already support this option.
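
A rough sketch of the gate, assuming whole-token matching for the user ID and leaving DM bodies untouched. All function names here are hypothetical, not the helpers on MatrixChannel.

```rust
/// Characters that would make a match part of a longer token (assumption).
fn is_token_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_' || c == '-'
}

/// True when `body` contains `user_id` as a whole token, so that
/// "@bot:example.org" does not match inside "@bot:example.organization".
fn is_mentioned(body: &str, user_id: &str) -> bool {
    body.match_indices(user_id).any(|(start, m)| {
        let end = start + m.len();
        let before_ok = body[..start].chars().next_back().map_or(true, |c| !is_token_char(c));
        let after_ok = body[end..].chars().next().map_or(true, |c| !is_token_char(c));
        before_ok && after_ok
    })
}

/// Removes the mention and normalizes the remaining whitespace.
fn strip_mention(body: &str, user_id: &str) -> String {
    body.replace(user_id, "")
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
}

/// Returns the text to process, or None when the message should be ignored.
fn gate_message(body: &str, user_id: &str, member_count: usize) -> Option<String> {
    if member_count <= 2 {
        // DM rooms (two members or fewer) bypass the mention gate.
        return Some(body.to_string());
    }
    if !is_mentioned(body, user_id) {
        return None;
    }
    Some(strip_mention(body, user_id))
}
```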

Closes #4666

* fix: add mention_only field to MatrixConfig test constructions

* fix(lint): apply cargo fmt

* test(matrix): add unit tests for mention_only gate, stripping, and DM bypass

Extract mention detection, mention stripping, and DM room detection
into testable helper methods on MatrixChannel. Add 11 unit tests
covering: mention detected at start and mid-message, no-mention
rejection, partial match rejection, mention stripping (start, middle,
only-mention, no-mention), and DM room detection (1, 2, 3+ members).

Collapse nested if into single condition per clippy collapsible_if.

* fix(providers): skip responses fallback on transport errors (#4501)

* fix(prompt): expose autonomy constraints to the model (#3296)

* fix(prompt): dedupe autonomy constraint injection

* fix(gateway): honor runtime autonomy in webhook simple chat

---------

Co-authored-by: Alix-007 <267018309+Alix-007@users.noreply.github.com>

* feat(tools): wire session tools to composite backend for gateway visibility (#4852)

Problem: sessions_list, sessions_history, and sessions_send tools only
queried the JSONL file store (channel sessions). Gateway sessions stored
in SQLite were invisible, causing "No active sessions found" for agents
that only receive messages through the gateway (e.g. Eyrie project chat).

Fix: add CompositeSessionBackend that merges results from both the JSONL
file store (channel sessions) and SQLite backend (gateway sessions),
deduplicating by session key. Wire it into the session tools when the
SQLite database is available; fall back to file-only when it's not.
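
The merge-and-deduplicate step might look like the following sketch. The trait, the `SessionInfo` fields, and the `FixedBackend` stand-in are hypothetical; letting the file store win on a duplicate key is an assumption, since the commit message does not say which side takes precedence.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
struct SessionInfo {
    key: String,
    source: &'static str,
}

trait SessionBackend {
    fn list_sessions(&self) -> Vec<SessionInfo>;
}

/// In-memory stand-in for either store, used only for illustration.
struct FixedBackend(Vec<SessionInfo>);

impl SessionBackend for FixedBackend {
    fn list_sessions(&self) -> Vec<SessionInfo> {
        self.0.clone()
    }
}

struct CompositeSessionBackend {
    file: Box<dyn SessionBackend>,
    /// None when the SQLite database is unavailable (file-only fallback).
    sqlite: Option<Box<dyn SessionBackend>>,
}

impl SessionBackend for CompositeSessionBackend {
    fn list_sessions(&self) -> Vec<SessionInfo> {
        let mut merged: BTreeMap<String, SessionInfo> = BTreeMap::new();
        let db_sessions = self.sqlite.iter().flat_map(|db| db.list_sessions());
        for s in self.file.list_sessions().into_iter().chain(db_sessions) {
            // First entry seen for a key wins, so file sessions take precedence.
            merged.entry(s.key.clone()).or_insert(s);
        }
        merged.into_values().collect()
    }
}
```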

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tools): improve file_read robustness for large/binary files (#4320)

* fix(tools): improve file_read robustness for large/binary files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: cargo fmt

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(channel/discord): download image and video attachments for agent processing (#4871)

* fix(channel/discord): download image and video attachments for agent processing

Previously process_attachments() only handled text/* MIME types;
image/*, video/*, and all other types were silently skipped, causing
the agent to never see images sent in Discord (fixes #4808).

Image and video attachments are now handled as follows:
- When workspace_dir is set: downloaded to <workspace>/discord_files/
  and forwarded as [IMAGE:path] / [VIDEO:path] markers (preferred —
  avoids CDN URL expiry).
- When workspace_dir is unset or download fails: CDN URL used directly
  as [IMAGE:url] / [VIDEO:url] — compatible with multimodal providers
  that accept remote URLs.

Multiple attachments in a single message are each processed and joined
with \n---\n so the agent receives all of them.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lint): apply cargo fmt to discord channel tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lint): apply cargo fmt to uf2 and uno_q_bridge tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(channels): add message chunker with per-platform character limits (#4422)

* feat(channels): add message chunker with per-platform character limits

Adds src/channels/chunker.rs with chunk_message() — word-boundary-aware
text splitting for channels that impose maximum message lengths.

Features:
- Breaks at the last whitespace in the trailing third of each window,
  falling back to a hard break for words longer than the limit
- Trims leading/trailing whitespace per chunk; omits empty chunks
- Platform limit constants: TELEGRAM_LIMIT (4096), DISCORD_LIMIT (2000),
  SLACK_LIMIT (40000), MATTERMOST_LIMIT (16383), IRC_LIMIT (400),
  WHATSAPP_LIMIT (4096), MATRIX_LIMIT (65535)

9 unit tests covering word-boundary breaks, hard breaks, empty input,
exact-limit boundaries, single-char limits, and platform constant sanity.
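
The splitting rule can be sketched as below. This is an illustrative reimplementation, not the code in src/channels/chunker.rs: the signature is assumed, and the limit is counted in Unicode scalar values, which may not match how each platform counts.

```rust
/// Platform limit taken from the list above.
const DISCORD_LIMIT: usize = 2000;

/// Splits `text` into trimmed, non-empty chunks of at most `limit` characters.
fn chunk_message(text: &str, limit: usize) -> Vec<String> {
    assert!(limit > 0, "limit must be positive");
    let chars: Vec<char> = text.chars().collect();
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let window_end = (start + limit).min(chars.len());
        let mut end = window_end;
        if window_end < chars.len() {
            // Prefer the last whitespace in the trailing third of the window.
            let search_from = start + limit - limit / 3;
            if let Some(pos) = (search_from..window_end)
                .rev()
                .find(|&i| chars[i].is_whitespace())
            {
                end = pos;
            }
            // Otherwise keep the hard break at window_end.
        }
        let chunk: String = chars[start..end].iter().collect();
        let trimmed = chunk.trim();
        if !trimmed.is_empty() {
            chunks.push(trimmed.to_string());
        }
        start = end;
    }
    chunks
}
```

A caller would pass the constant for its platform, for example `chunk_message(&reply, DISCORD_LIMIT)`.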

Closes #7

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: add missing attachments field and apply cargo fmt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci: retrigger CI after cancelled runs

* fix: move constant assertions into const block to satisfy clippy

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Emanuele Cannizzaro <emanuele.cannizzaro@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore: remove RedNote social badge from all READMEs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: support forwarded messages in Telegram channel (#4118) (#4254)

* feat(config): add configurable global max_audio_bytes to TranscriptionConfig (#4114)

Adds max_audio_bytes config field to control transcription size limits globally,
preventing excessive API costs and memory usage.

- Add max_audio_bytes field to TranscriptionConfig
- Wire into TranscriptionManager validation
- Add integration test for size limit enforcement

Size: XS (40 lines)

Co-authored-by: Test <test@test.com>

* feat(channels): persist per-chat model switch to routes.json (#4648)

* feat(channels): persist per-chat model switch to routes.json

When users switch models via model_switch tool, save the choice to
~/.zeroclaw/workspace/routes.json. Overrides persist across daemon
restarts. Per-request model_switch_slot prevents cross-user leakage.

- Global route overrides loaded at startup via init_route_overrides()
- ChannelRouteSelection gains Serialize/Deserialize for JSON persistence
- set_route_selection writes to both ctx-local and global+disk
- Global persistence gated on GLOBAL_ROUTES_FILE being set (not in tests)
- Agent loop captures model_switch_callback before returning

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: wrap bearer_token test values in Some() after type change

Also fix duplicate test function name in reliable.rs and add missing
reply_to_message_id field in escalate.rs test code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: ZeroClaw Bot <zeroclaw_bot@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore(deps): bump cpal from 0.15.3 to 0.17.3 (#4346)

Bumps [cpal](https://github.com/RustAudio/cpal) from 0.15.3 to 0.17.3.
- [Release notes](https://github.com/RustAudio/cpal/releases)
- [Changelog](https://github.com/RustAudio/cpal/blob/master/CHANGELOG.md)
- [Commits](https://github.com/RustAudio/cpal/compare/v0.15.3...v0.17.3)

---
updated-dependencies:
- dependency-name: cpal
  dependency-version: 0.17.3
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* feat(security): add path-validation fallback sandbox (#4531)

Add a software-only sandbox backend that validates file paths against
configurable deny/allow lists. This provides baseline path-traversal
protection on systems where OS-level sandbox backends (Landlock,
Firejail, Bubblewrap, Docker) are unavailable.

- Add `PathValidationSandbox` implementing the `Sandbox` trait
- Register module in `src/security/mod.rs` (alphabetical order)
- Wire as last fallback before `NoopSandbox` in auto-detection chain
- Default deny list covers .ssh, .gnupg, credentials, /etc/shadow
- Supports configurable allow-path restrictions
- 12 inline unit tests covering path extraction, deny/allow logic,
  and wrap_command integration

Co-authored-by: Emanuele Cannizzaro <emanuele.cannizzaro@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(channels): use reversible percent-encoding for session file keys (#4811)

* fix(channels): use reversible percent-encoding for session file keys

The JSONL session store sanitized session keys by replacing special
characters with underscores, but list_sessions() used the sanitized
filename as the key. This caused a mismatch: sessions were persisted
under sanitized keys but looked up with original keys containing
$, :, !, @, and ||. Matrix channels were most affected since event
IDs, room IDs, and user IDs all contain these characters.

Replace lossy sanitization with reversible percent-encoding (%XX).
list_sessions() now decodes filenames back to original keys. Legacy
files are loaded via fallback and migrated on first append.
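
A minimal sketch of reversible %XX encoding for session keys. The function names and the set of characters left unescaped (ASCII alphanumerics, `-`, `_`) are assumptions; the store's actual safe set may differ.

```rust
/// Encodes every byte outside [A-Za-z0-9_-] as %XX so the result is a
/// filesystem-safe name that can be decoded back to the original key.
fn encode_session_key(key: &str) -> String {
    let mut out = String::with_capacity(key.len());
    for b in key.bytes() {
        if b.is_ascii_alphanumeric() || b == b'-' || b == b'_' {
            out.push(b as char);
        } else {
            out.push_str(&format!("%{b:02X}"));
        }
    }
    out
}

/// Decodes a name produced by `encode_session_key`.
/// Returns None for malformed escapes or invalid UTF-8.
fn decode_session_key(name: &str) -> Option<String> {
    let bytes = name.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            let hex = name.get(i + 1..i + 3)?;
            if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
                return None;
            }
            out.push(u8::from_str_radix(hex, 16).ok()?);
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    String::from_utf8(out).ok()
}
```

Unlike replacing special characters with underscores, two different keys can never collapse into the same filename here, because `%` itself is also escaped.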

* fix(channels): add legacy session key fallback for upgrade path

When upgrading from lossy-sanitized session files, hydrated sessions
are keyed under the sanitized name but runtime lookups use the
original unsanitized key. Add a fallback in the history lookup that
checks the legacy sanitized key, migrates the session to the correct
key in both the HashMap and the JSONL file, and logs the migration.

This ensures existing conversation context survives the upgrade to
percent-encoded session keys without manual intervention.

* fix(lint): apply cargo fmt to context_compressor.rs

* feat(tool): add SecretStore integration to http_request tool (#3637)

Add `auth_secret` parameter to the `http_request` tool, allowing the
LLM to reference named secrets from `[http_request.secrets]` config
instead of passing API keys in plaintext. Secrets are resolved via
SecretStore at execution time with lazy config reload and decryption
support.

Changes:
- `config_path` is `Option<PathBuf>` — legacy `new()` returns a clear
  error if `auth_secret` is used without config support
- Secret names are validated (alphanumeric, underscore, hyphen, 1-64 chars)
- Unrelated changes (channels/reliable/model_routing_config) removed
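
The name rule stated above (alphanumeric, underscore, hyphen, 1-64 chars) could be checked like this. The function name is hypothetical, and restricting "alphanumeric" to ASCII is an assumption.

```rust
/// True when `name` is 1 to 64 characters of ASCII letters, digits, '_' or '-'.
fn is_valid_secret_name(name: &str) -> bool {
    (1..=64).contains(&name.len())
        && name
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'_' || b == b'-')
}
```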

Co-authored-by: Christian Pojoni <christian.pojoni@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(agent): treat tool_use/tool_result as atomic groups in history pruning (#4825)

* fix(agent): treat tool_use/tool_result as atomic groups in history pruning

The history pruner and emergency trim could sever tool_use/tool_result
pairs when trimming conversation history to fit the token budget. This
caused Anthropic 400 errors ("tool_use ids were found without
tool_result blocks") during tool-heavy conversations that exceeded
max_context_tokens.

Phase 1 (collapse) now handles multi-tool groups: an assistant message
followed by N consecutive tool messages is collapsed into a single
summary, not one-at-a-time. Phase 2 (drop) drops assistant+tool
groups atomically instead of message-by-message.

emergency_history_trim gets the same atomic-group awareness.
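
The drop phase can be illustrated with a simplified sketch. The `Message` and `Role` types, the function names, and the token estimate are hypothetical stand-ins for the real pruner, and the collapse phase is not shown. The point is only that an assistant message and the tool messages directly after it are removed together or not at all.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Role { System, User, Assistant, Tool }

#[derive(Debug, Clone, PartialEq)]
struct Message { role: Role, content: String }

fn msg(role: Role, content: &str) -> Message {
    Message { role, content: content.to_string() }
}

/// Crude estimate for illustration only (not the project's estimator).
fn estimate_tokens(history: &[Message]) -> usize {
    history.iter().map(|m| m.content.len() / 4 + 1).sum()
}

/// Splits history into index ranges: an assistant message plus the tool
/// messages directly after it form one range; any other message is its own.
fn group_ranges(history: &[Message]) -> Vec<std::ops::Range<usize>> {
    let mut groups = Vec::new();
    let mut i = 0;
    while i < history.len() {
        let mut end = i + 1;
        if history[i].role == Role::Assistant {
            while end < history.len() && history[end].role == Role::Tool {
                end += 1;
            }
        }
        groups.push(i..end);
        i = end;
    }
    groups
}

/// Drops the oldest non-system groups until the estimate fits the budget.
/// A group overlapping the last `keep_recent` messages is never dropped.
fn drop_oldest_groups(history: &mut Vec<Message>, budget: usize, keep_recent: usize) {
    while estimate_tokens(history.as_slice()) > budget {
        let protected_from = history.len().saturating_sub(keep_recent);
        let candidate = group_ranges(history.as_slice())
            .into_iter()
            .find(|g| history[g.start].role != Role::System && g.end <= protected_from);
        match candidate {
            // Remove the whole group in one step so no tool result is orphaned.
            Some(g) => { history.drain(g); }
            None => return,
        }
    }
}
```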

Workaround: increasing agent.max_context_tokens to match the model's
actual context window (e.g., 200000 for Claude Sonnet) prevents the
pruner from firing in most cases.

Closes #4810

* fix(lint): apply cargo fmt

* fix(channels): close draft sender after tool loop (#4301)

Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* fix(update): verify download checksum against SHA256SUMS (#4434)

* fix(update): verify download checksum against SHA256SUMS (#4294)

After downloading the release binary, fetch the SHA256SUMS asset from
the same GitHub release and compare the SHA-256 digest of the downloaded
bytes against the expected value. A clear, actionable error is emitted
on mismatch so corrupted or tampered downloads no longer fail silently
with misleading errors further down the pipeline.

- Add sha256sums_url field to UpdateInfo
- check() now calls find_sha256sums_url() to locate the sums asset
- download_binary() accepts optional sums URL and calls verify_checksum()
  before writing the binary to disk
- Warn (but continue) when no SHA256SUMS asset exists in the release
- Add 5 unit tests covering the new helpers
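
The comparison step could be sketched as follows, assuming the SHA256SUMS asset uses the `sha256sum` line format (`<hex digest>  <filename>`). The function names and signatures are hypothetical. Computing the digest of the downloaded bytes is left out, since the standard library has no SHA-256 and a hashing crate would be needed.

```rust
/// Finds the expected hex digest for `asset` in SHA256SUMS-style text.
/// A leading '*' on the filename (binary mode marker) is ignored.
fn expected_digest<'a>(sums: &'a str, asset: &str) -> Option<&'a str> {
    sums.lines().find_map(|line| {
        let mut parts = line.split_whitespace();
        let digest = parts.next()?;
        let name = parts.next()?.trim_start_matches('*');
        (name == asset && digest.len() == 64).then_some(digest)
    })
}

/// Compares an already computed hex digest against the SHA256SUMS entry.
fn check_digest(sums: &str, asset: &str, actual_hex: &str) -> Result<(), String> {
    let expected = expected_digest(sums, asset)
        .ok_or_else(|| format!("no SHA256SUMS entry for {asset}"))?;
    if expected.eq_ignore_ascii_case(actual_hex) {
        Ok(())
    } else {
        Err(format!(
            "checksum mismatch for {asset}: expected {expected}, got {actual_hex}"
        ))
    }
}
```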

Fixes #4294

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(wati): add missing attachments field to ChannelMessage initializer

Pre-existing build error from feat(channels): add automatic media understanding pipeline (#4402).
The attachments field was added to ChannelMessage but not all initializers were updated.

* style: fix formatting to match CI rustfmt

---------

Co-authored-by: SpaceLobster <spacelobster@SpaceLobsters-Mac-mini.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* feat(voice): add unified VoicePipeline facade for STT+TTS channels (#4427)

* feat(voice): add unified VoicePipeline facade for STT+TTS channels

Add `src/voice/` module providing a single `VoicePipeline` struct that
combines the existing `TranscriptionManager` (STT) and `TtsManager` (TTS)
under one API, removing the need for channels to import both sub-systems:

- `VoicePipeline::from_config(&config)` — builds from active Config, both
  halves are optional (guarded by `transcription.enabled` / `tts.enabled`)
- `transcribe(audio, filename)` / `transcribe_with_provider(...)` — STT half
- `synthesize(text)` / `synthesize_with_voice(...)` /
  `synthesize_with_provider(...)` — TTS half
- `is_stt_available()`, `is_tts_available()`, `is_full_duplex()` — capability
  queries for channels to gate audio paths at runtime
- `stt_providers()` / `tts_providers()` — enumerate configured backends
- 10 unit tests; all pass with zero new dependencies

Closes #10

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: add missing attachments field and apply cargo fmt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci: retrigger CI after cancelled runs

---------

Co-authored-by: Emanuele Cannizzaro <emanuele.cannizzaro@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(channels): add room creation and user invite API to Channel trait (#4504)

Add create_room() and invite_user() methods to Channel trait with default
no-op implementations. Implement both for MatrixChannel using the Matrix
SDK's room creation and invite APIs.

Changes:
- Add create_room() to Channel trait (returns empty string by default)
- Add invite_user() to Channel trait (no-op by default)
- Implement create_room() for MatrixChannel using matrix_sdk API
- Implement invite_user() for MatrixChannel using room.invite_user_by_id()
- Add comprehensive test coverage for both trait defaults and Matrix impl

Tests:
- default_create_room_returns_success: validates trait default
- default_invite_user_returns_success: validates trait default
- create_room_lifecycle: validates MatrixTestChannel recording
- invite_user_lifecycle: validates MatrixTestChannel recording
- Updated minimal_channel_all_defaults_succeed with room creation/invite

All tests pass. No clippy warnings. Closes §2.6.

Co-authored-by: Test <test@test.com>

* feat(security): harden native sandbox backends with seccomp and fail-closed fallback (#4821)

* fix(agent): fix chat not returning data

Fix issue where chat requests do not return data properly.

Fixes #3681

* fix(observability): use local time for runtime_trace timestamps

Fixes issue where runtime_trace timestamps were recorded in UTC instead
of local time, making it difficult to correlate trace logs with system
logs. Changed from chrono::Utc to chrono::Local for all timestamp
generation in runtime_trace events.

Closes #4816

* feat(security): harden native sandbox backends with seccomp and fail-closed fallback

This implements security hardening for native sandbox backends as requested
in issue #4812:

1. Add seccomp support to bubblewrap backend with capability dropping
   - Added seccomp_available() helper method
   - Enable --seccomp with CAP_SYS_ADMIN and CAP_SYS_PTRACE dropping when available
   - Fall back gracefully with a warning when seccomp is not available

2. Add seccomp, caps.drop, and noroot to firejail backend
   - Added helper methods to detect feature availability
   - Enable --seccomp for syscall filtering
   - Enable --caps.drop=all when available
   - Enable --noroot when available
   - Log warnings when features are not available

3. Fail-closed behavior for unavailable sandbox backends
   - Added ZEROCLAW_ALLOW_NO_SANDBOX environment variable support
   - When specific backend requested but unavailable:
     * If ZEROCLAW_ALLOW_NO_SANDBOX=1: warn and fall back to NoopSandbox (current behavior)
     * Otherwise: error and require explicit opt-in to allow fallback
   - Added tests for the new allow_noop_fallback() behavior
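
The fail-closed decision in item 3 might be expressed roughly as below. The `SandboxChoice` enum and both signatures are hypothetical; the environment value is passed in as a parameter here so the logic stays testable, whereas the real `allow_noop_fallback()` may read the variable itself.

```rust
#[derive(Debug, PartialEq)]
enum SandboxChoice {
    Requested,
    Noop,
}

/// True only when ZEROCLAW_ALLOW_NO_SANDBOX is set to exactly "1".
fn allow_noop_fallback(env_value: Option<&str>) -> bool {
    env_value == Some("1")
}

/// Fail closed: an unavailable backend is an error unless the operator
/// has explicitly opted in to running without a sandbox.
fn resolve_sandbox(
    requested_available: bool,
    env_value: Option<&str>,
) -> Result<SandboxChoice, String> {
    if requested_available {
        return Ok(SandboxChoice::Requested);
    }
    if allow_noop_fallback(env_value) {
        eprintln!("warning: requested sandbox unavailable, falling back to NoopSandbox");
        Ok(SandboxChoice::Noop)
    } else {
        Err("requested sandbox backend is unavailable; \
             set ZEROCLAW_ALLOW_NO_SANDBOX=1 to allow running without one"
            .to_string())
    }
}
```

A caller would pass something like `std::env::var("ZEROCLAW_ALLOW_NO_SANDBOX").ok().as_deref()` as `env_value`.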

These changes align native sandbox backends with the Docker backend's
security posture, providing syscall filtering, capability dropping, and
explicit user consent for fallback to application-layer security.

Closes #4812

* Delete CHAT_DATA_FIX.md

* fix(security): use o.stdout instead of o in firejail feature checks

* style: fix formatting issues

* fix(security): update test for fail-closed fallback behavior

* fix(security): use o.stdout in bubblewrap seccomp check

---------

Co-authored-by: darrenzeng2025 <darrenzeng2025@gmail.com>
Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* fix(providers): unify OpenAI-compatible native tool response parsing (#4149)

* fix(docker): align workspace with Cargo.lock for --locked builds

The builder used sed to remove crates/robot-kit from [workspace].members because that path was not copied into the image. Cargo.lock is still generated for the full workspace (including zeroclaw-robot-kit), so the manifest and lockfile disagreed. cargo build --release --locked then tried to rewrite Cargo.lock and failed with "cannot update the lock file ... because --locked was passed" (commonly hit when ZEROCLAW_CARGO_FEATURES includes memory-postgres).

Copy crates/robot-kit/ into the image and drop the sed step so the workspace matches the committed lockfile.

Made-with: Cursor

* fix(providers): unify OpenAI-compatible native tool response parsing

Problem: The native tool-calling path that parses /v1/chat/completions responses duplicated a narrower parser than Provider::chat(). It always generated new tool_call IDs and only read nested function.name / function.arguments. Proxies and gateways that return stable tool_call ids or non-standard tool JSON then broke multi-turn tool use or dropped calls.

Change: Reuse parse_native_response for that path and attach TokenUsage as before. Deserialize finish_reason on non-streaming choices for a complete OpenAI-shaped payload (no branching on it yet).

Also add OpenAiCompatibleProvider::new_with_vision_no_responses_fallback for vision-capable endpoints that only support chat completions (no /v1/responses), alongside existing new_no_responses_fallback.

Docs: Document constructor choice and parse_native_response behavior in docs/contributing/custom-providers.md for maintainers wiring custom URLs.

Made-with: Cursor

* fix(response): handle empty model replies and support OpenAI content formats

- Introduced a placeholder for empty model replies to ensure user-visible feedback in channels that reject empty messages.
- Updated the response handling to accommodate OpenAI's `message.content` as either a string or an array of parts, ensuring compatibility with various gateways.
- Enhanced tests to verify the correct deserialization of content formats and the handling of empty responses.

This change improves the robustness of message delivery across different channels and enhances the flexibility of content parsing.

* remove: delete 9router subproject as it is no longer needed

---------

Co-authored-by: lokinh <locnh@uniultra.xyz>
Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* feat(channel): add observe_group flag and per-chat session keys (#4502)

* feat(channel): add observe_group flag and per-chat session keys

- Add `observe_group` field to `ChannelMessage` — when true, message is
  stored in session history but the agent does not respond. Used by
  mention_only channels to passively track group conversation context.
- Change `conversation_history_key` to use `reply_target` (chat JID)
  instead of `sender`, giving each group and DM an independent session.
- Implement `Default` for `ChannelMessage`.
- Add `observe_group: false` to all channel ChannelMessage constructors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tests): add observe_group field to integration test ChannelMessage structs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add observe_group to lark/matrix channels and fmt fix

Add missing observe_group field to feature-gated channel
ChannelMessage initializations (lark, matrix) that were missed
during merge conflict resolution. Also apply cargo fmt.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add observe_group to gmail_push and discord_history channels

These channels were added to master after the original commit, so they
need the new field added during conflict resolution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: cargo fmt

* fix: add observe_group field to all ChannelMessage initializers

The observe_group field was added to ChannelMessage but not all struct
literals were updated, causing compilation failures. This adds
observe_group: false to every missing ChannelMessage initializer in
tests (channels/mod.rs, channels/traits.rs, channels/slack.rs,
gateway/mod.rs) and removes the erroneously added field from
SendMessage literals in cli.rs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: allow clippy::derivable_impls for ChannelMessage Default impl

* fix: add observe_group field to whatsapp_web ChannelMessage literal

* fix(test): update vision test for per-chat session key format

The observe-group PR changed conversation_history_key() to key by
channel + reply_target (per-chat) instead of channel + reply_target +
sender (per-user). Update the failing vision fallback test to use the
new key format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tests): update session key format to per-chat keys after rebase

After rebasing feat/observe-group-sessions onto master, several tests
still used the old per-user history key format (channel_target_sender)
instead of the new per-chat format (channel_target). Also adds the
missing observe_group field to ChannelMessage initializers introduced
on master during the rebase window.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* fix(providers): add streaming support to RouterProvider (#4533)

* fix(providers): add streaming support to RouterProvider

RouterProvider was missing `supports_streaming`, `stream_chat_with_system`,
and `stream_chat_with_history` implementations, falling back to the trait
default which returns "unknown does not support streaming". This broke the
web UI for users with model routes configured.

Delegate all three streaming methods to the resolved underlying provider,
matching the existing pattern for non-streaming methods.

Closes #4523

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(test): add stream_chat_with_history to MockProvider

The MockProvider in router tests lacked a stream_chat_with_history
override, causing it to use the trait default which returns the
"unknown does not support streaming" error — making the delegation
test indistinguishable from the no-delegation case.

Override with stream::empty() on both MockProvider and Arc<MockProvider>
so the test correctly verifies that RouterProvider delegates rather
than falling through to its own trait default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: rareba <rareba@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* fix(cron): add live channel registry for WhatsApp Web cron delivery (#4548)

WhatsApp Web cron delivery fails because deliver_announcement() creates
a new disconnected WhatsAppWebChannel instance instead of reusing the
daemon's connected one. Stateful channels need an active browser session
from listen() to send messages.

Add a process-global live channel registry (OnceLock<Mutex<HashMap>>)
that start_channels() populates after building channels_by_name. The
scheduler's deliver_announcement() checks this registry first, falling
back to constructing new instances only for stateless channels.
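
A minimal sketch of the registry pattern described above, assuming a simplified `Channel` trait. The trait, `NullChannel`, `register_live_channel`, and `lookup_live_channel` are hypothetical names for illustration and are not taken from the project; only the `OnceLock<Mutex<HashMap>>` shape comes from the commit message.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical stand-in for the project's channel trait.
pub trait Channel: Send + Sync {
    fn send(&self, text: &str) -> Result<(), String>;
}

/// Hypothetical no-op channel, used only to exercise the registry.
pub struct NullChannel;

impl Channel for NullChannel {
    fn send(&self, _text: &str) -> Result<(), String> {
        Ok(())
    }
}

type Registry = Mutex<HashMap<String, Arc<dyn Channel>>>;

/// Process-global registry, created lazily on first access.
fn live_channels() -> &'static Registry {
    static REGISTRY: OnceLock<Registry> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Called once the daemon has built its connected channels.
pub fn register_live_channel(name: &str, channel: Arc<dyn Channel>) {
    live_channels()
        .lock()
        .unwrap()
        .insert(name.to_string(), channel);
}

/// Returns the connected instance if one was registered; a caller would
/// fall back to constructing a fresh instance only for stateless channels.
pub fn lookup_live_channel(name: &str) -> Option<Arc<dyn Channel>> {
    live_channels().lock().unwrap().get(name).cloned()
}
```

Storing `Arc` handles lets the scheduler reuse the daemon's connected instance without taking ownership of it.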

Also fix deliver_if_configured() to handle the simplified delivery
format {"channel": "whatsapp", "format": "text"} where mode is empty
(not "announce").

Closes #4537

Co-authored-by: rareba <rareba@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* feat(security): add per-channel DM pairing manager (#4530)

Port per-sender, per-channel authorization from RustyClaw. Includes
persistent JSON allowlist, 5-minute TTL pairing codes with unambiguous
charset, and channel-scoped user approval flows.

Co-authored-by: Emanuele Cannizzaro <emanuele.cannizzaro@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(security): add shared SsrfValidator with CIDR blocking and homograph detection (#4421)

* feat(security): add shared SsrfValidator with CIDR blocking and homograph detection

Ports SsrfValidator from RustyClaw into src/security/ssrf.rs as a reusable
security component. Consolidates SSRF protection that was duplicated inline
across web_fetch, http_request, and browser tools.

Key additions over existing inline checks:
- Configurable custom CIDR blocking via add_blocked_range()
- Unicode homograph attack detection (non-ASCII domain rejection)
- DNS rebinding protection (double-resolve consistency check)
- Cloud metadata endpoint blocking (169.254.169.254) even with allow_private_ips
- IPv6 unique-local (fc00::/7) and documentation ranges (2001:db8::/32)
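
Some of the checks listed above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the `SsrfValidator` API: `is_blocked_ip` and `is_ascii_hostname` are hypothetical helpers, and treating the documentation range like a private range is an assumption. Custom CIDR ranges and DNS rebinding checks are omitted.

```rust
use std::net::{IpAddr, Ipv4Addr};

/// Hypothetical helper: returns true when a resolved address should be rejected.
/// The cloud metadata address is blocked even when private IPs are allowed.
pub fn is_blocked_ip(ip: IpAddr, allow_private_ips: bool) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            if v4 == Ipv4Addr::new(169, 254, 169, 254) {
                return true;
            }
            !allow_private_ips && (v4.is_private() || v4.is_loopback() || v4.is_link_local())
        }
        IpAddr::V6(v6) => {
            let seg = v6.segments();
            // fc00::/7: the top seven bits of the first segment are 1111110.
            let unique_local = (seg[0] & 0xfe00) == 0xfc00;
            // 2001:db8::/32: the first two segments are fixed.
            let documentation = seg[0] == 0x2001 && seg[1] == 0x0db8;
            !allow_private_ips && (unique_local || documentation || v6.is_loopback())
        }
    }
}

/// Hypothetical homograph guard: reject any hostname with non-ASCII characters.
pub fn is_ascii_hostname(host: &str) -> bool {
    host.is_ascii()
}
```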

Adds ipnetwork = "0.20" dependency for CIDR-range matching.
Exports SsrfValidator from security module public API.

Closes #1

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* style: apply cargo fmt formatting

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci: retrigger CI after cancelled runs

* fix: merge upstream master and add missing attachments field in wati.rs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Emanuele Cannizzaro <emanuele.cannizzaro@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(channels): parse media attachment markers in WhatsApp Web send (#4403)

The WhatsApp Web channel was sending [VOICE:], [IMAGE:], [DOCUMENT:]
media markers as literal text. This adds parse_attachment_markers()
(mirroring the Telegram channel's implementation) to extract markers,
upload local files via wa-rs client.upload(), and send them as native
WhatsApp media messages (image, video, audio/voice, document).

If a media upload fails the marker is silently stripped and a warning
is logged, so users never see raw marker syntax.

Closes #4385

Co-authored-by: rareba <rareba@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Feat/chat history persistence (#3796)

* feat(web): persist chat history with 100 message cap

* refactor(web): centralize chat history capping

---------

Co-authored-by: XingJian <xingjian@kylinos.cn>
Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* fix(tests): unify env-var test synchronization across all provider modules (#4901)

Provider tests in 6 files mutate process-global environment variables
(BEDROCK_API_KEY, AWS_ACCESS_KEY_ID, CLAUDE_CODE_PATH, etc.) during
parallel test execution. Each file had its own independent lock static
and/or EnvGuard struct, so tests across files still raced.

Consolidate into a single shared `providers::test_util` module with:
- One process-wide `env_lock()` backed by a single OnceLock<Mutex<()>>
- One `EnvGuard` struct for RAII save/restore of env vars

All 6 provider test modules now import from test_util instead of
maintaining their own copies. This eliminates cross-file env-var races
that caused intermittent failures in CI (~1/3 failure rate).
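
The shared helpers might look roughly like the sketch below. The names `env_lock` and `EnvGuard` come from the commit message; the signatures, the `set` constructor, and the poison handling are assumptions made for illustration.

```rust
use std::env;
use std::ffi::OsString;
use std::sync::{Mutex, MutexGuard, OnceLock};

/// One process-wide lock; every env-mutating test holds it for its duration.
pub fn env_lock() -> MutexGuard<'static, ()> {
    static LOCK: OnceLock<Mutex<()>> = OnceLock::new();
    LOCK.get_or_init(|| Mutex::new(()))
        .lock()
        // A panicking test poisons the mutex; later tests still need the lock.
        .unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// RAII guard: sets a variable now and restores the previous state on drop.
pub struct EnvGuard {
    key: String,
    previous: Option<OsString>,
}

impl EnvGuard {
    #[allow(unused_unsafe)]
    pub fn set(key: &str, value: &str) -> Self {
        let previous = env::var_os(key);
        // `set_var` is an unsafe fn in the 2024 edition; callers are expected
        // to hold `env_lock()` so no other test mutates the environment.
        unsafe { env::set_var(key, value) };
        Self {
            key: key.to_string(),
            previous,
        }
    }
}

impl Drop for EnvGuard {
    #[allow(unused_unsafe)]
    fn drop(&mut self) {
        match &self.previous {
            Some(value) => unsafe { env::set_var(&self.key, value) },
            None => unsafe { env::remove_var(&self.key) },
        }
    }
}
```

A test would bind `let _lock = env_lock();` first and then create its guards, so the guards drop (and restore the environment) before the lock is released.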

Also fix ETXTBSY race in claude_code test helper: use
write-to-temp-then-rename so the executed script path is never open for
writing.
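
The write-to-temp-then-rename idea can be sketched as follows. `write_script_atomically` is a hypothetical helper, not the project's test code, and the `.tmp` sibling path is an assumption.

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

/// Hypothetical helper: write the script to a sibling temp file, close it,
/// then rename it into place so the final path is never held open for writing.
pub fn write_script_atomically(path: &Path, contents: &str) -> std::io::Result<()> {
    let tmp = path.with_extension("tmp");
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(contents.as_bytes())?;
        file.sync_all()?;
    } // The write handle is dropped (closed) here, before the rename.

    #[cfg(unix)]
    {
        use std::os::unix::fs::PermissionsExt;
        // Mark the script executable before it becomes visible at `path`.
        fs::set_permissions(&tmp, fs::Permissions::from_mode(0o755))?;
    }

    fs::rename(&tmp, path)
}
```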

Affected files:
- providers/mod.rs (new test_util module, remove local EnvGuard/lock)
- providers/bedrock.rs (remove local EnvGuard, import shared)
- providers/openai_codex.rs (remove local EnvGuard + ENV_MUTEX)
- providers/claude_code.rs (remove local env_lock, fix ETXTBSY)
- providers/kilocli.rs (remove local env_lock)
- providers/gemini_cli.rs (remove local env_lock)

Co-authored-by: lamco-office <office@lamco.io>

* feat(agent): add native_tool_calls_only config to disable text fallback parsing (#3682)

* feat(agent): add `native_tool_calls_only` config to disable text fallback parsing

When enabled, only native structured tool calls from the provider API
are recognized. Text-based fallback parsing (XML tags, markdown blocks,
GLM-style formats, bare URLs) is completely skipped. This prevents
normal URLs in LLM replies from being misinterpreted as shell tool calls
and reduces prompt-injection attack surface for users running models
with native function calling support.

Config: `[agent] native_tool_calls_only = true` (default: false).

* fix: rustfmt formatting and add missing serde defaults for Config fields

- Apply `cargo fmt` to test code in `loop_.rs`
- Add `#[serde(default)]` to `data_retention`, `cloud_ops`,
  `conversational_ai`, `security`, and `security_ops` fields in
  `Config` so partial TOML files deserialize without error

---------

Co-authored-by: SimianAstronaut7 <79373020+SimianAstronaut7@users.noreply.github.com>

* fix(multimodal): gracefully skip unresolvable images instead of failing (#4316)

* fix(multimodal): gracefully skip unresolvable images instead of failing

Previously a single image load failure would fail the entire message.
Now skips bad images with a warning log and appends a note to the user
indicating how many images could not be loaded.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(multimodal): update tests to match graceful image skip behavior

Tests now expect successful results with skip notes instead of errors
for oversized or unreachable images.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(multimodal): track has_successful_images and update test assertions

- Only set contains_images=true w…
DCPRevere pushed a commit to DCPRevere/opsclaw that referenced this pull request Apr 8, 2026
…uning (zeroclaw-labs#4825)

* fix(agent): treat tool_use/tool_result as atomic groups in history pruning

The history pruner and emergency trim could sever tool_use/tool_result
pairs when trimming conversation history to fit the token budget. This
caused Anthropic 400 errors ("tool_use ids were found without
tool_result blocks") during tool-heavy conversations that exceeded
max_context_tokens.

Phase 1 (collapse) now handles multi-tool groups: an assistant message
followed by N consecutive tool messages is collapsed into a single
summary, not one-at-a-time. Phase 2 (drop) drops assistant+tool
groups atomically instead of message-by-message.
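
To illustrate the Phase 2 idea, here is a minimal sketch of dropping the oldest messages group by group. It is not the project's pruner: `Role`, `Message`, `group_len`, `estimate_tokens`, and `drop_oldest_groups` are hypothetical, and the token estimate is a crude placeholder.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Role {
    System,
    User,
    Assistant,
    Tool,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: Role,
    pub content: String,
}

/// Size of the group starting at `start`: an assistant message plus all
/// tool messages that directly follow it; any other message is a group of one.
pub fn group_len(history: &[Message], start: usize) -> usize {
    if history[start].role != Role::Assistant {
        return 1;
    }
    1 + history[start + 1..]
        .iter()
        .take_while(|m| m.role == Role::Tool)
        .count()
}

/// Placeholder estimate (assumption): roughly four bytes per token.
pub fn estimate_tokens(history: &[Message]) -> usize {
    history.iter().map(|m| m.content.len() / 4 + 1).sum()
}

/// Drops the oldest non-system groups until the estimate fits `budget`,
/// never removing part of a group and never touching the last `keep_recent`.
pub fn drop_oldest_groups(history: &mut Vec<Message>, budget: usize, keep_recent: usize) {
    while estimate_tokens(history) > budget {
        let Some(start) = history.iter().position(|m| m.role != Role::System) else {
            break;
        };
        let len = group_len(history, start);
        let protected_start = history.len().saturating_sub(keep_recent);
        // Stop rather than split a group that reaches into the protected tail.
        if start + len > protected_start {
            break;
        }
        history.drain(start..start + len);
    }
}
```

Because a whole group is removed in one `drain`, a tool message is never left behind without its assistant parent, and an assistant message is never left without its tool results.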

emergency_history_trim gets the same atomic-group awareness.

Workaround: increasing agent.max_context_tokens to match the model's
actual context window (e.g., 200000 for Claude Sonnet) prevents the
pruner from firing in most cases.

Closes zeroclaw-labs#4810

* fix(lint): apply cargo fmt
whtiehack pushed a commit to whtiehack/zeroclaw that referenced this pull request Apr 13, 2026
…uning (zeroclaw-labs#4825)

Cherry-pick upstream 87698ad with conflict resolution:
- Phase 2 budget enforcement now drops assistant+tool groups atomically
- Keeps local `mod tests;` pattern (separate test file)
- Updates existing test assertion for new collapse summary format
- Adds 5 new tests for atomic group behavior
bedeabza pushed a commit to YAROOMS/zeroclaw that referenced this pull request Apr 21, 2026
…uning (zeroclaw-labs#4825)

Labels

agent Auto scope: src/agent/** changed.

Development

Successfully merging this pull request may close these issues.

[Bug]: History pruner severs tool_use/tool_result pairs, causing Anthropic 400 error
