
feat(agent): HMAC tool execution receipts for hallucination detection#4943

Closed
singlerider wants to merge 3 commits into zeroclaw-labs:master from singlerider:feat/tool-receipts-v2

@singlerider (Collaborator)

Summary

  • Base branch: master
  • Problem: LLMs can hallucinate tool usage, claiming to have executed tools they never ran. There is currently no cryptographic proof that a tool actually ran.
  • What changed: Adds HMAC-SHA256 tool execution receipts. Every tool execution produces a signed receipt (zc-receipt-...) that binds the tool to the specific arguments it ran with and the specific output it produced. Because the LLM never sees the signing key, it cannot forge a valid receipt. Receipts can optionally be appended to user-visible responses.
  • Scope: src/agent/tool_receipts.rs (new), src/agent/tool_execution.rs, src/agent/loop_.rs, src/channels/mod.rs, src/config/schema.rs, src/security/leak_detector.rs, docs/security/tool-receipts.md (new).
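The signing and verification flow described in the summary can be sketched with the standard library alone. This is an illustration, not the code in src/agent/tool_receipts.rs: `ReceiptSigner`, `sign`, `verify`, the length-prefixed message layout, and hex-encoding the MAC after the zc-receipt- prefix are all assumptions. SHA-256 and HMAC are hand-rolled only to keep the example dependency-free; real code should use audited crates such as `hmac` and `sha2`.

```rust
// Std-only sketch: hand-rolled SHA-256 and HMAC-SHA256 (RFC 2104) so the example
// has no dependencies. Use audited crates (e.g. `hmac` + `sha2`) in real code.
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

fn sha256(data: &[u8]) -> [u8; 32] {
    let mut h: [u32; 8] = [
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
    ];
    // Padding: 0x80, zeros up to 56 mod 64, then the message length in bits (big-endian).
    let mut msg = data.to_vec();
    msg.push(0x80);
    msg.resize((data.len() + 9 + 63) / 64 * 64 - 8, 0);
    msg.extend_from_slice(&((data.len() as u64) * 8).to_be_bytes());
    for block in msg.chunks(64) {
        let mut w = [0u32; 64];
        for i in 0..64 {
            w[i] = if i < 16 {
                u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]])
            } else {
                let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
                let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
                w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1)
            };
        }
        let mut v = h; // working variables a..h
        for i in 0..64 {
            let s1 = v[4].rotate_right(6) ^ v[4].rotate_right(11) ^ v[4].rotate_right(25);
            let ch = (v[4] & v[5]) ^ (!v[4] & v[6]);
            let t1 = v[7].wrapping_add(s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
            let s0 = v[0].rotate_right(2) ^ v[0].rotate_right(13) ^ v[0].rotate_right(22);
            let maj = (v[0] & v[1]) ^ (v[0] & v[2]) ^ (v[1] & v[2]);
            v = [t1.wrapping_add(s0.wrapping_add(maj)), v[0], v[1], v[2], v[3].wrapping_add(t1), v[4], v[5], v[6]];
        }
        for i in 0..8 {
            h[i] = h[i].wrapping_add(v[i]);
        }
    }
    let mut out = [0u8; 32];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// HMAC-SHA256 per RFC 2104 (64-byte block size).
fn hmac_sha256(key: &[u8], msg: &[u8]) -> [u8; 32] {
    let mut k = [0u8; 64];
    if key.len() > 64 {
        k[..32].copy_from_slice(&sha256(key));
    } else {
        k[..key.len()].copy_from_slice(key);
    }
    let mut inner: Vec<u8> = k.iter().map(|b| b ^ 0x36).collect();
    inner.extend_from_slice(msg);
    let mut outer: Vec<u8> = k.iter().map(|b| b ^ 0x5c).collect();
    outer.extend_from_slice(&sha256(&inner));
    sha256(&outer)
}

fn to_hex(bytes: &[u8]) -> String { bytes.iter().map(|b| format!("{:02x}", b)).collect() }

/// Hypothetical signer holding the per-session key; the key never enters a prompt.
struct ReceiptSigner {
    key: [u8; 32],
}

impl ReceiptSigner {
    /// Assumed layout: length-prefix each field so ("ab", "c") and ("a", "bc") differ.
    fn message(tool: &str, args: &str, output: &str) -> Vec<u8> {
        let mut m = Vec::new();
        for field in [tool, args, output] {
            m.extend_from_slice(&(field.len() as u64).to_be_bytes());
            m.extend_from_slice(field.as_bytes());
        }
        m
    }

    fn sign(&self, tool: &str, args: &str, output: &str) -> String {
        format!("zc-receipt-{}", to_hex(&hmac_sha256(&self.key, &Self::message(tool, args, output))))
    }

    /// Recompute the receipt and compare without exiting early on the first mismatch.
    fn verify(&self, receipt: &str, tool: &str, args: &str, output: &str) -> bool {
        let expected = self.sign(tool, args, output);
        expected.len() == receipt.len()
            && expected.bytes().zip(receipt.bytes()).fold(0u8, |acc, (a, b)| acc | (a ^ b)) == 0
    }
}
```

Because the MAC in this sketch covers the tool name, arguments, and output together, a genuine receipt copied onto a different claimed call fails verification, and a receipt invented by the model would require guessing a 256-bit MAC.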

Note: This is a clean reimplementation against current master. The previous PR #4831 was closed by @SimianAstronaut7 claiming "Already merged upstream", but the code was not present in master. PR #4921 was the old branch, which had 61 conflicts. cc @joehoyle @theonlyhennygod: see #4657 for the broader pattern of PRs closed with false merge claims.

Label Snapshot

  • Risk: low
  • Size: S
  • Scope: agent, security
  • Module: agent, config

Change Metadata

  • Type: feature
  • Primary scope: src/agent/tool_receipts.rs, src/agent/tool_execution.rs

Linked Issue

  • Closes #4830

Validation Evidence

cargo fmt --all -- --check   # pass
cargo check --lib            # pass

Security Impact

  • Receipts use HMAC-SHA256 with a per-session random key
  • Key is never exposed to the LLM or logged
  • Receipts prove tool execution but do not leak tool output
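A minimal sketch of how the per-session key could be kept out of logs, assuming a newtype whose `Debug` output is redacted and which has no `Display` impl. `SessionKey`, `generate`, `from_bytes`, and `expose` are hypothetical names, and reading /dev/urandom assumes a Unix-like host; a portable build would use a CSPRNG crate such as `getrandom`.

```rust
use std::fmt;
use std::io::Read;

/// Hypothetical wrapper for the per-session receipt key. `Debug` is redacted and
/// there is no `Display`, so formatting it into a log line cannot leak the bytes.
pub struct SessionKey([u8; 32]);

impl SessionKey {
    /// Fresh 32-byte key per session. Assumes a Unix-like OS with /dev/urandom.
    pub fn generate() -> std::io::Result<Self> {
        let mut bytes = [0u8; 32];
        std::fs::File::open("/dev/urandom")?.read_exact(&mut bytes)?;
        Ok(SessionKey(bytes))
    }

    pub fn from_bytes(bytes: [u8; 32]) -> Self {
        SessionKey(bytes)
    }

    /// Raw access for the signer only; never pass this into a prompt or a log.
    pub fn expose(&self) -> &[u8; 32] {
        &self.0
    }
}

impl fmt::Debug for SessionKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("SessionKey([redacted])")
    }
}
```

Any `{:?}` formatting of the key, for example in a debug log line, prints `SessionKey([redacted])` instead of the key bytes.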

Privacy and Data Hygiene

  • No personal data in test fixtures
  • Neutral wording

Compatibility/Migration

  • Backward compatible: receipts are disabled by default
  • Enable with agent.tool_receipts.enabled = true in the config
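The opt-in defaults can be sketched as a plain struct whose derived `Default` leaves both flags off. The field names come from this PR; the rule that `show_in_response` has no effect unless `enabled` is also set is an assumption, and the real `ToolReceiptsConfig` presumably also derives serde traits so it can be read from the config file.

```rust
/// Sketch of the opt-in settings. Field names are from this PR; the gating rule
/// in `append_to_response` is an assumption.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct ToolReceiptsConfig {
    /// Generate a receipt for every tool execution (false by default).
    pub enabled: bool,
    /// Also append receipts to the user-visible reply (false by default).
    pub show_in_response: bool,
}

impl ToolReceiptsConfig {
    /// Receipts can only be shown if they are generated in the first place.
    pub fn append_to_response(&self) -> bool {
        self.enabled && self.show_in_response
    }
}
```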

Human Verification

  • Tool execution produces valid receipt
  • Receipt verification succeeds for genuine receipts
  • Fabricated receipts fail verification
  • show_in_response = true appends receipts to the channel message
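The path from tool execution to the channel message can be sketched as a shared collector that tool calls push into and the channel layer drains. `ReceiptCollector`, `append_receipts`, `finish_reply`, and the footer format are hypothetical; the PR's actual `collected_receipts` plumbing may look different.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical shared buffer: the tool-execution path pushes one receipt per
/// tool call, and the channel layer drains it when the reply goes out.
pub type ReceiptCollector = Arc<Mutex<Vec<String>>>;

/// Append receipts as a footer when `show_in_response` is on (footer layout assumed).
pub fn append_receipts(reply: &str, receipts: &[String], show_in_response: bool) -> String {
    if !show_in_response || receipts.is_empty() {
        return reply.to_string();
    }
    let mut out = format!("{}\n\n---\nTool receipts:\n", reply);
    for receipt in receipts {
        out.push_str("- ");
        out.push_str(receipt);
        out.push('\n');
    }
    out
}

/// Drain the collector so receipts from one message never carry over to the next.
pub fn finish_reply(reply: &str, collector: &ReceiptCollector, show_in_response: bool) -> String {
    let receipts = std::mem::take(&mut *collector.lock().unwrap());
    append_receipts(reply, &receipts, show_in_response)
}
```

In this sketch the collector is drained on every reply, so receipts stay scoped to the message that produced them even when `show_in_response` is off.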

Side Effects/Blast Radius

  • New ToolReceiptsConfig in agent config
  • Receipt collector added to channel message processing
  • Leak detector updated to not redact zc-receipt- tokens
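An entropy-based redactor with a receipt exemption might look like the sketch below. The length and entropy thresholds, the whitespace tokenization, and the helper names are assumptions, not the actual src/security/leak_detector.rs logic. One design point worth noting: exempting every token that merely starts with zc-receipt- would let a secret slip past the redactor just by adding the prefix, so this sketch exempts only the exact receipt shape (prefix plus 64 hex characters, itself an assumed encoding).

```rust
use std::collections::HashMap;

// Assumed thresholds; the project's real leak detector may use different ones.
const MIN_LEN: usize = 20;
const MIN_ENTROPY: f64 = 4.0; // bits per character
const RECEIPT_PREFIX: &str = "zc-receipt-";

/// Shannon entropy of the token's characters, in bits per character.
fn shannon_entropy(s: &str) -> f64 {
    let mut counts: HashMap<char, usize> = HashMap::new();
    for ch in s.chars() {
        *counts.entry(ch).or_insert(0) += 1;
    }
    let n = s.chars().count() as f64;
    counts
        .values()
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

/// Exempt only the exact receipt shape, not any token that merely has the prefix.
fn is_receipt(tok: &str) -> bool {
    tok.strip_prefix(RECEIPT_PREFIX)
        .map_or(false, |hex| hex.len() == 64 && hex.bytes().all(|b| b.is_ascii_hexdigit()))
}

/// Replace long, high-entropy, space-separated tokens unless they are receipts.
fn redact_high_entropy(text: &str) -> String {
    text.split(' ')
        .map(|tok| {
            let suspicious = tok.chars().count() >= MIN_LEN && shannon_entropy(tok) >= MIN_ENTROPY;
            if suspicious && !is_receipt(tok) { "[REDACTED]" } else { tok }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```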

Rollback Plan

Revert the commits; this removes the receipt module and its config.

Risks and Mitigations

None: opt-in feature, disabled by default.

@singlerider mentioned this pull request Mar 28, 2026
Every tool execution produces a cryptographic HMAC-SHA256 receipt proving
the tool actually ran. The LLM cannot forge valid receipts because it
never sees the ephemeral session key.

New module src/agent/tool_receipts.rs with ReceiptGenerator. Wired
through tool_execution.rs, loop_.rs, and channels/mod.rs. Opt-in via
config: agent.tool_receipts.enabled and agent.tool_receipts.show_in_response.

Leak detector updated to exempt zc-receipt- tokens from entropy redaction.

Closes zeroclaw-labs#4830
Supersedes zeroclaw-labs#4831, zeroclaw-labs#4921
@singlerider force-pushed the feat/tool-receipts-v2 branch from 687fc2c to 439e772 on March 29, 2026 10:12
…nt fixes

Add missing receipt_generator and collected_receipts args to 24 test
call sites, remove stray bool args from 2 cost-tracking tests, fix
stale struct fields in channel test constructions, and resolve clippy
warnings (collapsible_if, match_same_arms, type_complexity,
needless_lifetimes, struct_excessive_bools, unused variable).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The cherry-picked zeroclaw-labs#4927 changed handle_webhook to use
run_gateway_chat_with_tools, which calls process_message and
bootstraps a full agent from Config, ignoring the pre-configured
state.provider. This breaks when Config::default() lacks an API key.

Revert to run_gateway_chat_simple which correctly uses state.provider.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@singlerider (Collaborator, Author)

Closing in favor of #5168

@singlerider closed this Apr 3, 2026

Labels

  • agent (auto scope: src/agent/** changed)
  • channel:lark (auto module: channel/lark changed)
  • channel (auto scope: src/channels/** changed)
  • config (auto scope: src/config/** changed)
  • core (auto scope: root src/*.rs files changed)
  • docs (auto scope: docs/markdown/template files changed)
  • gateway (auto scope: src/gateway/** changed)
  • memory (auto scope: src/memory/** changed)
  • security (auto scope: src/security/** changed)
  • tool:web
  • tool (auto scope: src/tools/** changed)

Development

Linked issue: [Feature]: HMAC tool execution receipts for hallucination detection
