feat(agent): HMAC tool execution receipts for hallucination detection by singlerider · Pull Request #4921 · zeroclaw-labs/zeroclaw

singlerider · 2026-03-28T22:36:12Z

Summary

Base branch: master
Problem: LLMs hallucinate tool usage — they claim to have executed tools when they haven't. There is no cryptographic proof that a tool actually ran.
What changed: Adds HMAC-SHA256 tool execution receipts. Every tool execution produces a signed receipt (zc-receipt-...) that proves the tool ran with specific arguments and produced a specific output. Receipts are unforgeable by the LLM. Optionally appended to user-visible responses.
Scope: src/agent/tool_receipts.rs (new), src/agent/tool_execution.rs, src/agent/loop_.rs, src/channels/mod.rs, src/config/schema.rs, docs/security/tool-receipts.md (new).

Note: This is a resubmission. The previous PR #4831 was closed by @SimianAstronaut7 with the comment "Already merged upstream" — however the code is not present in master as of 50425247. No src/agent/tool_receipts.rs, no ToolReceiptsConfig, and no receipt generation code exist in the tree. cc @joehoyle @theonlyhennygod — this happened to multiple PRs from multiple contributors; see #4657 for details.

Label Snapshot

Risk: low
Size: S
Scope: agent, security
Module: agent, config

Change Metadata

Type: feature
Primary scope: src/agent/tool_receipts.rs, src/agent/tool_execution.rs

Linked Issue

Closes: [Feature]: HMAC tool execution receipts for hallucination detection #4830
Related: Matrix channel: friction tracker #4657 (Matrix friction tracker)
Supersedes: feat(agent): HMAC tool execution receipts for hallucination detection #4831 (closed without merging)

Validation Evidence

cargo fmt --all -- --check   # pass
cargo clippy --all-targets -- -D warnings   # pass
cargo test   # pass

Security Impact

Receipts use HMAC-SHA256 with a per-session random key
Key is never exposed to the LLM or logged
Receipts prove tool execution but do not leak tool output

Privacy and Data Hygiene

No personal data in test fixtures
Neutral wording

Compatibility/Migration

Backward compatible — receipts are disabled by default
Enable with tool_receipts.enabled = true in config

Human Verification

Tool execution produces valid receipt
Receipt verification succeeds for genuine receipts
Fabricated receipts fail verification
show_in_response = true appends receipts to channel message
Live-tested on Matrix and Discord

Side Effects/Blast Radius

New ToolReceiptsConfig in agent config
Receipt collector added to channel message processing
Leak detector updated to not redact zc-receipt- tokens

Rollback Plan

Revert the commit — removes receipt module and config.

Risks and Mitigations

None — opt-in feature, disabled by default.

…n detection When enabled via agent.tool_receipts.enabled, every tool execution produces a cryptographic HMAC-SHA256 receipt appended to the tool result. The LLM cannot forge valid receipts because the ephemeral session key is never exposed. Receipts create an independent ground truth about which tools actually executed. Adds ReceiptGenerator with per-session ephemeral keys, config flag (disabled by default), system prompt instruction when enabled, and 12 unit tests including adversarial verification (tampered results, wrong keys, fabricated receipts). Based on: Basu (2026), "Tool Receipts, Not Zero-Knowledge Proofs: Practical Hallucination Detection for AI Agents," arXiv:2603.10060.

When agent.tool_receipts.show_in_response is true, append collected tool receipts to the user-visible response message. Uses the universal delivered_response finalization point — works across all channels and streaming modes without per-channel implementation. Receipts are collected via a shared Mutex<Vec> passed to the tool loop, then appended after sanitization but before send. Default false — receipts remain internal/audit only unless opted in.

…action When agent.tool_receipts.show_in_response is true, append collected tool receipts to the user-visible response message. Uses the universal delivered_response finalization point — works across all channels and streaming modes without per-channel implementation. Exempt zc-receipt- tokens from the leak detector's high-entropy redaction so receipts are visible to the user when the LLM echoes them. Without this, the sanitizer replaces receipts with [REDACTED_HIGH_ENTROPY_TOKEN], making inline and appended receipts appear different for the same tool call. Refs zeroclaw-labs#4830

Add docs/security/tool-receipts.md covering receipt format, config options, security properties, what receipts detect vs don't prevent, how to view receipts in logs and responses, and Phase 1 limitations. Add tool_receipts section to config-reference.md with enabled and show_in_response options. Link from security README. Apply cargo fmt to leak_detector.rs.

Rust 2024 reserves `gen` as a keyword. Rename all test variables and closure parameters to `receipt_gen` in tool_receipts.rs and tool_execution.rs.

Merge origin/master into feat/tool-receipts, resolving conflicts in three files where the PR's HMAC receipt parameters and master's native_tool_calls_only / thinking_overrides additions were independently added to the same function signatures and call sites. Both sets of changes are kept. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

singlerider · 2026-03-28T23:30:28Z

This is a Draft. DO NOT MERGE.

Every tool execution produces a cryptographic HMAC-SHA256 receipt proving the tool actually ran. The LLM cannot forge valid receipts because it never sees the ephemeral session key. New module src/agent/tool_receipts.rs with ReceiptGenerator. Wired through tool_execution.rs, loop_.rs, and channels/mod.rs. Opt-in via config: agent.tool_receipts.enabled and agent.tool_receipts.show_in_response. Leak detector updated to exempt zc-receipt- tokens from entropy redaction. Closes zeroclaw-labs#4830 Supersedes zeroclaw-labs#4831, zeroclaw-labs#4921

singlerider · 2026-03-28T23:52:29Z

Closing in favor of new clean reimplementation on feat/tool-receipts-v2 branch.

Every tool execution produces a cryptographic HMAC-SHA256 receipt proving the tool actually ran. The LLM cannot forge valid receipts because it never sees the ephemeral session key. New module src/agent/tool_receipts.rs with ReceiptGenerator. Wired through tool_execution.rs, loop_.rs, and channels/mod.rs. Opt-in via config: agent.tool_receipts.enabled and agent.tool_receipts.show_in_response. Leak detector updated to exempt zc-receipt- tokens from entropy redaction. Closes zeroclaw-labs#4830 Supersedes zeroclaw-labs#4831, zeroclaw-labs#4921

singlerider added 7 commits March 28, 2026 09:20

fix(agent): add debug log for tool receipt generation

dc58799

fix: rename reserved keyword gen to receipt_gen for Rust 2024 edition

59df5db

Rust 2024 reserves `gen` as a keyword. Rename all test variables and closure parameters to `receipt_gen` in tool_receipts.rs and tool_execution.rs.

fix(lint): apply cargo fmt to tool_execution.rs

eef1314

singlerider marked this pull request as draft March 28, 2026 23:34

singlerider closed this Mar 28, 2026

singlerider mentioned this pull request Mar 28, 2026

feat(agent): HMAC tool execution receipts for hallucination detection #4943

Closed

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(agent): HMAC tool execution receipts for hallucination detection#4921

feat(agent): HMAC tool execution receipts for hallucination detection#4921
singlerider wants to merge 8 commits intozeroclaw-labs:masterfrom
singlerider:feat/tool-receipts

singlerider commented Mar 28, 2026

Uh oh!

singlerider commented Mar 28, 2026

Uh oh!

singlerider commented Mar 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

singlerider commented Mar 28, 2026

Summary

Label Snapshot

Change Metadata

Linked Issue

Validation Evidence

Security Impact

Privacy and Data Hygiene

Compatibility/Migration

Human Verification

Side Effects/Blast Radius

Rollback Plan

Risks and Mitigations

Uh oh!

singlerider commented Mar 28, 2026

Uh oh!

singlerider commented Mar 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant