feat(agent): HMAC tool execution receipts for hallucination detection #4921
Closed
singlerider wants to merge 8 commits into zeroclaw-labs:master from
…n detection
When enabled via agent.tool_receipts.enabled, every tool execution produces a cryptographic HMAC-SHA256 receipt appended to the tool result. The LLM cannot forge valid receipts because the ephemeral session key is never exposed. Receipts create an independent ground truth about which tools actually executed. Adds ReceiptGenerator with per-session ephemeral keys, a config flag (disabled by default), a system prompt instruction when enabled, and 12 unit tests including adversarial verification (tampered results, wrong keys, fabricated receipts). Based on: Basu (2026), "Tool Receipts, Not Zero-Knowledge Proofs: Practical Hallucination Detection for AI Agents," arXiv:2603.10060.
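A minimal, dependency-free sketch of how such a receipt could be computed and checked. Everything here beyond the `ReceiptGenerator` name and the `zc-receipt-` prefix is an assumption for illustration: the `with_key`/`receipt`/`verify` methods, the length-prefixed encoding of tool name, arguments and output, and the `zc-receipt-<64 hex chars>` layout are not taken from the PR. SHA-256 and HMAC are hand-rolled only so the example compiles without crates; a real build would use audited libraries and draw the session key from a CSPRNG.

```rust
// Dependency-free SHA-256 (FIPS 180-4) and HMAC (RFC 2104), hand-rolled for illustration only.
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

fn sha256(data: &[u8]) -> [u8; 32] {
    let mut h: [u32; 8] = [
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
    ];
    // Padding: 0x80, zeros up to 56 mod 64, then the message length in bits (big-endian u64).
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&((data.len() as u64) * 8).to_be_bytes());
    for block in msg.chunks(64) {
        // Message schedule: 16 words from the block, 48 derived with the small sigma functions.
        let mut w = [0u32; 64];
        for i in 0..64 {
            w[i] = if i < 16 {
                u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]])
            } else {
                let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
                let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
                w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1)
            };
        }
        // 64 compression rounds over the working variables a..h.
        let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut hh] = h;
        for i in 0..64 {
            let s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
            let ch = (e & f) ^ (!e & g);
            let t1 = hh.wrapping_add(s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
            let s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
            let t2 = s0.wrapping_add((a & b) ^ (a & c) ^ (b & c));
            (hh, g, f, e, d, c, b, a) = (g, f, e, d.wrapping_add(t1), c, b, a, t1.wrapping_add(t2));
        }
        for (x, y) in h.iter_mut().zip([a, b, c, d, e, f, g, hh]) {
            *x = x.wrapping_add(y);
        }
    }
    let mut out = [0u8; 32];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

fn hmac_sha256(key: &[u8], msg: &[u8]) -> [u8; 32] {
    // Keys longer than the 64-byte block are hashed first; shorter keys are zero-padded.
    let mut k = [0u8; 64];
    if key.len() > 64 {
        k[..32].copy_from_slice(&sha256(key));
    } else {
        k[..key.len()].copy_from_slice(key);
    }
    let mut inner: Vec<u8> = k.iter().map(|b| b ^ 0x36).collect();
    inner.extend_from_slice(msg);
    let mut outer: Vec<u8> = k.iter().map(|b| b ^ 0x5c).collect();
    outer.extend_from_slice(&sha256(&inner));
    sha256(&outer)
}

pub struct ReceiptGenerator {
    key: [u8; 32],
}

impl ReceiptGenerator {
    /// Hypothetical constructor: the caller supplies a fresh random per-session key.
    pub fn with_key(key: [u8; 32]) -> Self {
        Self { key }
    }

    pub fn receipt(&self, tool: &str, args: &str, output: &str) -> String {
        // Length-prefix each field so ("ab", "c") and ("a", "bc") authenticate differently.
        let mut msg = Vec::new();
        for field in [tool, args, output] {
            msg.extend_from_slice(&(field.len() as u64).to_be_bytes());
            msg.extend_from_slice(field.as_bytes());
        }
        let tag = hmac_sha256(&self.key, &msg);
        let hex: String = tag.iter().map(|b| format!("{b:02x}")).collect();
        format!("zc-receipt-{hex}")
    }

    /// Recompute and compare without early exit, so timing does not reveal a matching prefix.
    pub fn verify(&self, tool: &str, args: &str, output: &str, receipt: &str) -> bool {
        let expected = self.receipt(tool, args, output);
        expected.len() == receipt.len()
            && expected.bytes().zip(receipt.bytes()).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
    }
}
```

The key stays inside the runtime, so a receipt the model invents, or one copied from a different call, fails `verify`. This mirrors the adversarial cases listed above.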
When agent.tool_receipts.show_in_response is true, append collected tool receipts to the user-visible response message. This uses the universal delivered_response finalization point, so it works across all channels and streaming modes without per-channel implementation. Receipts are collected in a shared Mutex<Vec> passed to the tool loop, then appended after sanitization but before send. Defaults to false: receipts remain internal (audit only) unless the user opts in.
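The collection path described above can be sketched as follows, assuming an `Arc<Mutex<Vec<String>>>` as the shared container. `run_tool_loop`, `sanitize` and `finalize_response` are hypothetical stand-ins, not functions from the PR. The point the sketch illustrates is ordering: receipts are appended after the sanitizer has run, so they reach the user unmodified.

```rust
use std::sync::{Arc, Mutex};

/// Receipts shared between the tool loop and the response finalizer (assumed Arc-wrapped).
type ReceiptLog = Arc<Mutex<Vec<String>>>;

/// Hypothetical stand-in for the tool loop: each executed tool records its receipt.
fn run_tool_loop(executed: &[(&str, &str)], receipts: &ReceiptLog) -> String {
    let mut reply = String::new();
    for (tool, receipt) in executed {
        reply.push_str(&format!("Used {tool}.\n"));
        receipts.lock().unwrap().push(receipt.to_string());
    }
    reply
}

/// Hypothetical placeholder for the real sanitizer / leak-detector pass.
fn sanitize(text: &str) -> String {
    text.trim().to_string()
}

/// Finalization: sanitize first, then append receipts only if the user opted in.
fn finalize_response(raw: &str, receipts: &ReceiptLog, show_in_response: bool) -> String {
    let mut out = sanitize(raw);
    let collected = receipts.lock().unwrap();
    if show_in_response && !collected.is_empty() {
        out.push_str("\n\nTool receipts:\n");
        out.push_str(&collected.join("\n"));
    }
    out
}
```

Because the append happens at a single finalization step rather than inside each channel, the same code path serves every channel. That matches the rationale given in the commit message.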
…action
Exempt zc-receipt- tokens from the leak detector's high-entropy redaction so receipts stay visible to the user when the LLM echoes them. Without this exemption, the sanitizer replaces receipts with [REDACTED_HIGH_ENTROPY_TOKEN], so inline and appended receipts for the same tool call look different. Refs zeroclaw-labs#4830
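A rough sketch of a prefix exemption in an entropy-based redactor. Only the `zc-receipt-` prefix and the `[REDACTED_HIGH_ENTROPY_TOKEN]` placeholder come from the PR. The thresholds (24 characters, 3.5 bits per character), the space-based tokenization and the function names are illustrative assumptions, not the leak detector's actual logic.

```rust
use std::collections::HashMap;

/// Prefix exempted from high-entropy redaction.
const RECEIPT_PREFIX: &str = "zc-receipt-";
// Illustrative thresholds only; the real detector's values are not stated in the PR.
const MIN_LEN: usize = 24;
const MAX_BITS_PER_CHAR: f64 = 3.5;

/// Shannon entropy of a token, in bits per character.
fn shannon_entropy(token: &str) -> f64 {
    let mut counts: HashMap<char, usize> = HashMap::new();
    for c in token.chars() {
        *counts.entry(c).or_insert(0) += 1;
    }
    let n = token.chars().count() as f64;
    counts.values().map(|&k| { let p = k as f64 / n; -p * p.log2() }).sum()
}

/// Redact long, high-entropy tokens, except those carrying the receipt prefix.
fn redact_high_entropy(text: &str) -> String {
    text.split(' ')
        .map(|tok| {
            let looks_secret = tok.len() >= MIN_LEN && shannon_entropy(tok) > MAX_BITS_PER_CHAR;
            if looks_secret && !tok.starts_with(RECEIPT_PREFIX) {
                "[REDACTED_HIGH_ENTROPY_TOKEN]"
            } else {
                tok
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

The exemption only affects redaction. Any string the model writes with the `zc-receipt-` prefix passes through unchanged, so the prefix alone says nothing about validity. Checking a receipt still requires recomputing the HMAC with the session key.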
Add docs/security/tool-receipts.md covering the receipt format, config options, security properties, what receipts detect versus what they do not prevent, how to view receipts in logs and responses, and Phase 1 limitations. Add a tool_receipts section to config-reference.md with the enabled and show_in_response options. Link it from the security README. Apply cargo fmt to leak_detector.rs.
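The two documented options might map onto a config struct roughly like this. Only the key names `enabled` and `show_in_response` and the disabled-by-default behavior come from the PR. The TOML layout in the comment and the `appends_to_response` helper are assumptions, and the real `ToolReceiptsConfig` presumably also derives serde traits.

```rust
// Assumed TOML shape for the documented keys (defaults shown):
//
//   [agent.tool_receipts]
//   enabled = false
//   show_in_response = false

/// Hypothetical mirror of `ToolReceiptsConfig`; both flags default to false.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct ToolReceiptsConfig {
    /// Attach an HMAC receipt to every tool result.
    pub enabled: bool,
    /// Also append collected receipts to the user-visible reply.
    pub show_in_response: bool,
}

impl ToolReceiptsConfig {
    /// Receipts can only be shown if they were generated, so both flags must be set
    /// (an assumption about how the two options interact).
    pub fn appends_to_response(&self) -> bool {
        self.enabled && self.show_in_response
    }
}
```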
Rust 2024 reserves `gen` as a keyword. Rename all test variables and closure parameters to `receipt_gen` in tool_receipts.rs and tool_execution.rs.
Merge origin/master into feat/tool-receipts, resolving conflicts in three files where the PR's HMAC receipt parameters and master's native_tool_calls_only / thinking_overrides additions were independently added to the same function signatures and call sites. Both sets of changes are kept. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collaborator, Author: This is a draft. DO NOT MERGE.
singlerider added a commit to singlerider/zeroclaw that referenced this pull request on Mar 28, 2026
Every tool execution produces a cryptographic HMAC-SHA256 receipt proving the tool actually ran. The LLM cannot forge valid receipts because it never sees the ephemeral session key. New module src/agent/tool_receipts.rs with ReceiptGenerator. Wired through tool_execution.rs, loop_.rs, and channels/mod.rs. Opt-in via config: agent.tool_receipts.enabled and agent.tool_receipts.show_in_response. Leak detector updated to exempt zc-receipt- tokens from entropy redaction. Closes zeroclaw-labs#4830. Supersedes zeroclaw-labs#4831 and zeroclaw-labs#4921.
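Because the LLM cannot produce a valid receipt, any receipt-like token in a reply can be checked against what the runtime actually issued. The sketch below assumes the runtime keeps a set of receipts issued during the session; `audit_reply` is a hypothetical helper. Recomputing the HMAC with the session key would be an alternative check when the tool name, arguments and output are at hand.

```rust
use std::collections::HashSet;

/// Hypothetical audit pass: split receipt-like tokens in a model reply into those the
/// runtime actually issued this session and those it did not (fabricated or stale).
fn audit_reply<'a>(reply: &'a str, issued: &HashSet<String>) -> (Vec<&'a str>, Vec<&'a str>) {
    let (mut verified, mut unverified) = (Vec::new(), Vec::new());
    for word in reply.split_whitespace() {
        // Drop punctuation the model may wrap around a receipt, e.g. "(zc-receipt-ab12)."
        let token = word.trim_matches(|c: char| !c.is_ascii_alphanumeric());
        if !token.starts_with("zc-receipt-") {
            continue;
        }
        if issued.contains(token) {
            verified.push(token);
        } else {
            unverified.push(token);
        }
    }
    (verified, unverified)
}
```

An unverified token is a concrete signal that the model claimed a tool result it did not receive, which is the hallucination case the receipts are meant to expose.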
Collaborator, Author: Closing in favor of a new, clean reimplementation on
singlerider added a commit to singlerider/zeroclaw that referenced this pull request on Mar 29, 2026
Every tool execution produces a cryptographic HMAC-SHA256 receipt proving the tool actually ran. The LLM cannot forge valid receipts because it never sees the ephemeral session key. New module src/agent/tool_receipts.rs with ReceiptGenerator. Wired through tool_execution.rs, loop_.rs, and channels/mod.rs. Opt-in via config: agent.tool_receipts.enabled and agent.tool_receipts.show_in_response. Leak detector updated to exempt zc-receipt- tokens from entropy redaction. Closes zeroclaw-labs#4830. Supersedes zeroclaw-labs#4831 and zeroclaw-labs#4921.
Summary
Base branch: master
… (zc-receipt-...) that proves the tool ran with specific arguments and produced a specific output. Receipts are unforgeable by the LLM. Optionally appended to user-visible responses.
src/agent/tool_receipts.rs (new), src/agent/tool_execution.rs, src/agent/loop_.rs, src/channels/mod.rs, src/config/schema.rs, docs/security/tool-receipts.md (new).
Label Snapshot
Change Metadata
src/agent/tool_receipts.rs, src/agent/tool_execution.rs
Linked Issue
Validation Evidence
Security Impact
Privacy and Data Hygiene
Compatibility/Migration
tool_receipts.enabled = true in config
Human Verification
show_in_response = true appends receipts to the channel message
Side Effects/Blast Radius
ToolReceiptsConfig in agent config
zc-receipt- tokens
Rollback Plan
Revert the commit — removes receipt module and config.
Risks and Mitigations
None — opt-in feature, disabled by default.