feat(agent): HMAC tool execution receipts for hallucination detection #4831
Closed
singlerider wants to merge 7 commits into zeroclaw-labs:master
I'm not 100% sure anyone else would find something like this as useful as I would. It's tedious to constantly watch the bots hallucinate, and being able to prove to them that they have done so is often a good way to steer them back in the right direction. It feels in the spirit of this repo's audit-first culture. If this is a bad idea, cool, I get it, but it could also be the start of an interesting, intuitive way to audit consumer LLM tech. I'm in no rush to get this merged, but I would love a discussion, either here (regarding implementation) or in #4830 (regarding the idea in general), about whether this opt-in feature set should be included.
Force-pushed from 37375fe to 8fc5225
…n detection
When enabled via agent.tool_receipts.enabled, every tool execution produces a cryptographic HMAC-SHA256 receipt appended to the tool result. The LLM cannot forge valid receipts because the ephemeral session key is never exposed. Receipts create an independent ground truth about which tools actually executed.
Adds ReceiptGenerator with per-session ephemeral keys, config flag (disabled by default), system prompt instruction when enabled, and 12 unit tests including adversarial verification (tampered results, wrong keys, fabricated receipts).
Based on: Basu (2026), "Tool Receipts, Not Zero-Knowledge Proofs: Practical Hallucination Detection for AI Agents," arXiv:2603.10060.
When agent.tool_receipts.show_in_response is true, append collected tool receipts to the user-visible response message. Uses the universal delivered_response finalization point — works across all channels and streaming modes without per-channel implementation. Receipts are collected via a shared Mutex<Vec> passed to the tool loop, then appended after sanitization but before send. Default false — receipts remain internal/audit only unless opted in.
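The collection step described in this commit can be pictured roughly as follows. This is only a sketch under assumptions: the `ReceiptLog` alias, the `record_receipt` and `finalize_response` helpers, and the footer wording are hypothetical and are not taken from the PR's code; only the shared `Mutex<Vec>` idea and the `show_in_response` flag come from the commit message.

```rust
use std::sync::{Arc, Mutex};

/// Shared collector handed to the tool loop (hypothetical alias).
type ReceiptLog = Arc<Mutex<Vec<String>>>;

/// Called after each tool execution to remember its receipt (hypothetical helper).
fn record_receipt(log: &ReceiptLog, receipt: &str) {
    log.lock()
        .expect("receipt log poisoned")
        .push(receipt.to_string());
}

/// Appends collected receipts to an already-sanitized response.
/// When `show_in_response` is false, the text is returned unchanged.
fn finalize_response(sanitized: &str, log: &ReceiptLog, show_in_response: bool) -> String {
    let receipts = log.lock().expect("receipt log poisoned");
    if !show_in_response || receipts.is_empty() {
        return sanitized.to_string();
    }
    // Footer layout is an assumption made for this sketch.
    let mut out = String::from(sanitized);
    out.push_str("\n\nTool receipts:");
    for receipt in receipts.iter() {
        out.push_str("\n- ");
        out.push_str(receipt);
    }
    out
}
```

Appending at a single finalization point, after sanitization, means the footer is produced by the runtime rather than by the LLM, which is why it would not depend on any particular channel.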
…action
When agent.tool_receipts.show_in_response is true, append collected tool receipts to the user-visible response message. Uses the universal delivered_response finalization point, which works across all channels and streaming modes without per-channel implementation.
Exempt zc-receipt- tokens from the leak detector's high-entropy redaction so receipts are visible to the user when the LLM echoes them. Without this, the sanitizer replaces receipts with [REDACTED_HIGH_ENTROPY_TOKEN], making inline and appended receipts appear different for the same tool call.
Refs zeroclaw-labs#4830
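The exemption can be pictured as a prefix check placed in front of the entropy test. The sketch below is illustrative only: `looks_high_entropy` is a toy stand-in for whatever heuristic leak_detector.rs actually uses, and the function names are hypothetical. Only the `zc-receipt-` prefix and the `[REDACTED_HIGH_ENTROPY_TOKEN]` marker come from the commit message.

```rust
/// Prefix carried by tool receipts, as named in the commit message.
const RECEIPT_PREFIX: &str = "zc-receipt-";
/// Replacement marker, as named in the commit message.
const REDACTED: &str = "[REDACTED_HIGH_ENTROPY_TOKEN]";

/// Toy heuristic (assumption): a long token mixing lowercase, uppercase,
/// and digits is treated as a likely secret.
fn looks_high_entropy(token: &str) -> bool {
    token.len() >= 20
        && token.chars().any(|c| c.is_ascii_lowercase())
        && token.chars().any(|c| c.is_ascii_uppercase())
        && token.chars().any(|c| c.is_ascii_digit())
}

/// Redacts suspicious space-separated tokens but lets receipts through untouched.
fn redact(text: &str) -> String {
    text.split(' ')
        .map(|token| {
            if token.starts_with(RECEIPT_PREFIX) || !looks_high_entropy(token) {
                token.to_string()
            } else {
                REDACTED.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

In this sketch any token that starts with the prefix skips redaction, so a stricter shape check on the rest of the token may be worth considering in a real detector.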
Add docs/security/tool-receipts.md covering receipt format, config options, security properties, what receipts detect vs don't prevent, how to view receipts in logs and responses, and Phase 1 limitations. Add tool_receipts section to config-reference.md with enabled and show_in_response options. Link from security README. Apply cargo fmt to leak_detector.rs.
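For orientation, the two options named here would presumably look something like the fragment below. It assumes the config file is TOML, which the `[agent.tool_receipts]` section syntax quoted in the PR description suggests; the values shown are an example, not the defaults (both default to false).

```toml
# Example only: turn receipts on but keep them out of user-visible responses.
[agent.tool_receipts]
enabled = true
show_in_response = false
```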
Rust 2024 reserves `gen` as a keyword. Rename all test variables and closure parameters to `receipt_gen` in tool_receipts.rs and tool_execution.rs.
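A small illustration of the rename, under assumptions: the `ReceiptGenerator` below is a hypothetical stand-in with a counter instead of real receipt logic, and `with_generator` and `first_two_ids` are invented for the example. The point is only the naming: a local variable or closure parameter that would have been called `gen` is called `receipt_gen` instead.

```rust
/// Hypothetical stand-in for the real ReceiptGenerator.
struct ReceiptGenerator {
    counter: u64,
}

impl ReceiptGenerator {
    /// Returns the next id; placeholder for real receipt generation.
    fn next_id(&mut self) -> u64 {
        self.counter += 1;
        self.counter
    }
}

/// Runs `f` with a fresh generator. Naming the binding `gen` would be
/// rejected under the 2024 edition, so it is named `receipt_gen`.
fn with_generator<T>(f: impl FnOnce(&mut ReceiptGenerator) -> T) -> T {
    let mut receipt_gen = ReceiptGenerator { counter: 0 };
    f(&mut receipt_gen)
}

/// The closure parameter follows the same naming.
fn first_two_ids() -> (u64, u64) {
    with_generator(|receipt_gen| (receipt_gen.next_id(), receipt_gen.next_id()))
}
```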
Force-pushed from 8fc5225 to 59df5db
This was referenced Mar 28, 2026
singlerider added a commit to singlerider/zeroclaw that referenced this pull request Mar 28, 2026
Every tool execution produces a cryptographic HMAC-SHA256 receipt proving the tool actually ran. The LLM cannot forge valid receipts because it never sees the ephemeral session key. New module src/agent/tool_receipts.rs with ReceiptGenerator. Wired through tool_execution.rs, loop_.rs, and channels/mod.rs. Opt-in via config: agent.tool_receipts.enabled and agent.tool_receipts.show_in_response. Leak detector updated to exempt zc-receipt- tokens from entropy redaction. Closes zeroclaw-labs#4830 Supersedes zeroclaw-labs#4831, zeroclaw-labs#4921
singlerider added a commit to singlerider/zeroclaw that referenced this pull request Mar 29, 2026
Every tool execution produces a cryptographic HMAC-SHA256 receipt proving the tool actually ran. The LLM cannot forge valid receipts because it never sees the ephemeral session key. New module src/agent/tool_receipts.rs with ReceiptGenerator. Wired through tool_execution.rs, loop_.rs, and channels/mod.rs. Opt-in via config: agent.tool_receipts.enabled and agent.tool_receipts.show_in_response. Leak detector updated to exempt zc-receipt- tokens from entropy redaction. Closes zeroclaw-labs#4830 Supersedes zeroclaw-labs#4831, zeroclaw-labs#4921
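Judging by the debug log quoted in the PR description (`zc-receipt-1774604899-fVRG...`), a receipt token appears to consist of the `zc-receipt-` prefix, a numeric timestamp, and a tag. The sketch below checks only that shape; the layout is inferred rather than confirmed, and `ReceiptParts` and `parse_receipt` are hypothetical names. A shape check does not establish authenticity, which would still require recomputing the HMAC with the session key.

```rust
/// Parsed pieces of a receipt token. The layout is inferred from a log line,
/// not taken from the implementation.
#[derive(Debug, PartialEq)]
struct ReceiptParts<'a> {
    timestamp: u64,
    tag: &'a str,
}

/// Returns the parts if `token` looks like `zc-receipt-<timestamp>-<tag>`.
fn parse_receipt(token: &str) -> Option<ReceiptParts<'_>> {
    let rest = token.strip_prefix("zc-receipt-")?;
    // Split at the first '-' so a tag containing '-' stays intact.
    let (timestamp, tag) = rest.split_once('-')?;
    if tag.is_empty() {
        return None;
    }
    Some(ReceiptParts {
        timestamp: timestamp.parse().ok()?,
        tag,
    })
}
```

A fabricated ULID-style identifier fails at the prefix check, which matches the "immediately distinguishable" observation in the test notes.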
Summary
`master`
Label Snapshot (required)
`risk: low`
`size: M`
`agent`, `security`, `config`
`agent: tool_execution`, `security: leak_detector`
Change Metadata
`feature`
`agent`
Linked Issue
Validation Evidence (required)
Security Impact (required)
`ring::rand::SystemRandom`, stored only in `ReceiptGenerator` struct in memory. Never serialized, never sent to the LLM, never logged. Destroyed when the session ends.
Privacy and Data Hygiene (required)
`pass`
`scrub_credentials`). The HMAC itself reveals no information about the key.
Compatibility / Migration
`[agent.tool_receipts]` section with `enabled` and `show_in_response`
i18n Follow-Through (required when docs or user-facing wording changes)
`docs/security/tool-receipts.md`, config reference update)
Human Verification (required)
What was personally validated beyond CI:
`show_in_response` footer. Leak detector exemption. Bot echoes real receipts same-turn. Bot fabricates ULID-format fake when receipts not wired up.
Verified test exchanges (reconstructed from production, sanitized)
Test 1 (same-turn echo works):
Debug log: `Tool receipt generated tool=shell receipt=zc-receipt-1774604899-fVRG...` ✅
Test 2 (fabrication before wiring):
No `zc-receipt-` prefix. Bot fabricated a ULID. Immediately distinguishable. ❌ (expected, receipts not configured)
Test 3 (multi-tool with `show_in_response`):
Debug log: Two `Tool receipt generated` events with matching hashes. ✅
Test 4 (cross-turn recall failure, Phase 1 limitation):
Debug log: Bot re-executed weather twice (timestamps `1774609791` vs originals `1774609588`). Both sets valid, since they are different invocations. ❌ (known limitation)
Associated debug log pattern:
Side Effects / Blast Radius (required)
`show_in_response` is false; the system prompt encourages it).
`Tool receipt generated` debug log. Receipt appears in tool result content.
Agent Collaboration Notes (recommended)
Rollback Plan (required)
`agent.tool_receipts.enabled = false` or remove section from config
`enabled = false` (default), `show_in_response = false` (default)
Risks and Mitigations
`show_in_response` runtime footer is the reliable verification path. Phase 2 adds persistent audit table.
Zero new dependencies
All cryptographic primitives already in the dependency tree: `hmac` (v0.12), `sha2`, `ring` (for `SecureRandom`), `base64`. No binary size increase beyond ~200 lines of receipt logic.
References
🤖 Generated with Claude Code