
feat(agent): HMAC tool execution receipts for hallucination detection#4921

Closed
singlerider wants to merge 8 commits into zeroclaw-labs:master from singlerider:feat/tool-receipts

@singlerider
Collaborator

Summary

  • Base branch: master
  • Problem: LLMs hallucinate tool usage — they claim to have executed tools when they haven't. There is no cryptographic proof that a tool actually ran.
  • What changed: Adds HMAC-SHA256 tool execution receipts. Every tool execution produces a signed receipt (zc-receipt-...) that proves the tool ran with specific arguments and produced a specific output. Receipts are unforgeable by the LLM. Optionally appended to user-visible responses.
  • Scope: src/agent/tool_receipts.rs (new), src/agent/tool_execution.rs, src/agent/loop_.rs, src/channels/mod.rs, src/config/schema.rs, docs/security/tool-receipts.md (new).
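As a rough illustration of the receipt idea in the summary above, the following std-only sketch signs the tool name, arguments, and output with HMAC-SHA256 and checks a receipt by recomputing it. It is not the code from `src/agent/tool_receipts.rs`: the function names, the length-prefixed message layout, and the hex suffix after `zc-receipt-` are all assumptions, and the hash is hand-rolled only to keep the example free of dependencies. Real code would more likely rely on vetted crates such as `hmac` and `sha2` and draw the session key from an OS random source.

```rust
// Std-only sketch; every name below is hypothetical, not the PR's API.
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

/// SHA-256 (FIPS 180-4) of `data`.
fn sha256(data: &[u8]) -> [u8; 32] {
    let mut h: [u32; 8] = [
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
    ];
    // Pad with 0x80, zeros up to 56 mod 64, then the bit length as a big-endian u64.
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&((data.len() as u64) * 8).to_be_bytes());
    for chunk in msg.chunks(64) {
        let mut w = [0u32; 64];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([chunk[4 * i], chunk[4 * i + 1], chunk[4 * i + 2], chunk[4 * i + 3]]);
        }
        for i in 16..64 {
            let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
            let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
        }
        let mut v = h;
        for i in 0..64 {
            let s1 = v[4].rotate_right(6) ^ v[4].rotate_right(11) ^ v[4].rotate_right(25);
            let ch = (v[4] & v[5]) ^ (!v[4] & v[6]);
            let t1 = v[7].wrapping_add(s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
            let s0 = v[0].rotate_right(2) ^ v[0].rotate_right(13) ^ v[0].rotate_right(22);
            let maj = (v[0] & v[1]) ^ (v[0] & v[2]) ^ (v[1] & v[2]);
            let t2 = s0.wrapping_add(maj);
            v = [t1.wrapping_add(t2), v[0], v[1], v[2], v[3].wrapping_add(t1), v[4], v[5], v[6]];
        }
        for i in 0..8 {
            h[i] = h[i].wrapping_add(v[i]);
        }
    }
    let mut out = [0u8; 32];
    for i in 0..8 {
        out[4 * i..4 * i + 4].copy_from_slice(&h[i].to_be_bytes());
    }
    out
}

/// HMAC (RFC 2104) instantiated with SHA-256, whose block size is 64 bytes.
fn hmac_sha256(key: &[u8], msg: &[u8]) -> [u8; 32] {
    let mut k = [0u8; 64];
    if key.len() > 64 {
        k[..32].copy_from_slice(&sha256(key)); // long keys are hashed first
    } else {
        k[..key.len()].copy_from_slice(key);
    }
    let mut inner: Vec<u8> = k.iter().map(|b| b ^ 0x36).collect();
    inner.extend_from_slice(msg);
    let mut outer: Vec<u8> = k.iter().map(|b| b ^ 0x5c).collect();
    outer.extend_from_slice(&sha256(&inner));
    sha256(&outer)
}

/// Assumed receipt layout: "zc-receipt-" followed by the hex-encoded MAC.
fn make_receipt(key: &[u8], tool: &str, args: &str, output: &str) -> String {
    let mut msg = Vec::new();
    // Length-prefix each field so ("ab", "c") and ("a", "bc") sign differently.
    for field in [tool, args, output] {
        msg.extend_from_slice(&(field.len() as u64).to_be_bytes());
        msg.extend_from_slice(field.as_bytes());
    }
    let hex: String = hmac_sha256(key, &msg).iter().map(|b| format!("{:02x}", b)).collect();
    format!("zc-receipt-{}", hex)
}

/// Recompute the receipt and compare without exiting early on the first mismatch.
fn verify_receipt(key: &[u8], tool: &str, args: &str, output: &str, receipt: &str) -> bool {
    let expected = make_receipt(key, tool, args, output);
    let (a, b) = (expected.as_bytes(), receipt.as_bytes());
    a.len() == b.len() && a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}
```

Because the MAC covers the output as well as the tool name and arguments, a receipt copied onto a different result fails verification, while the receipt string itself reveals nothing about the output beyond a keyed hash.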

Note: This is a resubmission. The previous PR #4831 was closed by @SimianAstronaut7 with the comment "Already merged upstream" — however the code is not present in master as of 50425247. No src/agent/tool_receipts.rs, no ToolReceiptsConfig, and no receipt generation code exist in the tree. cc @joehoyle @theonlyhennygod — this happened to multiple PRs from multiple contributors; see #4657 for details.

Label Snapshot

  • Risk: low
  • Size: S
  • Scope: agent, security
  • Module: agent, config

Change Metadata

  • Type: feature
  • Primary scope: src/agent/tool_receipts.rs, src/agent/tool_execution.rs

Linked Issue

Validation Evidence

cargo fmt --all -- --check   # pass
cargo clippy --all-targets -- -D warnings   # pass
cargo test   # pass

Security Impact

  • Receipts use HMAC-SHA256 with a per-session random key
  • Key is never exposed to the LLM or logged
  • Receipts prove tool execution but do not leak tool output
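One way to back the "never exposed or logged" property is to wrap the key in a type whose `Debug` output is redacted, so an accidental `{:?}` in a log line or prompt template cannot print key bytes. The sketch below is illustrative only; `SessionKey` and its methods are hypothetical names and do not come from the PR.

```rust
use std::fmt;

/// Hypothetical wrapper around the per-session HMAC key.
struct SessionKey([u8; 32]);

impl SessionKey {
    /// The caller supplies bytes from a cryptographic RNG; std has no RNG,
    /// so this sketch does not generate the key itself.
    fn from_bytes(bytes: [u8; 32]) -> Self {
        SessionKey(bytes)
    }

    /// Raw access, intended for the signing code path only.
    fn expose(&self) -> &[u8; 32] {
        &self.0
    }
}

// Manual Debug impl: formatting the key never reveals its contents.
impl fmt::Debug for SessionKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("SessionKey(<redacted>)")
    }
}
```

Leaving `Display`, `Clone`, and serialization traits unimplemented narrows the ways the key can leave the signing module.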

Privacy and Data Hygiene

  • No personal data in test fixtures
  • Neutral wording

Compatibility/Migration

  • Backward compatible — receipts are disabled by default
  • Enable with agent.tool_receipts.enabled = true in config
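A minimal sketch of how the opt-in defaults could look in Rust. Only the option names `enabled` and `show_in_response` and the type name `ToolReceiptsConfig` appear in this PR; the struct layout, the derives, and the `should_append` helper (including its assumption that showing receipts requires generation to be on) are illustrative guesses.

```rust
/// Hypothetical shape of the `tool_receipts` config section.
/// `bool::default()` is `false`, so the derived `Default` keeps both
/// options off and existing configs behave as before.
#[derive(Debug, Clone, Default, PartialEq)]
struct ToolReceiptsConfig {
    /// Generate a receipt for every tool execution.
    enabled: bool,
    /// Append collected receipts to the user-visible response.
    show_in_response: bool,
}

impl ToolReceiptsConfig {
    /// Assumption: receipts can only be shown when they are generated.
    fn should_append(&self) -> bool {
        self.enabled && self.show_in_response
    }
}
```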

Human Verification

  • Tool execution produces valid receipt
  • Receipt verification succeeds for genuine receipts
  • Fabricated receipts fail verification
  • show_in_response = true appends receipts to channel message
  • Live-tested on Matrix and Discord

Side Effects/Blast Radius

  • New ToolReceiptsConfig in agent config
  • Receipt collector added to channel message processing
  • Leak detector updated to not redact zc-receipt- tokens

Rollback Plan

Revert the commit — removes receipt module and config.

Risks and Mitigations

None — opt-in feature, disabled by default.

…n detection

When enabled via agent.tool_receipts.enabled, every tool execution
produces a cryptographic HMAC-SHA256 receipt appended to the tool
result. The LLM cannot forge valid receipts because the ephemeral
session key is never exposed. Receipts create an independent ground
truth about which tools actually executed.

Adds ReceiptGenerator with per-session ephemeral keys, config flag
(disabled by default), system prompt instruction when enabled, and
12 unit tests including adversarial verification (tampered results,
wrong keys, fabricated receipts).

Based on: Basu (2026), "Tool Receipts, Not Zero-Knowledge Proofs:
Practical Hallucination Detection for AI Agents," arXiv:2603.10060.

When agent.tool_receipts.show_in_response is true, append collected
tool receipts to the user-visible response message. Uses the
universal delivered_response finalization point — works across all
channels and streaming modes without per-channel implementation.

Receipts are collected via a shared Mutex<Vec> passed to the tool
loop, then appended after sanitization but before send. Default
false — receipts remain internal/audit only unless opted in.
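The collect-then-append flow described in this commit message can be sketched as follows. The alias and function names are hypothetical, and the tool loop is reduced to a stand-in that pushes ready-made receipt strings; the point is only the ordering: receipts accumulate in a shared `Mutex<Vec<_>>` during the loop and are appended to the already sanitized text just before send.

```rust
use std::sync::{Arc, Mutex};

/// Shared receipt collector; the alias name is hypothetical.
type ReceiptCollector = Arc<Mutex<Vec<String>>>;

/// Stand-in for the tool loop: each executed tool pushes its receipt.
fn run_tool_loop(collector: &ReceiptCollector, receipts: &[&str]) {
    for r in receipts {
        collector
            .lock()
            .expect("collector mutex poisoned")
            .push(r.to_string());
    }
}

/// Append receipts to an already sanitized response, just before send.
/// With the flag off, the response is returned unchanged.
fn finalize_response(sanitized: &str, collector: &ReceiptCollector, show_in_response: bool) -> String {
    let receipts = collector.lock().expect("collector mutex poisoned");
    if !show_in_response || receipts.is_empty() {
        return sanitized.to_string();
    }
    format!("{}\n\n{}", sanitized, receipts.join("\n"))
}
```

Appending after sanitization means the receipts added here are not passed through the redaction step, which is one reason a single finalization point is convenient.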
…action

When agent.tool_receipts.show_in_response is true, append collected
tool receipts to the user-visible response message. Uses the
universal delivered_response finalization point — works across all
channels and streaming modes without per-channel implementation.

Exempt zc-receipt- tokens from the leak detector's high-entropy
redaction so receipts are visible to the user when the LLM echoes
them. Without this, the sanitizer replaces receipts with
[REDACTED_HIGH_ENTROPY_TOKEN], making inline and appended receipts
appear different for the same tool call.
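A rough sketch of what a prefix exemption in an entropy-based redactor might look like. It is not the code in `leak_detector.rs`: the whitespace tokenization, the 20-character minimum, and the 3.5 bits-per-byte threshold are assumptions chosen for illustration. Only the `zc-receipt-` prefix and the `[REDACTED_HIGH_ENTROPY_TOKEN]` marker come from this PR.

```rust
/// Shannon entropy of a token, in bits per byte.
fn shannon_entropy(token: &str) -> f64 {
    let mut counts = [0usize; 256];
    for b in token.bytes() {
        counts[b as usize] += 1;
    }
    let len = token.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / len;
            -p * p.log2()
        })
        .sum()
}

/// Redact long high-entropy tokens, except tool receipts.
/// The length and entropy thresholds are illustrative assumptions.
fn redact_high_entropy(text: &str) -> String {
    text.split(' ')
        .map(|tok| {
            let exempt = tok.starts_with("zc-receipt-");
            if !exempt && tok.len() >= 20 && shannon_entropy(tok) > 3.5 {
                "[REDACTED_HIGH_ENTROPY_TOKEN]"
            } else {
                tok
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

A prefix-only check lets any string that starts with `zc-receipt-` through, so a stricter variant might also validate the suffix length or verify the receipt before exempting it.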

Refs zeroclaw-labs#4830

Add docs/security/tool-receipts.md covering receipt format, config
options, security properties, what receipts detect vs don't prevent,
how to view receipts in logs and responses, and Phase 1 limitations.

Add tool_receipts section to config-reference.md with enabled and
show_in_response options. Link from security README.

Apply cargo fmt to leak_detector.rs.

Rust 2024 reserves `gen` as a keyword. Rename all test variables
and closure parameters to `receipt_gen` in tool_receipts.rs and
tool_execution.rs.

github-actions bot added the docs, agent, channel, config, security, and tool labels on Mar 28, 2026

Merge origin/master into feat/tool-receipts, resolving conflicts in
three files where the PR's HMAC receipt parameters and master's
native_tool_calls_only / thinking_overrides additions were independently
added to the same function signatures and call sites. Both sets of
changes are kept.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@singlerider
Collaborator Author

This is a Draft. DO NOT MERGE.

@singlerider singlerider marked this pull request as draft March 28, 2026 23:34
singlerider added a commit to singlerider/zeroclaw that referenced this pull request Mar 28, 2026
Every tool execution produces a cryptographic HMAC-SHA256 receipt proving
the tool actually ran. The LLM cannot forge valid receipts because it
never sees the ephemeral session key.

New module src/agent/tool_receipts.rs with ReceiptGenerator. Wired
through tool_execution.rs, loop_.rs, and channels/mod.rs. Opt-in via
config: agent.tool_receipts.enabled and agent.tool_receipts.show_in_response.

Leak detector updated to exempt zc-receipt- tokens from entropy redaction.

Closes zeroclaw-labs#4830
Supersedes zeroclaw-labs#4831, zeroclaw-labs#4921
@singlerider
Collaborator Author

Closing in favor of new clean reimplementation on feat/tool-receipts-v2 branch.

singlerider added a commit to singlerider/zeroclaw that referenced this pull request Mar 29, 2026
Every tool execution produces a cryptographic HMAC-SHA256 receipt proving
the tool actually ran. The LLM cannot forge valid receipts because it
never sees the ephemeral session key.

New module src/agent/tool_receipts.rs with ReceiptGenerator. Wired
through tool_execution.rs, loop_.rs, and channels/mod.rs. Opt-in via
config: agent.tool_receipts.enabled and agent.tool_receipts.show_in_response.

Leak detector updated to exempt zc-receipt- tokens from entropy redaction.

Closes zeroclaw-labs#4830
Supersedes zeroclaw-labs#4831, zeroclaw-labs#4921
Development

Successfully merging this pull request may close these issues.

[Feature]: HMAC tool execution receipts for hallucination detection
