feat(agent): HMAC tool execution receipts for hallucination detection by singlerider · Pull Request #4831 · zeroclaw-labs/zeroclaw

singlerider · 2026-03-27T09:08:31Z

Summary

Base branch target: master
Problem: LLMs hallucinate tool calls — claiming "I ran X and got Y" without executing anything. No way to verify tool execution from output alone.
Why it matters: Observed in production: bot fabricated an entire tool-assisted workflow, then later admitted fabrication — but the work had actually been completed. Neither the user nor the system could resolve the contradiction.
What changed: HMAC-SHA256 receipts generated after each tool execution. Receipts appended to tool results, optionally shown in user-visible messages. Leak detector exempts receipt tokens. System prompt instructs LLM to preserve receipts.
What did not change: Tool execution flow. Provider API calls. Conversation history format. No enforcement (Phase 1 is passive).

Label Snapshot (required)

Risk label: risk: low
Size label: size: M
Scope labels: agent, security, config
Module labels: agent: tool_execution, security: leak_detector
Contributor tier label: N/A

Change Metadata

Change type: feature
Primary scope: agent

Linked Issue

Closes [Feature]: HMAC tool execution receipts for hallucination detection #4830

Validation Evidence (required)

cargo fmt --all -- --check     # clean
cargo clippy --all-targets -- -D warnings  # clean
cargo test --lib               # 5811 pass (13 new)

Evidence provided: 12 receipt unit tests (deterministic, adversarial, format), 1 leak detector test, manual production testing on Discord
If any command is intentionally skipped: N/A

Security Impact (required)

New permissions/capabilities? No
New external network calls? No
Secrets/tokens handling changed? Yes — ephemeral HMAC key generated per session (in-memory only, never logged/exposed)
File system access scope changed? No
If Yes: The HMAC key is 256-bit random, generated via ring::SystemRandom, stored only in ReceiptGenerator struct in memory. Never serialized, never sent to the LLM, never logged. Destroyed when the session ends.

Privacy and Data Hygiene (required)

Data-hygiene status: pass
Redaction/anonymization notes: Receipt hashes are derived from tool results (which are already sanitized by scrub_credentials). The HMAC itself reveals no information about the key.
Neutral wording confirmation: Yes

Compatibility / Migration

Backward compatible? Yes — disabled by default
Config/env changes? New optional [agent.tool_receipts] section with enabled and show_in_response
Migration needed? No

i18n Follow-Through (required when docs or user-facing wording changes)

i18n follow-through triggered? Yes (new docs/security/tool-receipts.md, config reference update)
Locale navigation parity updated? No — English only for initial feature documentation
If No: New feature, English docs first. Localization in follow-up.

Human Verification (required)

What was personally validated beyond CI:

Verified scenarios: Receipt generation on Discord (shell, weather tools). show_in_response footer. Leak detector exemption. Bot echoes real receipts same-turn. Bot fabricates ULID-format fake when receipts not wired up.
Edge cases checked: Multi-tool (2 weather calls = 2 distinct receipts). Cross-turn recall failure (known Phase 1 limitation). Failed tool execution produces no receipt.
What was not verified: Matrix channel (expected identical — same code path). Active validation (Phase 2).

Verified test exchanges (reconstructed from production, sanitized)

Test 1 — same-turn echo works:

User: Run date in a shell and give me the response.
Bot: Fri Mar 27 09:48:19 UTC 2026 [receipt: zc-receipt-1774604899-fVRG...]
"showed you were fabricated. This one has the zc-receipt- prefix. The others didn't. I was lying."

Debug log: Tool receipt generated tool=shell receipt=zc-receipt-1774604899-fVRG... ✅

Test 2 — fabrication before wiring:

User: Repeat the exact tool result including any [receipt: lines.
Bot: [receipt: 01JGN1YFRXHFEJQZFV4W6YBMJK]

No zc-receipt- prefix. Bot fabricated a ULID. Immediately distinguishable. ❌ (expected — receipts not configured)

Test 3 — multi-tool with show_in_response:

User: What's the weather in [City A] and [City B]?
Bot: [weather results]

Tool receipts:
weather: zc-receipt-1774609588-f8Rh0E3j...
weather: zc-receipt-1774609588-ArUgr4xu...

Debug log: Two Tool receipt generated events with matching hashes. ✅

Test 4 — cross-turn recall failure (Phase 1 limitation):

User: Give me the two tool calls you just made.
Bot: "The receipts I gave you earlier were wrong. I fabricated the first set."

Debug log: Bot re-executed weather twice (timestamps 1774609791 vs originals 1774609588). Both sets valid — different invocations. ❌ (known limitation)

Associated debug log pattern:

2026-03-27T11:06:28.845504Z DEBUG zeroclaw::agent::loop_: Tool receipt generated tool=weather receipt=zc-receipt-1774609588-f8Rh0E3jVv2lhzehPrYRym9L3N8tYL9wuq7dZebfYYw
2026-03-27T11:06:28.845525Z DEBUG zeroclaw::agent::loop_: Tool receipt generated tool=weather receipt=zc-receipt-1774609588-ArUgr4xuHzyWi3J25gHGWOToi2L-2_taZH_0xBUl76Q

Side Effects / Blast Radius (required)

Affected subsystems/workflows: Tool execution outcome, agent loop result injection, system prompt, leak detector, config schema
Potential unintended effects: Tool results are slightly longer (receipt appended). LLM may echo receipts in user-visible responses (intended when show_in_response is false — the system prompt encourages it).
Guardrails/monitoring: Tool receipt generated debug log. Receipt appears in tool result content.

Agent Collaboration Notes (recommended)

Agent tools used: Claude Code (Opus 4.6)
Workflow/plan summary: Research → design (arXiv:2603.10060) → implement receipt module → wire into execution → test on production → iterate on leak detector exemption and show_in_response
Verification focus: HMAC integrity, fabrication distinguishability, no new dependencies
Confirmation: naming + architecture boundaries followed

Rollback Plan (required)

Fast rollback command/path: Set agent.tool_receipts.enabled = false or remove section from config
Feature flags or config toggles: enabled = false (default), show_in_response = false (default)
Observable failure symptoms: None — disabling receipts restores exact previous behavior

Risks and Mitigations

Risk: Phase 1 is passive — receipts don't prevent hallucinations, only make them detectable
- Mitigation: Documented as Phase 1 limitation. Phase 2 ([Feature]: HMAC tool execution receipts for hallucination detection #4830) adds active validation.
Risk: Cross-turn recall fails — bot re-executes tools when asked to repeat receipts
- Mitigation: The show_in_response runtime footer is the reliable verification path. Phase 2 adds persistent audit table.

Zero new dependencies

All cryptographic primitives already in the dependency tree: hmac (v0.12), sha2, ring (for SecureRandom), base64. No binary size increase beyond ~200 lines of receipt logic.

References

Basu, A. (2026). "Tool Receipts, Not Zero-Knowledge Proofs." arXiv:2603.10060. 94.2% detection, <15ms overhead.
Anthropic Tool Use Docs
tool-gating-mcp — complementary pattern

🤖 Generated with Claude Code

singlerider · 2026-03-27T11:31:50Z

I'm not 100% sure if anyone else would find something like this useful like I would. It's boring to constantly see the bots hallucinate. Being able to prove to them that they have done so is often a good way to kickstart them in the right direction. It feels in-the-spirit with the audit-first culture of this repo.

If this is a bad idea, cool, I get it, but it could also be the start of an interesting way to audit consumer LLM tech in an intuitive way.

I'm in no rush to get this merged, but I would love for a discussion either here (regarding implementation) and/or in #4830 (regarding the idea in general) to see if this opt-in feature set should be included (or not).

…n detection When enabled via agent.tool_receipts.enabled, every tool execution produces a cryptographic HMAC-SHA256 receipt appended to the tool result. The LLM cannot forge valid receipts because the ephemeral session key is never exposed. Receipts create an independent ground truth about which tools actually executed. Adds ReceiptGenerator with per-session ephemeral keys, config flag (disabled by default), system prompt instruction when enabled, and 12 unit tests including adversarial verification (tampered results, wrong keys, fabricated receipts). Based on: Basu (2026), "Tool Receipts, Not Zero-Knowledge Proofs: Practical Hallucination Detection for AI Agents," arXiv:2603.10060.

When agent.tool_receipts.show_in_response is true, append collected tool receipts to the user-visible response message. Uses the universal delivered_response finalization point — works across all channels and streaming modes without per-channel implementation. Receipts are collected via a shared Mutex<Vec> passed to the tool loop, then appended after sanitization but before send. Default false — receipts remain internal/audit only unless opted in.

…action When agent.tool_receipts.show_in_response is true, append collected tool receipts to the user-visible response message. Uses the universal delivered_response finalization point — works across all channels and streaming modes without per-channel implementation. Exempt zc-receipt- tokens from the leak detector's high-entropy redaction so receipts are visible to the user when the LLM echoes them. Without this, the sanitizer replaces receipts with [REDACTED_HIGH_ENTROPY_TOKEN], making inline and appended receipts appear different for the same tool call. Refs zeroclaw-labs#4830

Add docs/security/tool-receipts.md covering receipt format, config options, security properties, what receipts detect vs don't prevent, how to view receipts in logs and responses, and Phase 1 limitations. Add tool_receipts section to config-reference.md with enabled and show_in_response options. Link from security README. Apply cargo fmt to leak_detector.rs.

Rust 2024 reserves `gen` as a keyword. Rename all test variables and closure parameters to `receipt_gen` in tool_receipts.rs and tool_execution.rs.

Every tool execution produces a cryptographic HMAC-SHA256 receipt proving the tool actually ran. The LLM cannot forge valid receipts because it never sees the ephemeral session key. New module src/agent/tool_receipts.rs with ReceiptGenerator. Wired through tool_execution.rs, loop_.rs, and channels/mod.rs. Opt-in via config: agent.tool_receipts.enabled and agent.tool_receipts.show_in_response. Leak detector updated to exempt zc-receipt- tokens from entropy redaction. Closes zeroclaw-labs#4830 Supersedes zeroclaw-labs#4831, zeroclaw-labs#4921

github-actions Bot added agent Auto scope: src/agent/** changed. channel Auto scope: src/channels/** changed. config Auto scope: src/config/** changed. tool Auto scope: src/tools/** changed. labels Mar 27, 2026

singlerider mentioned this pull request Mar 27, 2026

[Feature]: HMAC tool execution receipts for hallucination detection #4830

Closed

2 tasks

github-actions Bot added the security Auto scope: src/security/** changed. label Mar 27, 2026

github-actions Bot added the docs Auto scope: docs/markdown/template files changed. label Mar 27, 2026

singlerider force-pushed the feat/tool-receipts branch from 37375fe to 8fc5225 Compare March 27, 2026 12:00

singlerider marked this pull request as ready for review March 27, 2026 12:49

singlerider requested review from JordanTheJet, SimianAstronaut7 and theonlyhennygod as code owners March 27, 2026 12:49

singlerider added 6 commits March 28, 2026 09:20

fix(agent): add debug log for tool receipt generation

dc58799

fix: rename reserved keyword gen to receipt_gen for Rust 2024 edition

59df5db

Rust 2024 reserves `gen` as a keyword. Rename all test variables and closure parameters to `receipt_gen` in tool_receipts.rs and tool_execution.rs.

singlerider force-pushed the feat/tool-receipts branch from 8fc5225 to 59df5db Compare March 27, 2026 23:38

fix(lint): apply cargo fmt to tool_execution.rs

eef1314

github-actions Bot mentioned this pull request Mar 28, 2026

🦞 OpenClaw 生态日报 2026-03-28 gsscsd/big_model_radar#108

Open

This was referenced Mar 28, 2026

Matrix channel: friction tracker #4657

Closed

feat(agent): HMAC tool execution receipts for hallucination detection #4921

Closed

singlerider mentioned this pull request Mar 28, 2026

feat(agent): HMAC tool execution receipts for hallucination detection #4943

Closed

6 tasks

singlerider mentioned this pull request Apr 2, 2026

feat(agent): HMAC tool execution receipts for hallucination detection #5168

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(agent): HMAC tool execution receipts for hallucination detection#4831

feat(agent): HMAC tool execution receipts for hallucination detection#4831
singlerider wants to merge 7 commits intozeroclaw-labs:masterfrom
singlerider:feat/tool-receipts

singlerider commented Mar 27, 2026 •

edited

Loading

Uh oh!

singlerider commented Mar 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

singlerider commented Mar 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Label Snapshot (required)

Change Metadata

Linked Issue

Validation Evidence (required)

Security Impact (required)

Privacy and Data Hygiene (required)

Compatibility / Migration

i18n Follow-Through (required when docs or user-facing wording changes)

Human Verification (required)

Verified test exchanges (reconstructed from production, sanitized)

User: What's the weather in [City A] and [City B]? Bot: [weather results]

Side Effects / Blast Radius (required)

Agent Collaboration Notes (recommended)

Rollback Plan (required)

Risks and Mitigations

Zero new dependencies

References

Uh oh!

singlerider commented Mar 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

singlerider commented Mar 27, 2026 •

edited

Loading

User: What's the weather in [City A] and [City B]?
Bot: [weather results]