
Commit 687fc2c

feat(agent): HMAC tool execution receipts for hallucination detection
Every tool execution produces a cryptographic HMAC-SHA256 receipt proving the tool actually ran. The LLM cannot forge valid receipts because it never sees the ephemeral session key.

New module src/agent/tool_receipts.rs with ReceiptGenerator. Wired through tool_execution.rs, loop_.rs, and channels/mod.rs.

Opt-in via config: agent.tool_receipts.enabled and agent.tool_receipts.show_in_response.

Leak detector updated to exempt zc-receipt- tokens from entropy redaction.

Closes #4830
Supersedes #4831, #4921
1 parent be74724 commit 687fc2c

32 files changed

Lines changed: 685 additions & 163 deletions

docs/security/tool-receipts.md

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
# Tool Execution Receipts

HMAC-SHA256 tool execution receipts for hallucination detection.

## Overview

When enabled, every tool execution produces a cryptographic receipt that proves
the tool actually ran. The LLM cannot forge valid receipts because it never
sees the ephemeral session key.

Based on: Basu, A. (2026). "Tool Receipts, Not Zero-Knowledge Proofs:
Practical Hallucination Detection for AI Agents." arXiv:2603.10060

## How it works

1. At session start, an ephemeral 256-bit key is generated via the system CSPRNG.
2. When a tool executes successfully, ZeroClaw computes an HMAC-SHA256 over:
   - Tool name
   - Serialized arguments (JSON)
   - Tool output (after credential scrubbing)
   - Current Unix timestamp
3. The receipt is formatted as `zc-receipt-{timestamp}-{base64url_hash}` and
   appended to the tool result seen by the LLM.
4. The LLM is instructed to include receipts verbatim when referencing tool
   results; a missing or invalid receipt indicates that the referenced tool call
   may have been fabricated.
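
The four steps above can be sketched with the Rust standard library alone. This is a hypothetical illustration, not ZeroClaw's `ReceiptGenerator`: the function names are invented, and the byte framing (each field length-prefixed, the timestamp as a big-endian `u64`) and the unpadded base64url encoding are assumptions. The hand-rolled SHA-256 only keeps the example self-contained; real code should rely on a vetted cryptography crate. The 32-byte key is passed in as a parameter, standing in for the session key that step 1 draws from the CSPRNG.

```rust
// Sketch: HMAC-SHA256 tool receipts with only the Rust standard library. Assumed, not taken from
// ZeroClaw: length-prefixed fields, a big-endian u64 timestamp, unpadded base64url for the MAC.
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

fn sha256(data: &[u8]) -> [u8; 32] {
    let mut h: [u32; 8] = [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19];
    // Pad: append 0x80, zero-fill to 56 mod 64, then append the message length in bits.
    let mut msg = data.to_vec();
    msg.push(0x80);
    msg.resize((data.len() + 8) / 64 * 64 + 56, 0);
    msg.extend_from_slice(&(data.len() as u64 * 8).to_be_bytes());
    for block in msg.chunks(64) {
        // Message schedule: 16 words from the block, 48 derived words.
        let mut w = [0u32; 64];
        for i in 0..64 {
            w[i] = if i < 16 {
                u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]])
            } else {
                let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
                let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
                w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1)
            };
        }
        // 64 rounds over the working variables v = [a, b, c, d, e, f, g, h].
        let mut v = h;
        for i in 0..64 {
            let (a, e) = (v[0], v[4]);
            let s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
            let ch = (e & v[5]) ^ (!e & v[6]);
            let t1 = v[7].wrapping_add(s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
            let s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
            let maj = (a & v[1]) ^ (a & v[2]) ^ (v[1] & v[2]);
            v.rotate_right(1); // v is now [h, a, b, c, d, e, f, g]
            v[0] = t1.wrapping_add(s0.wrapping_add(maj));
            v[4] = v[4].wrapping_add(t1);
        }
        for (x, y) in h.iter_mut().zip(v) {
            *x = x.wrapping_add(y);
        }
    }
    let mut out = [0u8; 32];
    for (bytes, word) in out.chunks_mut(4).zip(h) {
        bytes.copy_from_slice(&word.to_be_bytes());
    }
    out
}

fn hmac_sha256(key: &[u8], msg: &[u8]) -> [u8; 32] {
    // RFC 2104: H((k ^ opad) || H((k ^ ipad) || msg)), k zero-padded to the 64-byte block.
    assert!(key.len() <= 64, "sketch omits RFC 2104's hashing of long keys");
    let mut k = [0u8; 64];
    k[..key.len()].copy_from_slice(key);
    let mut inner: Vec<u8> = k.iter().map(|b| b ^ 0x36).collect();
    inner.extend_from_slice(msg);
    let mut outer: Vec<u8> = k.iter().map(|b| b ^ 0x5c).collect();
    outer.extend_from_slice(&sha256(&inner));
    sha256(&outer)
}

fn base64url(bytes: &[u8]) -> String {
    const ALPHABET: &[u8] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
    let mut out = String::new();
    for chunk in bytes.chunks(3) {
        let n = chunk.iter().enumerate().fold(0u32, |acc, (i, &b)| acc | (u32::from(b) << (16 - 8 * i)));
        for j in 0..=chunk.len() { // 1 byte -> 2 chars, 2 -> 3, 3 -> 4; no '=' padding
            out.push(ALPHABET[((n >> (18 - 6 * j)) & 0x3f) as usize] as char);
        }
    }
    out
}

/// Hypothetical helper; ZeroClaw's real `ReceiptGenerator` API and field framing may differ.
fn receipt(key: &[u8], tool: &str, args_json: &str, output: &str, ts: u64) -> String {
    let mut msg: Vec<u8> = Vec::new();
    for field in [tool, args_json, output] {
        msg.extend_from_slice(&(field.len() as u64).to_be_bytes()); // length prefix
        msg.extend_from_slice(field.as_bytes());
    }
    msg.extend_from_slice(&ts.to_be_bytes());
    format!("zc-receipt-{}-{}", ts, base64url(&hmac_sha256(key, &msg)))
}

fn verify(key: &[u8], r: &str, tool: &str, args_json: &str, output: &str) -> bool {
    // Parse the timestamp, recompute the receipt, then compare without an early exit.
    let ts = r.strip_prefix("zc-receipt-").and_then(|s| s.split_once('-')).map(|(t, _)| t);
    let Some(ts) = ts.and_then(|t| t.parse::<u64>().ok()) else { return false };
    let expected = receipt(key, tool, args_json, output, ts);
    expected.len() == r.len() && expected.bytes().zip(r.bytes()).fold(0u8, |acc, (a, b)| acc | (a ^ b)) == 0
}
```

Length-prefixing each field keeps inputs such as (`ab`, `c`) and (`a`, `bc`) from producing the same MAC input. `verify` reads the timestamp back out of the receipt, recomputes the MAC, and compares every byte without an early exit, so response timing does not reveal how much of a forged receipt was correct.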

## Configuration

Add to your `zeroclaw.toml`:

```toml
[agent.tool_receipts]
enabled = true # Generate HMAC receipts for tool executions
show_in_response = false # Append receipts to user-visible responses
```

### Fields

| Field | Type | Default | Description |
|--------------------|------|---------|--------------------------------------------------|
| `enabled` | bool | false | Enable HMAC receipt generation |
| `show_in_response` | bool | false | Append receipts to the delivered channel message |
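
One way to read the two flags is as independent gates: `enabled` controls whether receipts are produced and attached to tool results for the LLM, and `show_in_response` controls whether they also reach the delivered message. The sketch is hypothetical: the struct and function names, the `[receipt: ...]` suffix, and the rule that `show_in_response` is ignored while `enabled` is false are assumptions, not ZeroClaw's code.

```rust
// Hypothetical mirror of `[agent.tool_receipts]`; names and output formats are assumptions.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct ToolReceiptsConfig {
    enabled: bool,          // default false: no receipts are generated
    show_in_response: bool, // default false: receipts never reach the channel message
}

/// Tool result as seen by the LLM: the receipt is appended only when receipts are enabled.
fn tool_result_for_llm(cfg: &ToolReceiptsConfig, output: &str, receipt: &str) -> String {
    if cfg.enabled {
        format!("{}\n[receipt: {}]", output, receipt)
    } else {
        output.to_string()
    }
}

/// Reply delivered to the channel: receipts are appended only when both flags are set
/// (assumed: `show_in_response` has no effect while `enabled` is false).
fn channel_message(cfg: &ToolReceiptsConfig, reply: &str, receipts: &[String]) -> String {
    if cfg.enabled && cfg.show_in_response && !receipts.is_empty() {
        format!("{}\n\n{}", reply, receipts.join("\n"))
    } else {
        reply.to_string()
    }
}
```

Deriving `Default` gives both fields `false`, matching the defaults in the table above.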

## Security properties

- **Unforgeability**: The LLM never sees the ephemeral key, so it cannot
  produce a valid receipt for a tool call it did not make.
- **Ephemeral keys**: A new key is generated each session. Compromising one
  session's key does not affect others.
- **Non-interference**: `zc-receipt-` tokens are exempt from the leak
  detector's entropy scanner, so receipts are never redacted as leaked
  credentials.
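
The non-interference property amounts to a prefix check inside an entropy scanner. The sketch below is hypothetical: the whitespace tokenization, the 20-character minimum, and the 4.0 bits-per-character threshold are illustrative values, not the leak detector's actual settings.

```rust
// Hypothetical entropy-based leak scanner that exempts receipt tokens. The 20-char
// minimum and 4.0 bits/char threshold are illustrative, not ZeroClaw's settings.
use std::collections::HashMap;

const RECEIPT_PREFIX: &str = "zc-receipt-";

/// Shannon entropy of `token` in bits per character.
fn shannon_entropy(token: &str) -> f64 {
    let mut counts: HashMap<char, usize> = HashMap::new();
    for c in token.chars() {
        *counts.entry(c).or_insert(0) += 1;
    }
    let len = token.chars().count() as f64;
    counts
        .values()
        .map(|&n| {
            let p = n as f64 / len;
            -p * p.log2()
        })
        .sum()
}

/// Redacts long, high-entropy tokens (likely secrets) but leaves receipts intact.
/// Tokens are re-joined with single spaces, so the original whitespace is not preserved.
fn redact(text: &str) -> String {
    text.split_whitespace()
        .map(|tok| {
            let looks_secret = tok.chars().count() >= 20 && shannon_entropy(tok) > 4.0;
            if looks_secret && !tok.starts_with(RECEIPT_PREFIX) {
                "[REDACTED]"
            } else {
                tok
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

Because the exemption here keys on the prefix alone, a secret that happened to start with `zc-receipt-` would also pass; a stricter check could additionally require the `{timestamp}-{base64url_hash}` shape.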

## Verification

Receipts can be verified programmatically using the `ReceiptGenerator::verify`
method with the same ephemeral key. This is useful for audit logging and
automated hallucination detection pipelines.
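
An audit or detection pipeline could scan a final response for receipt tokens and check each one. In this hypothetical sketch, the `Audit` enum, the helper names, and the tokenization are invented, and `is_valid` is a placeholder for a call into `ReceiptGenerator::verify` with the session key; that method's real signature is not shown in this document.

```rust
// Hypothetical audit pass over a final LLM response. `is_valid` stands in for a call
// to `ReceiptGenerator::verify`, whose exact signature is not documented here.
const PREFIX: &str = "zc-receipt-";

#[derive(Debug, PartialEq)]
enum Audit {
    NoReceipts,           // nothing cited; suspicious only if the reply claims tool output
    Verified(usize),      // every cited receipt checked out
    Invalid(Vec<String>), // receipts that failed verification: likely fabricated calls
}

/// Collects `zc-receipt-...` tokens, trimming punctuation such as "(" or ")." around them.
fn extract_receipts(text: &str) -> Vec<&str> {
    text.split_whitespace()
        .map(|t| t.trim_matches(|c: char| !(c.is_ascii_alphanumeric() || c == '-' || c == '_')))
        .filter(|t| t.starts_with(PREFIX))
        .collect()
}

fn audit(response: &str, is_valid: impl Fn(&str) -> bool) -> Audit {
    let found = extract_receipts(response);
    if found.is_empty() {
        return Audit::NoReceipts;
    }
    let invalid: Vec<String> = found.iter().copied().filter(|&r| !is_valid(r)).map(String::from).collect();
    if invalid.is_empty() {
        Audit::Verified(found.len())
    } else {
        Audit::Invalid(invalid)
    }
}
```

Keeping `NoReceipts` separate from `Invalid` reflects that an absent receipt is only suspicious when the reply actually claims a tool result, which the receipt check alone cannot decide. The trim keeps `-` and `_` because both can end a base64url hash.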

## Limitations

- Receipts prove execution happened, not that the output is semantically correct.
- The ephemeral key exists only in memory; it is lost on process restart.
- Receipt verification requires access to the session's ephemeral key.

src/agent/agent.rs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1336,8 +1336,8 @@ pub async fn run(
 mod tests {
     use super::*;
     use async_trait::async_trait;
-    use futures_util::stream;
     use futures_util::StreamExt;
+    use futures_util::stream;
     use parking_lot::Mutex;
     use std::collections::HashMap;
 
