Commit 9305d13

docs(security): add tool receipts documentation
Add docs/security/tool-receipts.md covering receipt format, config options, security properties, what receipts detect vs don't prevent, how to view receipts in logs and responses, and Phase 1 limitations. Add tool_receipts section to config-reference.md with enabled and show_in_response options. Link from security README. Apply cargo fmt to leak_detector.rs.
1 parent 674cb13 commit 9305d13

4 files changed

Lines changed: 134 additions & 2 deletions

docs/reference/api/config-reference.md

Lines changed: 15 additions & 0 deletions
@@ -122,6 +122,21 @@ tools = ["mcp_browser_*"]
 keywords = ["browse", "navigate", "open url", "screenshot"]
 ```
 
+### `tool_receipts`
+
+HMAC-SHA256 tool execution receipts for hallucination detection. When enabled, every successful tool execution produces a cryptographic receipt that proves the tool actually ran. See [tool-receipts.md](../../security/tool-receipts.md) for full documentation.
+
+| Key | Default | Purpose |
+|---|---|---|
+| `enabled` | `false` | Generate HMAC receipts for tool executions |
+| `show_in_response` | `false` | Append receipts to user-visible channel messages |
+
+```toml
+[agent.tool_receipts]
+enabled = true
+show_in_response = false
+```
+
 ## `[pacing]`
 
 Pacing controls for slow/local LLM workloads (Ollama, llama.cpp, vLLM). All keys are optional; when absent, existing behavior is preserved.

docs/security/README.md

Lines changed: 1 addition & 0 deletions
@@ -19,4 +19,5 @@ The following docs are explicitly proposal-oriented and may include hypothetical
 - [sandboxing.md](sandboxing.md)
 - [../ops/resource-limits.md](../ops/resource-limits.md)
 - [audit-logging.md](audit-logging.md)
+- [tool-receipts.md](tool-receipts.md)
 - [security-roadmap.md](security-roadmap.md)

docs/security/tool-receipts.md

Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
+# Tool Execution Receipts
+
+## Overview
+
+Tool receipts are HMAC-SHA256 authentication tags that prove a tool actually executed. When enabled, every successful tool execution produces a receipt that the LLM cannot forge, because the signing key is ephemeral, per-session, and never exposed to the model.
+
+This addresses a class of LLM failure where the model claims to have used a tool (or denies having used one) without any independent verification. Receipts create ground truth about what actually ran.
+
+Based on: Basu, A. (2026). "Tool Receipts, Not Zero-Knowledge Proofs: Practical Hallucination Detection for AI Agents." [arXiv:2603.10060](https://doi.org/10.48550/arXiv.2603.10060).
+
+---
+
+## Configuration
+
+```toml
+[agent.tool_receipts]
+enabled = true # Generate HMAC receipts for tool executions (default: false)
+show_in_response = true # Append receipts to user-visible messages (default: false)
+```
+
+Both options default to `false`, so there is no behavioral change for existing users.
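
As a rough illustration of the two options, the table could map onto a small struct whose derived defaults are both `false`. The names `ToolReceiptsConfig` and `should_append_to_response` are hypothetical and not taken from the codebase. The sketch also assumes that `show_in_response` has no effect unless `enabled` is set, which the docs do not state.

```rust
/// Hypothetical in-memory mirror of the `[agent.tool_receipts]` table.
/// `Default` yields `false` for both fields, matching the documented defaults.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct ToolReceiptsConfig {
    /// Generate HMAC receipts for tool executions.
    pub enabled: bool,
    /// Append receipts to user-visible messages.
    pub show_in_response: bool,
}

impl ToolReceiptsConfig {
    /// Assumption: receipts can only be shown if they are generated at all.
    pub fn should_append_to_response(&self) -> bool {
        self.enabled && self.show_in_response
    }
}
```

With this shape, `ToolReceiptsConfig::default()` leaves an existing deployment untouched, and a caller only has to consult one predicate before appending receipts to a message.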
+
+---
+
+## How it works
+
+1. When the agent loop starts, an ephemeral 256-bit key is generated (never logged, never sent to the LLM).
+2. After each successful tool execution, the runtime computes:
+```
+receipt = HMAC-SHA256(key, tool_name | args | result | timestamp)
+```
+3. The receipt is appended to the tool result as `[receipt: zc-receipt-{timestamp}-{hash}]` before the result is returned to the LLM.
+4. The system prompt instructs the LLM to preserve receipts verbatim when referencing tool results.
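
The encoding and appending in steps 2 and 3 can be sketched with the standard library alone, taking the 32-byte digest as an input rather than computing it (the runtime would obtain it from an HMAC-SHA256 implementation). The function names and the newline before the tag are assumptions. Unpadded base64url is also an assumption, inferred from the 43-character digest in the example receipt.

```rust
/// URL-safe base64 alphabet (RFC 4648, section 5).
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// Encode bytes as base64url without `=` padding.
fn base64url_nopad(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() * 4 + 2) / 3);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b0 = chunk[0] as u32;
        let b1 = chunk.get(1).copied().unwrap_or(0) as u32;
        let b2 = chunk.get(2).copied().unwrap_or(0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(ALPHABET[((n >> 18) & 63) as usize] as char);
        out.push(ALPHABET[((n >> 12) & 63) as usize] as char);
        if chunk.len() > 1 {
            out.push(ALPHABET[((n >> 6) & 63) as usize] as char);
        }
        if chunk.len() > 2 {
            out.push(ALPHABET[(n & 63) as usize] as char);
        }
    }
    out
}

/// Build the receipt token from a Unix timestamp and a 32-byte digest.
fn format_receipt(timestamp: u64, digest: &[u8; 32]) -> String {
    format!("zc-receipt-{}-{}", timestamp, base64url_nopad(digest))
}

/// Append the receipt tag to a tool result (the separator is an assumption).
fn append_receipt(tool_result: &str, receipt: &str) -> String {
    format!("{}\n[receipt: {}]", tool_result, receipt)
}
```

A 32-byte digest always encodes to 43 characters here, so every token produced by `format_receipt` has a fixed-length tail after the timestamp.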
+
+### Receipt format
+
+```
+zc-receipt-1774608496-gzpEBuUIRYX1vd4fQl4oYkqhq4-GnoJDStmlYzvQiWA
+           ^timestamp ^base64url-encoded HMAC-SHA256 digest
+```
+
+The `zc-receipt-` prefix marks a token as a receipt; authenticity comes from the HMAC digest. The LLM cannot produce a valid digest because it never sees the session key.
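
A minimal parsing sketch for this format follows. It checks only the shape of a token (prefix, numeric timestamp, 43 base64url characters) and says nothing about whether the digest is authentic. `ParsedReceipt` and `parse_receipt` are hypothetical names, and the fixed length of 43 assumes an unpadded encoding of a 32-byte digest.

```rust
/// Hypothetical parsed view of a `zc-receipt-{timestamp}-{digest}` token.
#[derive(Debug, PartialEq, Eq)]
struct ParsedReceipt<'a> {
    timestamp: u64,
    digest_b64: &'a str,
}

/// Split a receipt token into its parts, or return `None` if it is malformed.
fn parse_receipt(token: &str) -> Option<ParsedReceipt<'_>> {
    let rest = token.strip_prefix("zc-receipt-")?;
    // Split at the FIRST '-' only: the base64url digest may itself contain '-'.
    let (ts, digest_b64) = rest.split_once('-')?;
    if ts.is_empty() || !ts.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let timestamp = ts.parse::<u64>().ok()?;
    let valid_chars = digest_b64
        .bytes()
        .all(|b| b.is_ascii_alphanumeric() || b == b'-' || b == b'_');
    // 32 digest bytes encode to 43 unpadded base64url characters.
    if digest_b64.len() != 43 || !valid_chars {
        return None;
    }
    Some(ParsedReceipt { timestamp, digest_b64 })
}
```

Splitting on the first hyphen matters because the example digest above contains a `-` of its own; splitting on the last one would cut the digest in half.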
+
+---
+
+## What receipts detect
+
+| Scenario | Without receipts | With receipts |
+|----------|-----------------|---------------|
+| LLM claims it ran a tool but didn't | Undetectable | No receipt exists — fabrication detected |
+| LLM fabricates a tool result | Undetectable | HMAC won't match — tampering detected |
+| LLM denies running tools it actually ran | Unverifiable | Receipts in log prove execution |
+| LLM fabricates a receipt string | Plausible-looking | HMAC verification fails — forgery detected |
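
One way to picture the first, third, and fourth rows is a per-session ledger of issued receipts that cited receipts are compared against. This is a stand-in technique, not the documented one: it does not recompute any HMAC, and nothing in this commit says the runtime keeps such a ledger. `ReceiptLedger` and `CitationStatus` are hypothetical names.

```rust
use std::collections::HashSet;

/// Outcome of checking a receipt string that appeared in an LLM response.
#[derive(Debug, PartialEq, Eq)]
enum CitationStatus {
    /// The runtime issued this exact receipt during the session.
    Issued,
    /// No such receipt was issued, so the claim is unsupported.
    Unknown,
}

/// Hypothetical per-session record of every receipt the runtime produced.
#[derive(Default)]
struct ReceiptLedger {
    issued: HashSet<String>,
}

impl ReceiptLedger {
    /// Remember a receipt right after the tool call that produced it.
    fn record(&mut self, receipt: &str) {
        self.issued.insert(receipt.to_owned());
    }

    /// Classify a receipt string cited by the model.
    fn check(&self, cited: &str) -> CitationStatus {
        if self.issued.contains(cited) {
            CitationStatus::Issued
        } else {
            CitationStatus::Unknown
        }
    }

    /// Receipts that were issued but never cited, sorted for stable output.
    fn uncited(&self, cited: &[&str]) -> Vec<&str> {
        let mut missing: Vec<&str> = self
            .issued
            .iter()
            .map(String::as_str)
            .filter(|r| !cited.contains(r))
            .collect();
        missing.sort_unstable();
        missing
    }
}
```

`check` covers a fabricated or unsupported receipt, while `uncited` surfaces tool runs the model did not acknowledge. Detecting a tampered tool result would still need the HMAC recomputed over the claimed result.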
+
+### What receipts don't prevent
+
+- The LLM can still say anything in its text output — receipts don't suppress responses.
+- The LLM can answer questions without using tools at all. Receipts only verify tool calls that were made, not tool calls that should have been made.
+
+---
+
+## Viewing receipts
+
+### In debug logs
+
+```bash
+RUST_LOG=zeroclaw::agent=debug zeroclaw daemon
+```
+
+Look for:
+```
+Tool receipt generated tool=shell receipt=zc-receipt-1774604899-fVRG...
+```
+
+### In user-visible messages
+
+When `show_in_response = true`, the bot's response includes:
+
+```
+Here's the weather in Istanbul: 16°C, sunny.
+
+---
+Tool receipts:
+weather: zc-receipt-1774608496-gzpEBuUIRYX1vd4fQl4oYkqhq4-GnoJDStmlYzvQiWA
+```
+
+### Inline in LLM responses
+
+The system prompt instructs the LLM to echo receipts when referencing tool results. These appear inline in the response. The leak detector is configured to NOT redact `zc-receipt-` tokens.
+
+---
+
+## Security properties
+
+- **Ephemeral keys**: A new key is generated for each agent session. Keys are never persisted, logged, or sent to the LLM.
+- **HMAC-SHA256**: Standard cryptographic MAC. The digest binds the tool name, arguments, result, and timestamp together — changing any input invalidates the receipt.
+- **No new dependencies**: Uses `hmac`, `sha2`, `ring`, and `base64` — all already in the dependency tree.
+- **Negligible performance impact**: Receipt generation adds <1ms per tool call.
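
The binding property depends on how the four inputs are joined before they reach the MAC, which this document does not specify. The sketch below shows one common option, length-prefixing each variable-length field so that different field splits can never yield the same byte string. This is an assumption, not the project's actual encoding, and `push_field` and `mac_input` are hypothetical helpers.

```rust
/// Append one field as an 8-byte big-endian length followed by its bytes.
fn push_field(buf: &mut Vec<u8>, field: &[u8]) {
    buf.extend_from_slice(&(field.len() as u64).to_be_bytes());
    buf.extend_from_slice(field);
}

/// Assemble an unambiguous MAC input from the four receipt fields.
/// The timestamp is fixed-width, so it needs no length prefix.
fn mac_input(tool_name: &str, args: &str, result: &str, timestamp: u64) -> Vec<u8> {
    let mut buf = Vec::new();
    push_field(&mut buf, tool_name.as_bytes());
    push_field(&mut buf, args.as_bytes());
    push_field(&mut buf, result.as_bytes());
    buf.extend_from_slice(&timestamp.to_be_bytes());
    buf
}
```

With plain concatenation, `args = "ls", result = "ok"` and `args = "l", result = "sok"` would produce identical input and therefore identical receipts; the length prefixes keep them distinct.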
+
+---
+
+## Limitations (Phase 1)
+
+- **Passive only**: Receipts are generated and logged but not validated against LLM responses. The system does not block responses with missing or invalid receipts.
+- **No persistent audit**: Receipts are in debug logs and conversation history but not stored in a queryable database.
+- **No cross-session verification**: Ephemeral keys mean receipts cannot be verified after the session ends.
+
+These are addressed in the Phase 2 roadmap (#4830).
+
+---
+
+## Related docs
+
+- [Audit Logging](audit-logging.md) — broader audit trail proposal
+- [Agnostic Security](agnostic-security.md) — security model overview
+- [Config Reference](../reference/api/config-reference.md) — full config options

src/security/leak_detector.rs

Lines changed: 2 additions & 2 deletions
@@ -323,8 +323,8 @@ impl LeakDetector {
 // intentionally appear in output. Strip them before entropy scanning so
 // they are not redacted as leaked credentials. See #4830.
 static RECEIPT_PATTERN: OnceLock<Regex> = OnceLock::new();
-let receipt_re = RECEIPT_PATTERN
-    .get_or_init(|| Regex::new(r"zc-receipt-\d+-[A-Za-z0-9_-]+").unwrap());
+let receipt_re =
+    RECEIPT_PATTERN.get_or_init(|| Regex::new(r"zc-receipt-\d+-[A-Za-z0-9_-]+").unwrap());
 let content_stripped = url_re.replace_all(content, "");
 let content_without_urls = media_re.replace_all(&content_stripped, "");
 let content_without_receipts = receipt_re.replace_all(&content_without_urls, "");
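
The stripping step in this hunk can be illustrated without the `regex` crate. The sketch below is a hand-written, ASCII-only approximation of the pattern `zc-receipt-\d+-[A-Za-z0-9_-]+`. It is not the code in `leak_detector.rs`, and `strip_receipts`, `receipt_len`, and `is_digest_byte` are hypothetical helpers.

```rust
const PREFIX: &str = "zc-receipt-";

/// Characters allowed in the digest part: the class [A-Za-z0-9_-].
fn is_digest_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_' || b == b'-'
}

/// Byte length of a receipt token at the start of `s`, if one is present.
fn receipt_len(s: &str) -> Option<usize> {
    let rest = s.strip_prefix(PREFIX)?;
    let bytes = rest.as_bytes();
    // One or more ASCII digits (the regex crate's `\d` also accepts
    // non-ASCII Unicode digits by default; this sketch does not).
    let digits = bytes.iter().take_while(|b| b.is_ascii_digit()).count();
    if digits == 0 || bytes.get(digits) != Some(&b'-') {
        return None;
    }
    // One or more digest characters after the separating '-'.
    let tail = bytes[digits + 1..]
        .iter()
        .take_while(|&&b| is_digest_byte(b))
        .count();
    if tail == 0 {
        return None;
    }
    Some(PREFIX.len() + digits + 1 + tail)
}

/// Remove every receipt token so later scanning does not see it.
fn strip_receipts(content: &str) -> String {
    let mut out = String::with_capacity(content.len());
    let mut rest = content;
    while let Some(pos) = rest.find(PREFIX) {
        out.push_str(&rest[..pos]);
        match receipt_len(&rest[pos..]) {
            // A full token: skip it entirely.
            Some(len) => rest = &rest[pos + len..],
            // Only the prefix matched: keep it and continue after it.
            None => {
                out.push_str(PREFIX);
                rest = &rest[pos + PREFIX.len()..];
            }
        }
    }
    out.push_str(rest);
    out
}
```

Because the digest class includes `-` and `_`, the match runs greedily until the first character outside the class, such as a space or comma, which is where a base64url digest would normally end in prose.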
