docs(security): add tool receipts documentation

singlerider · singlerider · commit 9305d1314d71 · 2026-03-28T09:21:00.000+10:00
Add docs/security/tool-receipts.md covering receipt format, config
options, security properties, what receipts detect vs don't prevent,
how to view receipts in logs and responses, and Phase 1 limitations.

Add tool_receipts section to config-reference.md with enabled and
show_in_response options. Link from security README.

Apply cargo fmt to leak_detector.rs.
diff --git a/docs/reference/api/config-reference.md b/docs/reference/api/config-reference.md
@@ -122,6 +122,21 @@ tools = ["mcp_browser_*"]
 keywords = ["browse", "navigate", "open url", "screenshot"]
 ```
 
+### `tool_receipts`
+
+HMAC-SHA256 tool execution receipts for hallucination detection. When enabled, every successful tool execution produces a cryptographic receipt that proves the tool actually ran. See [tool-receipts.md](../../security/tool-receipts.md) for full documentation.
+
+| Key | Default | Purpose |
+|---|---|---|
+| `enabled` | `false` | Generate HMAC receipts for tool executions |
+| `show_in_response` | `false` | Append receipts to user-visible channel messages |
+
+```toml
+[agent.tool_receipts]
+enabled = true
+show_in_response = false
+```
+
 ## `[pacing]`
 
 Pacing controls for slow/local LLM workloads (Ollama, llama.cpp, vLLM). All keys are optional; when absent, existing behavior is preserved.
diff --git a/docs/security/README.md b/docs/security/README.md
@@ -19,4 +19,5 @@ The following docs are explicitly proposal-oriented and may include hypothetical
 - [sandboxing.md](sandboxing.md)
 - [../ops/resource-limits.md](../ops/resource-limits.md)
 - [audit-logging.md](audit-logging.md)
+- [tool-receipts.md](tool-receipts.md)
 - [security-roadmap.md](security-roadmap.md)
diff --git a/docs/security/tool-receipts.md b/docs/security/tool-receipts.md
@@ -0,0 +1,116 @@
+# Tool Execution Receipts
+
+## Overview
+
+Tool receipts are HMAC-SHA256 message authentication codes (MACs) that prove a tool actually executed. When enabled, every successful tool execution produces a receipt that the LLM cannot forge, because the signing key is ephemeral, per-session, and never exposed to the model.
+
+This addresses a class of LLM failure in which the model claims to have used a tool (or denies having used one) and nothing independent can confirm or refute the claim. Receipts create ground truth about what actually ran.
+
+Based on: Basu, A. (2026). "Tool Receipts, Not Zero-Knowledge Proofs: Practical Hallucination Detection for AI Agents." [arXiv:2603.10060](https://doi.org/10.48550/arXiv.2603.10060).
+
+---
+
+## Configuration
+
+```toml
+[agent.tool_receipts]
+enabled = true           # Generate HMAC receipts for tool executions (default: false)
+show_in_response = true  # Append receipts to user-visible messages (default: false)
+```
+
+Both options default to `false`, so existing configurations behave exactly as before.
+
+---
+
+## How it works
+
+1. When the agent loop starts, an ephemeral 256-bit key is generated (never logged, never sent to the LLM).
+2. After each successful tool execution, the runtime computes:
+   ```
+   receipt = HMAC-SHA256(key, tool_name | args | result | timestamp)
+   ```
+3. The receipt is appended to the tool result as `[receipt: zc-receipt-{timestamp}-{hash}]` before the result is returned to the LLM.
+4. The system prompt instructs the LLM to preserve receipts verbatim when referencing tool results.
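
The four steps above can be sketched in dependency-free Rust. This is a minimal illustration under stated assumptions, not the runtime's code. The real implementation uses the `hmac` and `sha2` crates, while this sketch hand-rolls SHA-256 so it builds with the standard library alone. It assumes three things: the fields are joined with `|` as in the formula, the timestamp is Unix seconds and is the same value embedded in the receipt, and the digest is base64url without padding (consistent with the 43-character digest in the receipt format example). `make_receipt` and `verify_receipt` are hypothetical names.

```rust
// Sketch with hypothetical names. SHA-256 is hand-rolled so this builds with
// std alone; the real runtime uses the `hmac` and `sha2` crates instead.
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

fn sha256(data: &[u8]) -> [u8; 32] {
    let mut h: [u32; 8] = [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19];
    // Pad with 0x80, zeros up to 56 mod 64, then the bit length as a big-endian u64.
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&((data.len() as u64) * 8).to_be_bytes());
    for block in msg.chunks(64) {
        // Message schedule: 16 words from the block, 48 derived words.
        let mut w = [0u32; 64];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..64 {
            let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
            let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
        }
        // 64 compression rounds; `hh` is the eighth working variable.
        let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut hh] = h;
        for i in 0..64 {
            let s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
            let ch = (e & f) ^ (!e & g);
            let t1 = hh.wrapping_add(s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
            let s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
            let t2 = s0.wrapping_add((a & b) ^ (a & c) ^ (b & c));
            (hh, g, f, e, d, c, b, a) = (g, f, e, d.wrapping_add(t1), c, b, a, t1.wrapping_add(t2));
        }
        for (x, y) in h.iter_mut().zip([a, b, c, d, e, f, g, hh]) {
            *x = x.wrapping_add(y);
        }
    }
    let mut out = [0u8; 32];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

fn hmac_sha256(key: &[u8], msg: &[u8]) -> [u8; 32] {
    // Keys longer than the 64-byte block are hashed first; all keys are zero-padded.
    let key: Vec<u8> = if key.len() > 64 { sha256(key).to_vec() } else { key.to_vec() };
    let mut k = [0u8; 64];
    k[..key.len()].copy_from_slice(&key);
    let inner: Vec<u8> = k.iter().map(|&b| b ^ 0x36).chain(msg.iter().copied()).collect();
    let outer: Vec<u8> = k.iter().map(|&b| b ^ 0x5c).chain(sha256(&inner)).collect();
    sha256(&outer)
}

fn base64url(bytes: &[u8]) -> String {
    const ALPHABET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
    let mut out = String::new();
    for chunk in bytes.chunks(3) {
        // Pack up to 3 bytes into 24 bits, then emit len+1 six-bit symbols (no padding).
        let n = chunk.iter().fold(0u32, |acc, &b| (acc << 8) | b as u32) << (8 * (3 - chunk.len()));
        for i in 0..=chunk.len() {
            out.push(ALPHABET[((n >> (18 - 6 * i)) & 63) as usize] as char);
        }
    }
    out
}

/// receipt = HMAC-SHA256(key, tool|args|result|timestamp), per the formula above.
fn make_receipt(key: &[u8], tool: &str, args: &str, result: &str, ts: u64) -> String {
    let msg = format!("{tool}|{args}|{result}|{ts}");
    format!("zc-receipt-{ts}-{}", base64url(&hmac_sha256(key, msg.as_bytes())))
}

/// Recompute the receipt for the claimed call and compare it with the given one.
fn verify_receipt(key: &[u8], tool: &str, args: &str, result: &str, receipt: &str) -> bool {
    let ts = receipt
        .strip_prefix("zc-receipt-")
        .and_then(|rest| rest.split_once('-'))
        .and_then(|(ts, _)| ts.parse::<u64>().ok());
    let expected = match ts {
        Some(ts) => make_receipt(key, tool, args, result, ts),
        None => return false,
    };
    // Fold over every byte rather than returning at the first mismatch.
    expected.len() == receipt.len()
        && expected.bytes().zip(receipt.bytes()).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}
```

Joining fields with a bare `|` is ambiguous: `("a|b", "c")` and `("a", "b|c")` produce the same MAC input. A production encoding would want length-prefixed or otherwise unambiguous fields. The sketch keeps the plain join only to mirror the formula.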
+
+### Receipt format
+
+```
+zc-receipt-1774608496-gzpEBuUIRYX1vd4fQl4oYkqhq4-GnoJDStmlYzvQiWA
+           ^timestamp ^base64url-encoded HMAC-SHA256 digest
+```
+
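+The `zc-receipt-` prefix only marks a token as a receipt, so it can be found in text and exempted from leak redaction. It does not prove authenticity, since anyone, including the LLM, can type the prefix. Authenticity comes from the digest: the LLM cannot compute a valid HMAC because it never sees the session key.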
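
A parser can check only a receipt's shape. Establishing authenticity requires recomputing the HMAC with the session key. The sketch below uses a hypothetical `parse_receipt` helper that splits a token into its timestamp and digest and checks that the digest is 43 base64url characters (the unpadded encoding of a 32-byte HMAC-SHA256 output). A well-formed result still says nothing about whether the token is genuine.

```rust
/// Hypothetical helper: split a receipt into (timestamp, digest).
/// Checks shape only; a forged receipt with the right shape still passes.
fn parse_receipt(token: &str) -> Option<(u64, &str)> {
    let rest = token.strip_prefix("zc-receipt-")?;
    // The timestamp contains no '-', so the first '-' separates it from the digest.
    let (ts, digest) = rest.split_once('-')?;
    let ts = ts.parse::<u64>().ok()?;
    // A 32-byte digest encodes to 43 base64url characters without padding.
    let well_formed = digest.len() == 43
        && digest.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'-' || b == b'_');
    well_formed.then_some((ts, digest))
}
```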
+
+---
+
+## What receipts detect
+
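+| Scenario | Without receipts | With receipts |
+|----------|-----------------|---------------|
+| LLM claims it ran a tool but didn't | Undetectable | Detectable: no receipt was generated for the claimed call |
+| LLM fabricates a tool result | Undetectable | Detectable: the receipt does not match the claimed result |
+| LLM denies running tools it actually ran | Unverifiable | Receipts in the log prove execution |
+| LLM fabricates a receipt string | Looks plausible | Detectable: HMAC verification fails |
+
+In Phase 1 these checks are not automated (see [Limitations](#limitations-phase-1)). Detecting these cases means comparing the response against the receipts recorded in the debug logs and conversation history.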
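
In Phase 1 receipts are not validated automatically. One practical manual check is to compare the receipt tokens in a response against the receipts the runtime actually generated. The sketch below assumes those generated receipts have already been collected (for example, from the debug log) into a `HashSet`. `unlogged_receipts` is a hypothetical helper that returns every receipt-shaped token the runtime never issued, which covers the fabricated-receipt row of the table.

```rust
use std::collections::HashSet;

/// Hypothetical Phase 1 audit helper: returns receipt tokens that appear in a
/// response but are missing from the set of receipts the runtime generated.
fn unlogged_receipts<'a>(response: &'a str, generated: &HashSet<&str>) -> Vec<&'a str> {
    response
        // Receipts only contain [A-Za-z0-9_-], so any other character ends a token.
        .split(|c: char| !(c.is_ascii_alphanumeric() || c == '-' || c == '_'))
        .filter(|tok| tok.starts_with("zc-receipt-") && !generated.contains(tok))
        .collect()
}
```

This catches invented receipt strings. It cannot catch a tool claim that carries no receipt at all, because that requires understanding the prose rather than scanning tokens.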
+
+### What receipts don't prevent
+
+- The LLM can still say anything in its text output — receipts don't suppress responses.
+- The LLM can answer questions without using tools at all. Receipts only verify tool calls that were made, not tool calls that should have been made.
+
+---
+
+## Viewing receipts
+
+### In debug logs
+
+```bash
+RUST_LOG=zeroclaw::agent=debug zeroclaw daemon
+```
+
+Look for:
+```
+Tool receipt generated tool=shell receipt=zc-receipt-1774604899-fVRG...
+```
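
To collect the receipts the runtime generated, for example to audit a response afterwards, you can pull the `tool` and `receipt` fields out of each matching log line. The sketch below assumes space-separated `key=value` fields after the `Tool receipt generated` message, as in the example above. Real output may also carry a timestamp, level, and target prefix, and some formatters quote string values, so the helper strips surrounding quotes. `parse_receipt_log_line` and `log_field` are hypothetical names.

```rust
/// Hypothetical helper: value of the first whitespace-separated `key=value` field.
fn log_field<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    line.split_whitespace()
        .find_map(|part| part.strip_prefix(key)?.strip_prefix('='))
        // Some log formatters quote string values, e.g. tool="shell".
        .map(|value| value.trim_matches('"'))
}

/// Hypothetical helper: extract (tool, receipt) from a receipt debug line.
fn parse_receipt_log_line(line: &str) -> Option<(&str, &str)> {
    if !line.contains("Tool receipt generated") {
        return None;
    }
    Some((log_field(line, "tool")?, log_field(line, "receipt")?))
}
```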
+
+### In user-visible messages
+
+When `show_in_response = true`, the bot's response includes:
+
+```
+Here's the weather in Istanbul: 16°C, sunny.
+
+---
+Tool receipts:
+  weather: zc-receipt-1774608496-gzpEBuUIRYX1vd4fQl4oYkqhq4-GnoJDStmlYzvQiWA
+```
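
Here is a minimal sketch of assembling that footer. It assumes the footer is added only when at least one tool produced a receipt. `append_receipt_footer` is a hypothetical name, and the layout is copied from the example above rather than from the runtime's source.

```rust
/// Hypothetical sketch of the footer added when `show_in_response = true`.
/// Assumes no footer is added when no tool produced a receipt.
fn append_receipt_footer(reply: &str, receipts: &[(&str, &str)]) -> String {
    if receipts.is_empty() {
        return reply.to_string();
    }
    let mut out = format!("{reply}\n\n---\nTool receipts:\n");
    for (tool, receipt) in receipts {
        // One indented `tool: receipt` line per executed tool.
        out.push_str(&format!("  {tool}: {receipt}\n"));
    }
    out
}
```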
+
+### Inline in LLM responses
+
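+The system prompt instructs the LLM to echo receipts when referencing tool results, so receipts can also appear inline in its response text. The leak detector deliberately does not redact them. Before its entropy scan, it strips every token matching `zc-receipt-\d+-[A-Za-z0-9_-]+`, so the high-entropy digest is not mistaken for a leaked credential.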
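
Internally, `leak_detector.rs` removes receipts with the `regex` crate (`zc-receipt-\d+-[A-Za-z0-9_-]+` plus `replace_all`) before scanning. For illustration, here is a std-only approximation, `strip_receipts` (a hypothetical name), that removes the same tokens. There is one difference: it matches ASCII digits only, whereas `\d` in the `regex` crate is Unicode-aware by default.

```rust
/// Hypothetical std-only approximation of the leak detector's receipt regex
/// `zc-receipt-\d+-[A-Za-z0-9_-]+`: removes every matching token from `content`.
fn strip_receipts(content: &str) -> String {
    const PREFIX: &str = "zc-receipt-";
    let is_digest_char = |c: char| c.is_ascii_alphanumeric() || c == '-' || c == '_';
    let mut out = String::with_capacity(content.len());
    let mut rest = content;
    while let Some(pos) = rest.find(PREFIX) {
        out.push_str(&rest[..pos]);
        let after = &rest[pos + PREFIX.len()..];
        // Length of the leading run of ASCII digits (the timestamp).
        let digits = after.len() - after.trim_start_matches(|c: char| c.is_ascii_digit()).len();
        let tail = &after[digits..];
        if digits > 0 && tail.starts_with('-') && tail[1..].starts_with(is_digest_char) {
            // Full match: drop the timestamp, the '-', and the whole digest.
            rest = tail[1..].trim_start_matches(is_digest_char);
        } else {
            // Not a receipt: keep the prefix text and continue scanning after it.
            out.push_str(PREFIX);
            rest = after;
        }
    }
    out.push_str(rest);
    out
}
```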
+
+---
+
+## Security properties
+
+- **Ephemeral keys**: A new key is generated for each agent session. Keys are never persisted, logged, or sent to the LLM.
+- **HMAC-SHA256**: Standard cryptographic MAC. The digest binds the tool name, arguments, result, and timestamp together — changing any input invalidates the receipt.
+- **No new dependencies**: Uses `hmac`, `sha2`, `ring`, and `base64` — all already in the dependency tree.
+- **Negligible performance impact**: Receipt generation adds less than 1 ms per tool call, since a single HMAC computation is cheap.
+
+---
+
+## Limitations (Phase 1)
+
+- **Passive only**: Receipts are generated and logged but not validated against LLM responses. The system does not block responses with missing or invalid receipts.
+- **No persistent audit**: Receipts are in debug logs and conversation history but not stored in a queryable database.
+- **No cross-session verification**: Ephemeral keys mean receipts cannot be verified after the session ends.
+
+The Phase 2 roadmap (#4830) plans to address these limitations.
+
+---
+
+## Related docs
+
+- [Audit Logging](audit-logging.md) — broader audit trail proposal
+- [Agnostic Security](agnostic-security.md) — security model overview
+- [Config Reference](../reference/api/config-reference.md) — full config options
diff --git a/src/security/leak_detector.rs b/src/security/leak_detector.rs
@@ -323,8 +323,8 @@ impl LeakDetector {
         // intentionally appear in output. Strip them before entropy scanning so
         // they are not redacted as leaked credentials. See #4830.
         static RECEIPT_PATTERN: OnceLock<Regex> = OnceLock::new();
-        let receipt_re = RECEIPT_PATTERN
-            .get_or_init(|| Regex::new(r"zc-receipt-\d+-[A-Za-z0-9_-]+").unwrap());
+        let receipt_re =
+            RECEIPT_PATTERN.get_or_init(|| Regex::new(r"zc-receipt-\d+-[A-Za-z0-9_-]+").unwrap());
         let content_stripped = url_re.replace_all(content, "");
         let content_without_urls = media_re.replace_all(&content_stripped, "");
         let content_without_receipts = receipt_re.replace_all(&content_without_urls, "");