15 changes: 15 additions & 0 deletions docs/reference/api/config-reference.md
@@ -122,6 +122,21 @@ tools = ["mcp_browser_*"]
keywords = ["browse", "navigate", "open url", "screenshot"]
```

### `tool_receipts`

HMAC-SHA256 tool execution receipts for hallucination detection. When enabled, every successful tool execution produces a cryptographic receipt that proves the tool actually ran. See [tool-receipts.md](../../security/tool-receipts.md) for full documentation.

| Key | Default | Purpose |
|---|---|---|
| `enabled` | `false` | Generate HMAC receipts for tool executions |
| `show_in_response` | `false` | Append receipts to user-visible channel messages |

```toml
[agent.tool_receipts]
enabled = true
show_in_response = false
```

## `[pacing]`

Pacing controls for slow/local LLM workloads (Ollama, llama.cpp, vLLM). All keys are optional; when absent, existing behavior is preserved.
1 change: 1 addition & 0 deletions docs/security/README.md
@@ -19,4 +19,5 @@ The following docs are explicitly proposal-oriented and may include hypothetical
- [sandboxing.md](sandboxing.md)
- [../ops/resource-limits.md](../ops/resource-limits.md)
- [audit-logging.md](audit-logging.md)
- [tool-receipts.md](tool-receipts.md)
- [security-roadmap.md](security-roadmap.md)
116 changes: 116 additions & 0 deletions docs/security/tool-receipts.md
@@ -0,0 +1,116 @@
# Tool Execution Receipts

## Overview

Tool receipts are HMAC-SHA256 message authentication codes that attest that a tool actually executed. When enabled, every successful tool execution produces a receipt that the LLM cannot forge, because the signing key is ephemeral, per-session, and never exposed to the model.

This addresses a class of LLM failure where the model claims to have used a tool (or denies having used one) without any independent verification. Receipts create ground truth about what actually ran.

Based on: Basu, A. (2026). "Tool Receipts, Not Zero-Knowledge Proofs: Practical Hallucination Detection for AI Agents." [arXiv:2603.10060](https://doi.org/10.48550/arXiv.2603.10060).

---

## Configuration

```toml
[agent.tool_receipts]
enabled = true # Generate HMAC receipts for tool executions (default: false)
show_in_response = true # Append receipts to user-visible messages (default: false)
```

Both options default to `false` — no behavioral change for existing users.
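
As a rough illustration of the default behavior, the sketch below reads the two keys from an already-parsed config mapping. The names `ToolReceiptsConfig` and `load_tool_receipts` are hypothetical and are not part of the project's API; only the key names and their `false` defaults come from this page.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class ToolReceiptsConfig:
    # Both flags default to False, matching the documented defaults.
    enabled: bool = False
    show_in_response: bool = False


def load_tool_receipts(config: Mapping[str, Any]) -> ToolReceiptsConfig:
    """Read [agent.tool_receipts] from an already-parsed config mapping (hypothetical helper)."""
    section = config.get("agent", {}).get("tool_receipts", {})
    return ToolReceiptsConfig(
        enabled=bool(section.get("enabled", False)),
        show_in_response=bool(section.get("show_in_response", False)),
    )


# A config with no [agent.tool_receipts] table leaves the feature off.
legacy = load_tool_receipts({"agent": {}})

# The example above turns both options on.
opted_in = load_tool_receipts(
    {"agent": {"tool_receipts": {"enabled": True, "show_in_response": True}}}
)
```

Here `legacy` keeps both flags off, which is the "no behavioral change" case for existing configs.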

---

## How it works

1. When the agent loop starts, an ephemeral 256-bit key is generated (never logged, never sent to the LLM).
2. After each successful tool execution, the runtime computes:
   ```
   receipt = HMAC-SHA256(key, tool_name | args | result | timestamp)
   ```
3. The receipt is appended to the tool result as `[receipt: zc-receipt-{timestamp}-{hash}]` before the result is returned to the LLM.
4. The system prompt instructs the LLM to preserve receipts verbatim when referencing tool results.
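
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the runtime's actual code: the function names are hypothetical, the fields are assumed to be joined with a literal `|`, and the newline between the tool result and the receipt marker is a guess.

```python
import base64
import hashlib
import hmac
import secrets
import time

RECEIPT_PREFIX = "zc-receipt-"


def new_session_key() -> bytes:
    """Generate an ephemeral 256-bit key; it should stay in memory only."""
    return secrets.token_bytes(32)


def make_receipt(key: bytes, tool_name: str, args: str, result: str, timestamp: int) -> str:
    # ASSUMPTION: fields are joined with "|" as written in the formula above.
    message = "|".join([tool_name, args, result, str(timestamp)]).encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    # base64url without padding: a 32-byte digest becomes 43 characters.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{RECEIPT_PREFIX}{timestamp}-{encoded}"


def attach_receipt(result: str, receipt: str) -> str:
    """Append the receipt marker to the tool result before it returns to the LLM."""
    # ASSUMPTION: a newline separates the result from the marker.
    return f"{result}\n[receipt: {receipt}]"


key = new_session_key()
now = int(time.time())
receipt = make_receipt(key, "weather", '{"city": "Istanbul"}', "16°C, sunny", now)
tool_output = attach_receipt("16°C, sunny", receipt)
```

Because the key comes from `secrets.token_bytes` and is never part of `tool_output`, the model only ever sees the finished receipt string.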

### Receipt format

```
zc-receipt-1774608496-gzpEBuUIRYX1vd4fQl4oYkqhq4-GnoJDStmlYzvQiWA
           ^timestamp ^base64url-encoded HMAC-SHA256 digest
```

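The `zc-receipt-` prefix makes receipts easy to locate in tool results, logs, and responses. Authenticity comes from the digest, not the prefix: the LLM never sees the session key, so it cannot produce an HMAC that verifies, and any fabricated string that carries the prefix fails verification.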
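
A receipt in this format can be parsed and checked roughly as follows. This is a sketch under assumptions: `parse_receipt` and `verify_receipt` are hypothetical names, and the `|`-joined message layout is taken from the formula in "How it works". Note that the base64url digest may itself contain `-`, so the parser splits on the first hyphen after the timestamp only.

```python
import base64
import hashlib
import hmac
from typing import Optional, Tuple

RECEIPT_PREFIX = "zc-receipt-"


def parse_receipt(receipt: str) -> Optional[Tuple[int, bytes]]:
    """Split a receipt into (timestamp, raw digest); return None if malformed."""
    if not receipt.startswith(RECEIPT_PREFIX):
        return None
    # The digest may contain "-", so split on the first hyphen only.
    timestamp_text, sep, encoded = receipt[len(RECEIPT_PREFIX):].partition("-")
    if not sep or not (timestamp_text.isascii() and timestamp_text.isdigit()):
        return None
    try:
        # Restore the "=" padding that unpadded base64url drops.
        digest = base64.urlsafe_b64decode(encoded + "=" * (-len(encoded) % 4))
    except ValueError:
        return None
    if len(digest) != hashlib.sha256().digest_size:
        return None
    return int(timestamp_text), digest


def verify_receipt(key: bytes, receipt: str, tool_name: str, args: str, result: str) -> bool:
    """Recompute the HMAC with the session key and compare in constant time."""
    parsed = parse_receipt(receipt)
    if parsed is None:
        return False
    timestamp, digest = parsed
    # ASSUMPTION: same "|"-joined layout as the generation formula.
    message = "|".join([tool_name, args, result, str(timestamp)]).encode("utf-8")
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, digest)


example = "zc-receipt-1774608496-gzpEBuUIRYX1vd4fQl4oYkqhq4-GnoJDStmlYzvQiWA"
parsed = parse_receipt(example)
# Without the original session key, the example cannot be verified.
matches_other_key = verify_receipt(b"\x00" * 32, example, "weather", "{}", "16°C, sunny")
```

`hmac.compare_digest` is used instead of `==` so the comparison time does not depend on how many leading bytes match.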

---

## What receipts detect

| Scenario | Without receipts | With receipts |
|----------|-----------------|---------------|
| LLM claims it ran a tool but didn't | Undetectable | No receipt exists — fabrication detected |
| LLM fabricates a tool result | Undetectable | HMAC won't match — tampering detected |
| LLM denies running tools it actually ran | Unverifiable | Receipts in log prove execution |
| LLM fabricates a receipt string | Plausible-looking | HMAC verification fails — forgery detected |
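
To make the scenarios in the table concrete, here is a hypothetical audit pass over an LLM response. It assumes the runtime keeps the set of receipts it issued during the session; `audit_response` and `ReceiptAudit` are illustrative names, and Phase 1 as described on this page does not run such a check automatically.

```python
import re
from dataclasses import dataclass
from typing import List, Set

# Matches anything shaped like a receipt, including forged ones of odd length.
RECEIPT_PATTERN = re.compile(r"zc-receipt-\d+-[A-Za-z0-9_-]+")


@dataclass
class ReceiptAudit:
    confirmed: List[str]  # cited in the response and issued this session
    unknown: List[str]    # cited in the response but never issued (forged or altered)
    uncited: List[str]    # issued this session but not mentioned in the response


def audit_response(response: str, issued: Set[str]) -> ReceiptAudit:
    """Compare receipts cited in a response with receipts the runtime issued."""
    cited = RECEIPT_PATTERN.findall(response)
    return ReceiptAudit(
        confirmed=[r for r in cited if r in issued],
        unknown=[r for r in cited if r not in issued],
        uncited=sorted(issued - set(cited)),
    )


issued = {"zc-receipt-1774608496-gzpEBuUIRYX1vd4fQl4oYkqhq4-GnoJDStmlYzvQiWA"}
forged_reply = "It is sunny [receipt: zc-receipt-1774608496-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA]"
audit = audit_response(forged_reply, issued)
```

In this run the fabricated string lands in `audit.unknown`, while the receipt that was really issued shows up in `audit.uncited` because the response never quoted it.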

### What receipts don't prevent

- The LLM can still say anything in its text output — receipts don't suppress responses.
- The LLM can answer questions without using tools at all. Receipts only verify tool calls that were made, not tool calls that should have been made.

---

## Viewing receipts

### In debug logs

```bash
RUST_LOG=zeroclaw::agent=debug zeroclaw daemon
```

Look for:
```
Tool receipt generated tool=shell receipt=zc-receipt-1774604899-fVRG...
```

### In user-visible messages

When `show_in_response = true`, the bot's response includes:

```
Here's the weather in Istanbul: 16°C, sunny.

---
Tool receipts:
weather: zc-receipt-1774608496-gzpEBuUIRYX1vd4fQl4oYkqhq4-GnoJDStmlYzvQiWA
```
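
The footer layout shown above could be produced by something like the following sketch. `append_receipt_footer` is a hypothetical helper, not the project's function; it only mirrors the layout of the example message.

```python
from typing import List, Tuple


def append_receipt_footer(
    reply: str, receipts: List[Tuple[str, str]], show_in_response: bool
) -> str:
    """Append a 'Tool receipts' footer when the option is on and receipts exist."""
    if not show_in_response or not receipts:
        return reply
    lines = [f"{tool}: {receipt}" for tool, receipt in receipts]
    return reply + "\n\n---\nTool receipts:\n" + "\n".join(lines)


reply = "Here's the weather in Istanbul: 16°C, sunny."
receipts = [("weather", "zc-receipt-1774608496-gzpEBuUIRYX1vd4fQl4oYkqhq4-GnoJDStmlYzvQiWA")]

visible = append_receipt_footer(reply, receipts, show_in_response=True)
hidden = append_receipt_footer(reply, receipts, show_in_response=False)
```

With `show_in_response=False` the reply is returned unchanged, so channels see the same text as before the feature existed.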

### Inline in LLM responses

The system prompt instructs the LLM to echo receipts when referencing tool results. These appear inline in the response. The leak detector is configured to NOT redact `zc-receipt-` tokens.
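
How the leak detector decides what to redact is not described here, so the following is only a hypothetical sketch of the exemption: a redactor that masks long token-like strings but lets well-formed `zc-receipt-` tokens through. The 32-character threshold and the `[REDACTED]` marker are assumptions made for illustration.

```python
import re

# Hypothetical rule: treat any run of 32+ token characters as a possible secret.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9_-]{32,}")
# Exempt only strings with the full receipt shape, not everything with the prefix.
RECEIPT_SHAPE = re.compile(r"zc-receipt-\d+-[A-Za-z0-9_-]{43}")


def redact(text: str) -> str:
    """Mask secret-looking tokens while leaving tool receipts readable."""

    def replace(match: re.Match) -> str:
        token = match.group(0)
        if RECEIPT_SHAPE.fullmatch(token):
            return token
        return "[REDACTED]"

    return TOKEN_PATTERN.sub(replace, text)


receipt = "zc-receipt-1774608496-gzpEBuUIRYX1vd4fQl4oYkqhq4-GnoJDStmlYzvQiWA"
cleaned = redact(f"key {'A' * 40} and {receipt}")
```

Checking the full receipt shape, rather than the prefix alone, avoids turning the prefix into a way to smuggle arbitrary strings past the redactor.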

---

## Security properties

- **Ephemeral keys**: A new key is generated for each agent session. Keys are never persisted, logged, or sent to the LLM.
- **HMAC-SHA256**: Standard cryptographic MAC. The digest binds the tool name, arguments, result, and timestamp together — changing any input invalidates the receipt.
- **No new dependencies**: Uses `hmac`, `sha2`, `ring`, and `base64` — all already in the dependency tree.
- **Negligible performance impact**: Receipt generation adds <1ms per tool call; the HMAC computation is cheap relative to the tool execution itself.
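
The binding and ephemeral-key properties can be checked with a small experiment. This sketch assumes the `|`-joined message layout from "How it works"; `receipt_digest` is a hypothetical helper and the tool inputs are made-up sample values.

```python
import hashlib
import hmac
import secrets


def receipt_digest(key: bytes, tool_name: str, args: str, result: str, timestamp: int) -> bytes:
    # ASSUMPTION: fields are joined with "|" as in the documented formula.
    message = "|".join([tool_name, args, result, str(timestamp)]).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).digest()


session_key = secrets.token_bytes(32)
original = receipt_digest(session_key, "shell", "ls /tmp", "a.txt", 1774604899)

# Change exactly one input at a time, then swap in a fresh session key.
variants = {
    "tool name": receipt_digest(session_key, "python", "ls /tmp", "a.txt", 1774604899),
    "arguments": receipt_digest(session_key, "shell", "ls /var", "a.txt", 1774604899),
    "result": receipt_digest(session_key, "shell", "ls /tmp", "b.txt", 1774604899),
    "timestamp": receipt_digest(session_key, "shell", "ls /tmp", "a.txt", 1774604900),
    "session key": receipt_digest(secrets.token_bytes(32), "shell", "ls /tmp", "a.txt", 1774604899),
}

# True for every field whose change produced a different digest.
changed = {field: not hmac.compare_digest(original, digest) for field, digest in variants.items()}
```

Every entry in `changed` comes out true: altering any bound field, or using a key from a different session, yields a digest that no longer matches the original receipt.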

---

## Limitations (Phase 1)

- **Passive only**: Receipts are generated and logged but not validated against LLM responses. The system does not block responses with missing or invalid receipts.
- **No persistent audit**: Receipts are in debug logs and conversation history but not stored in a queryable database.
- **No cross-session verification**: Ephemeral keys mean receipts cannot be verified after the session ends.

These are addressed in the Phase 2 roadmap (#4830).

---

## Related docs

- [Audit Logging](audit-logging.md) — broader audit trail proposal
- [Agnostic Security](agnostic-security.md) — security model overview
- [Config Reference](../reference/api/config-reference.md) — full config options