feat(agent): HMAC tool execution receipts for hallucination detection by singlerider · Pull Request #5168 · zeroclaw-labs/zeroclaw

singlerider · 2026-04-02T01:59:36Z

Summary

Base branch target: master
Problem: No mechanism to verify whether tool calls reported by the LLM actually executed — hallucinated tool results are indistinguishable from real ones
Why it matters: Autonomous agents making consequential decisions need ground truth about tool execution; fabricated results can lead to incorrect actions
What changed: HMAC-SHA256 receipt system — ephemeral per-session key generates receipts appended to tool results; LLM cannot forge valid receipts; optional user-visible display
What did not change: Tool execution logic, provider handling, channel behavior (receipts are opt-in)

Based on https://arxiv.org/abs/2603.10060.

Label Snapshot (required)

Risk label: risk: high
Size label: size: M
Scope labels: agent, security, config, docs
Module labels: agent: tool_execution, security: leak_detector
Contributor tier label: distinguished contributor

Change Metadata

Change type: feature
Primary scope: security

Linked Issue

Related [Feature]: HMAC tool execution receipts for hallucination detection #4830 (closed)
Related Matrix channel: friction tracker #4657

Supersede Attribution (required when Supersedes # is used)

Superseded PRs + authors: feat(agent): HMAC tool execution receipts for hallucination detection #4943 by @singlerider (resubmission of feat(agent): HMAC tool execution receipts for hallucination detection #4831 closed by SimianAstronaut7)
Integrated scope by source PR: Full scope of feat(agent): HMAC tool execution receipts for hallucination detection #4943 carried forward, rebuilt clean against current master
Co-authored-by trailers: N/A (same author)

Validation Evidence (required)

```bash
cargo fmt --all -- --check
cargo clippy --all-targets -- -D warnings
cargo test
```

Evidence provided: cargo check passes; 12 unit tests including adversarial verification (tampered results, wrong keys, fabricated receipts); leak detector exemption test
If any command is intentionally skipped: Full CI deferred to pipeline

Security Impact (required)

New permissions/capabilities? No
New external network calls? No
Secrets/tokens handling changed? Yes — ephemeral HMAC key generated per session (never persisted or exposed to LLM)
File system access scope changed? No
Risk and mitigation: Key is ephemeral (memory-only, per session); receipts are exempt from leak detector to avoid false positives

Privacy and Data Hygiene (required)

Data-hygiene status: pass
Redaction/anonymization notes: Receipts contain HMAC hashes only, no PII; exempt from leak detector redaction
Neutral wording confirmation: Yes

Compatibility / Migration

Backward compatible? Yes — disabled by default
Config/env changes? No — config activation (agent.tool_receipts.enabled, agent.tool_receipts.show_in_response) deferred to follow-up. Receipt generation is controlled programmatically via ReceiptGenerator API.
Migration needed? No

i18n Follow-Through (required when docs or user-facing wording changes)

i18n follow-through triggered? No (internal feature, docs only)

Human Verification (required)

Verified scenarios: Receipt generation, verification, show_in_response display, leak detector exemption
Edge cases checked: Tampered tool results rejected, wrong session key rejected, fabricated receipts detected, empty tool output receipts
What was not verified: Multi-provider receipt threading

Side Effects / Blast Radius (required)

Affected subsystems/workflows: Tool execution pipeline, system prompt (when enabled), leak detector
Potential unintended effects: Slight increase in tool result size when enabled (~64 chars per receipt)
Guardrails/monitoring for early detection: Disabled by default; debug logging on receipt generation
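The "~64 chars per receipt" figure can be sanity-checked with simple arithmetic. The sketch below is illustrative only: receipt_token_len is a hypothetical helper, and it assumes the zc-receipt-{ts}-{hash} token format, a decimal timestamp, and an unpadded base64url encoding of the 32-byte HMAC-SHA256 tag.

```rust
/// Hypothetical helper (not from the PR): length of a receipt token of the
/// form `zc-receipt-{ts}-{hash}`, assuming a decimal timestamp and an
/// unpadded base64url encoding of a 32-byte HMAC-SHA256 tag.
fn receipt_token_len(timestamp: u64) -> usize {
    const PREFIX: &str = "zc-receipt-";
    const TAG_BYTES: usize = 32;
    // Unpadded base64 uses ceil(bits / 6) characters: ceil(256 / 6) = 43.
    let hash_chars = (TAG_BYTES * 8 + 5) / 6;
    // prefix + timestamp digits + '-' separator + hash
    PREFIX.len() + timestamp.to_string().len() + 1 + hash_chars
}
```

Under these assumptions a 10-digit (seconds) timestamp gives a 65-character token and a 13-digit (milliseconds) timestamp gives 68; the surrounding "[receipt: ...]" annotation adds 11 more characters.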

Rollback Plan (required)

Fast rollback command/path: git revert <commit>
Feature flags or config toggles: Config activation pending in follow-up
Observable failure symptoms: Missing receipt annotations in tool results when enabled

Risks and Mitigations

Risk: Receipt tokens consume context window budget
- Mitigation: ~64 chars per receipt is negligible; feature is opt-in
Risk: Receipts in tool results may confuse some LLMs
- Mitigation: System prompt instruction explains receipt semantics; only enabled when explicitly configured

singlerider · 2026-04-03T03:13:20Z

I build on Linux. Any time I build, linux-schema.json (covered by the .gitignore entry this PR adds) shows up as modified, leaving my tree "dirty". I don't have a dedicated PR for adding the appropriate Tauri files to .gitignore at the moment.

JordanTheJet

Agent Review — PR #5168

Disclaimer: This PR touches high-risk paths (src/security/**, src/tools/**) and requires human maintainer approval. This review is advisory to assist the maintainer.

Comprehension Summary

What: Adds an HMAC-SHA256 receipt system for tool executions. An ephemeral per-session 256-bit key (via ring::SystemRandom) generates cryptographic receipts appended to tool results, so that a real tool execution can be distinguished from one the LLM hallucinated. Touches src/agent/tool_receipts.rs (new module), src/agent/tool_execution.rs, src/agent/loop_.rs, src/channels/mod.rs, src/config/schema.rs, src/security/leak_detector.rs, src/tools/delegate.rs, and docs.

Why: Based on arXiv:2603.10060 — autonomous agents making consequential decisions need ground truth about tool execution. Fabricated results are currently indistinguishable from real ones. Closes #4830, supersedes #4943/#4831 (same author).

Blast radius: Tool execution pipeline (receipt appended to results), system prompt (instructions when enabled), leak detector (receipt exemption from entropy scan), config schema (new agent.tool_receipts section). Disabled by default — zero impact on existing deployments.

Security Assessment

Positive:

Ephemeral key via ring::SystemRandom (CSPRNG) — never persisted, logged, or exposed to LLM
HMAC-SHA256 is a standard, well-vetted MAC construction
Receipt generation occurs on scrubbed output (scrub_credentials() runs before HMAC input), preventing credential correlation through receipt verification
Leak detector correctly exempts zc-receipt- tokens from high-entropy redaction, with a specific regex pattern that won't accidentally exempt real credentials
No new crate dependencies — hmac, sha2, ring, base64 are all already in the dependency tree
#[cfg(test)] gating on with_key() constructor prevents misuse of deterministic keys in production

[suggestion] Timing side-channel in verify() (tool_receipts.rs:87):
expected_hash == provided_hash uses standard string comparison, not constant-time comparison. In the current threat model (LLM is the attacker, no timing oracle available), this is not exploitable. However, for defense-in-depth and to prevent future misuse if verification is ever exposed at a network boundary, consider using hmac::Mac::verify_slice() which provides constant-time comparison natively. Low priority — noting for completeness.
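For illustration, the property that Mac::verify_slice() provides can be sketched without a crypto dependency: the comparison touches every byte instead of returning at the first mismatch. This is a hypothetical std-only helper (constant_time_eq is not part of the PR), and a hand-written loop carries no guarantee that the optimiser preserves its timing behaviour, so production code should still use verify_slice().

```rust
/// Hypothetical illustration: compare two tags without an early exit on the
/// first differing byte. Not a substitute for `Mac::verify_slice()`.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        // Tag length is public (32 bytes for HMAC-SHA256), so returning
        // early on a length mismatch reveals nothing secret.
        return false;
    }
    // OR together the XOR of every byte pair; any difference leaves a
    // non-zero bit, and every byte is always examined.
    let mut diff: u8 = 0;
    for (x, y) in a.iter().zip(b.iter()) {
        diff |= x ^ y;
    }
    diff == 0
}
```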

No security weakening identified. The feature is additive and opt-in.

Performance Assessment

Receipt generation adds <1ms per tool call (HMAC computation is negligible)
Memory: one 32-byte key per session + ~64 chars per receipt in tool result strings
Binary size: minimal — no new crates, small code footprint
No performance impact identified for disabled state (all receipt paths are behind Option checks)

Code Review

src/agent/tool_receipts.rs — Clean, well-structured module. 12 unit tests cover: determinism, format parsing, verification success, tampered results/names/args, wrong keys, fabricated receipts, malformed receipts, cross-tool rejection, generate_now, and random key uniqueness. Good coverage of adversarial scenarios.

[suggestion] Minor allocation in parse_receipt() (tool_receipts.rs:111):

```rust
let rest = receipt.strip_prefix(&format!("{RECEIPT_PREFIX}-"))?;
```

This allocates a String on every call via format!(). Could use the literal "zc-receipt-" directly since RECEIPT_PREFIX is "zc-receipt":

```rust
let rest = receipt.strip_prefix("zc-receipt-")?;
```

Trivial, but this runs on every tool result when receipts are enabled.
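A minimal sketch of an allocation-free parse_receipt, assuming the zc-receipt-{ts}-{hash} format and a u64 timestamp (the signature is hypothetical). Instead of a hard-coded literal, it chains two borrowing strip_prefix calls, which avoids the allocation while keeping RECEIPT_PREFIX as the single definition of the prefix.

```rust
// Assumed to match the PR's constant ("zc-receipt", per the review).
const RECEIPT_PREFIX: &str = "zc-receipt";

/// Hypothetical sketch: split `zc-receipt-{ts}-{hash}` into timestamp and hash.
fn parse_receipt(receipt: &str) -> Option<(u64, &str)> {
    // Two borrowing strip_prefix calls: no String is allocated.
    let rest = receipt.strip_prefix(RECEIPT_PREFIX)?.strip_prefix('-')?;
    // The timestamp has no '-', but a base64url hash may contain one, so
    // split only on the first '-' and keep the remainder as the hash.
    let (ts, hash) = rest.split_once('-')?;
    if hash.is_empty() {
        return None;
    }
    let ts: u64 = ts.parse().ok()?;
    Some((ts, hash))
}
```

Splitting on the first '-' only is what lets a hash such as ab-c_d survive intact.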

src/agent/tool_execution.rs — Integration is clean. Receipt generation only on successful executions. receipt_generator passed as Option — proper opt-in pattern. The ToolExecutionOutcome.receipt field is well-documented.
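The success-only, Option-gated pattern can be sketched as follows. ToolExecutionOutcome is reduced to two fields, FakeGenerator stands in for ReceiptGenerator so no crypto crate is needed, and finish_tool_call is a hypothetical name; the [receipt: ...] annotation format comes from a later review in this thread, while the newline before it is an assumption.

```rust
/// Hypothetical minimal outcome type; the real one has more fields.
#[derive(Debug)]
struct ToolExecutionOutcome {
    output: String,
    receipt: Option<String>,
}

/// Stand-in for ReceiptGenerator: returns a fake receipt so the control
/// flow can be shown without an HMAC dependency.
struct FakeGenerator;

impl FakeGenerator {
    fn generate(&self, tool_name: &str, _args: &str, _result: &str) -> String {
        format!("zc-receipt-0-{tool_name}")
    }
}

fn finish_tool_call(
    tool_name: &str,
    args: &str,
    result: Result<String, String>,
    generator: Option<&FakeGenerator>,
) -> ToolExecutionOutcome {
    match result {
        Ok(output) => {
            // Receipts only for successful runs, and only when a generator
            // was supplied (opt-in); `None` leaves the output untouched.
            let receipt = generator.map(|g| g.generate(tool_name, args, &output));
            let output = match &receipt {
                Some(r) => format!("{output}\n[receipt: {r}]"),
                None => output,
            };
            ToolExecutionOutcome { output, receipt }
        }
        // Failure branches never carry a receipt.
        Err(e) => ToolExecutionOutcome { output: format!("Error: {e}"), receipt: None },
    }
}
```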

src/security/leak_detector.rs — Receipt exemption regex zc-receipt-\d+-[A-Za-z0-9_-]+ is correctly scoped. Test tool_receipts_not_redacted_as_high_entropy verifies the exemption. Existing leak detection behavior is preserved (the new regex strip is an additional pass before entropy scanning).
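To make the scope of the exemption pattern concrete, here is a hypothetical std-only equivalent of the strip pass. The real detector compiles zc-receipt-\d+-[A-Za-z0-9_-]+ with the regex crate behind a OnceLock; this sketch matches ASCII digits only (the regex crate's \d is Unicode-aware by default) and deletes each matching token before any entropy scan would run.

```rust
/// Hypothetical stand-in for the exemption pass: removes every substring
/// matching `zc-receipt-\d+-[A-Za-z0-9_-]+` (ASCII digits only).
fn strip_receipts(text: &str) -> String {
    const PREFIX: &str = "zc-receipt-";
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(pos) = rest.find(PREFIX) {
        // Keep everything before the candidate token.
        out.push_str(&rest[..pos]);
        let after = &rest[pos + PREFIX.len()..];
        // `\d+`: one or more ASCII digits.
        let digits = after.bytes().take_while(|b| b.is_ascii_digit()).count();
        let tail = &after[digits..];
        if digits > 0 && tail.starts_with('-') {
            // `[A-Za-z0-9_-]+`: the base64url hash, taken greedily.
            let hash_len = tail[1..]
                .bytes()
                .take_while(|b| b.is_ascii_alphanumeric() || *b == b'_' || *b == b'-')
                .count();
            if hash_len > 0 {
                // Drop the whole token and resume scanning after it.
                rest = &tail[1 + hash_len..];
                continue;
            }
        }
        // Not a complete receipt: keep the prefix text and scan on.
        out.push_str(PREFIX);
        rest = after;
    }
    out.push_str(rest);
    out
}
```

A string that merely starts with zc-receipt- but lacks the numeric timestamp, such as zc-receipt-x-abc, is left untouched, so the exemption only fires on the full receipt shape.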

src/config/schema.rs — ToolReceiptsConfig with serde(default) on both fields. Clean defaults (both false).

src/channels/mod.rs — System prompt injection (line ~5880) is clear and correctly gated on config.agent.tool_receipts.enabled. The show_in_response footer formatting is clean. Good use of unwrap_or_else(|e| e.into_inner()) for poisoned mutex recovery.
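The poisoned-mutex recovery idiom can be exercised in isolation. In this sketch snapshot_receipts and poison_store are hypothetical helpers, and the Mutex<Vec<String>> store mirrors the session-scoped receipt store described later in this thread.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: copy the collected receipts out of the store, even
/// if another thread panicked while holding the lock and poisoned it.
fn snapshot_receipts(store: &Mutex<Vec<String>>) -> Vec<String> {
    // On poison, `lock()` returns Err(PoisonError); `into_inner()` hands
    // back the guard, so the data behind the lock is still reachable.
    store.lock().unwrap_or_else(|e| e.into_inner()).clone()
}

/// Test helper: push one receipt, then panic while the guard is held.
fn poison_store(store: &Arc<Mutex<Vec<String>>>) {
    let shared = Arc::clone(store);
    let handle: thread::JoinHandle<()> = thread::spawn(move || {
        let mut guard = shared.lock().unwrap();
        guard.push("zc-receipt-1-abc".to_string());
        panic!("simulated panic while holding the lock");
    });
    // join() reports the panic as Err; ignore it, the mutex is now poisoned.
    let _ = handle.join();
}
```

Recovering keeps already-collected receipts visible after an unrelated panic; the trade-off is that the data may reflect a half-finished update, which is tolerable for an append-only list of strings.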

src/tools/delegate.rs — Subagent call site passes None for receipt params with TODO comments for future threading. Appropriate for Phase 1.

`[blocking]` Branch hygiene — needs rebase

The branch contains merge commits (master merged into feature branch) rather than a clean rebase. The local build produces compile errors in test files that weren't fully updated for newer struct fields introduced on master since the branch was created. While CI passed at PR creation time, the branch history is messy:

7146ee16 fix: resolve merge collision compile errors from master integration
2ea4af85 merge: resolve conflicts between tool-receipts and master
be747242 fix(web): resolve merge collision errors in web dashboard TypeScript
50425247 fix: resolve merge collision errors and bump version to 0.6.6

The build currently fails with errors like:

run_tool_call_loop takes 27 arguments but only 26 are supplied (2 test call sites in loop_.rs, ~lines 9551 and 9635)
Missing receipt_generator and show_receipts_in_response fields in test ChannelRuntimeContext initializer (~line 9344)

Action: Please rebase cleanly onto current master and ensure all tests compile and pass. The squash-merge policy means the merge commits won't persist, but the build needs to be green.

`[suggestion]` Unrelated `.gitignore` change

The addition of apps/tauri/gen/schemas/ to .gitignore is not related to tool receipts. Per project policy ("one concern per PR"), this should ideally be a separate commit. Not blocking, but noting for PR discipline.

`[question]` `_force_xml_tools` parameter

The run_tool_call_loop function signature gained a _force_xml_tools: bool parameter (underscore-prefixed, unused in the function body). The comment references [agent] tool_dispatcher = "xml" config. Is this intended for a future PR, or was it meant to be wired up in this one? If unused, removing it would reduce the cognitive overhead of the already-27-parameter function signature.

Regression Analysis

| Area | Risk | Assessment |
| --- | --- | --- |
| Tool execution flow | Low | Receipt generation is append-only to existing output. No change when disabled. |
| Leak detector | Low | New regex strip pass is additive. Existing patterns still match. |
| System prompt | Low | Receipt instructions only appended when feature enabled. |
| Config loading | None | New struct fields have `serde(default)`, so existing configs remain backward compatible. |
| Existing tests | None | All existing test call sites updated with `None`/`false` defaults. |

Documentation

Thorough. docs/security/tool-receipts.md includes: how it works, what it detects, what it doesn't prevent, viewing instructions, security properties, and Phase 1 limitations. Config reference updated. Security README index updated.

Tests

12 unit tests in tool_receipts.rs covering generation, verification, adversarial scenarios
1 leak detector test (tool_receipts_not_redacted_as_high_entropy)
All existing test call sites updated for new parameters
Missing: No integration test exercising the full pipeline (receipt generation → append to tool result → leak detector passthrough → show_in_response). Understandable for Phase 1, but worth noting for Phase 2.

Verdict

Needs author action before maintainer merge:

[blocking] Rebase onto current master and fix the compile errors in test call sites. Ensure cargo test passes locally.
[suggestion] Consider constant-time HMAC verification (use Mac::verify_slice() instead of string ==).
[suggestion] Remove _force_xml_tools dead parameter if not used, or wire it up if intended.
[suggestion] Use string literal instead of format!() in parse_receipt().

Thank you @singlerider for this well-designed feature. The cryptographic design is sound, the opt-in approach is correct, the documentation is thorough, and the test coverage for the core module is excellent. The main blocker is the branch needing a clean rebase against current master.

WareWolf-MoonWall

Review: `feat(agent): HMAC tool execution receipts for hallucination detection`

Verdict: changes requested

✅ Commendation

The cryptographic design of ReceiptGenerator is sound. Ephemeral key generation via ring::SystemRandom, HMAC-SHA256 with mac.verify_slice() for constant-time comparison, #[cfg(test)]-gated with_key() to prevent deterministic keys escaping into production — each of these decisions reflects careful, defence-in-depth thinking. The 12 unit tests cover the adversarial cases that matter most (tampered result, tampered name, tampered args, wrong key, fabricated receipt, malformed receipt, cross-tool rejection), which is exactly the kind of test set that makes a security primitive trustworthy. This module is production-quality as written.

Addressing Jordan's four inline suggestions (constant-time HMAC, removing the dead _force_xml_tools parameter, using a string literal in parse_receipt(), and splitting the unrelated .gitignore entry out into its own commit) in a single, cleanly described commit (eeae0a1) demonstrates good responsiveness to feedback and keeps the change history readable.

The leak detector exemption is correctly implemented: OnceLock for compile-once regex, a scoped prefix pattern zc-receipt-\d+-[A-Za-z0-9_-]+ that won't accidentally exempt real credentials, and a dedicated test proving the exemption works. This is the right pattern and it was done carefully.

Placing the implementation in crates/zeroclaw-runtime/src/agent/tool_receipts.rs with a thin re-export stub at src/agent/tool_receipts.rs is the correct approach under the workspace migration — the crate gets the real code, the monolith gets a one-liner. This is RFC #5574 in practice.

🔴 Blocking

1. Config activation path is entirely absent — the feature cannot be enabled

The PR documents two config keys (agent.tool_receipts.enabled, agent.tool_receipts.show_in_response) and describes their behaviour. But the diff contains no ToolReceiptsConfig struct, no addition to the config schema, and no code that reads these fields to construct a ReceiptGenerator. Every single call site in the diff passes None, None:

```rust
None, // receipt_generator
None, // collected_receipts
```

There is no path by which a user setting agent.tool_receipts.enabled = true produces any effect. Depending on schema validation settings, this either causes a parse error on startup or is silently ignored. Either way, the documented feature is inert.

This looks like the config schema change and the channel system-prompt injection that Jordan reviewed in the original src/-based implementation were lost during the migration to crates/zeroclaw-runtime/. The ReceiptGenerator and its wiring points arrived in the crates workspace; the config struct and the run() code that would read it and pass a live generator did not.

The principle from AGENTS.md and RFC #5653: do not ship user-visible config keys that do nothing. The fix is to port ToolReceiptsConfig to crates/zeroclaw-config/src/schema.rs and add the config-reading code in run() that creates a ReceiptGenerator when enabled = true and passes it through to run_tool_call_loop. Until that is present, the feature is documented but unreachable.
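A minimal sketch of what that activation could look like, assuming the keys agent.tool_receipts.enabled and agent.tool_receipts.show_in_response map onto a two-field struct. Everything here is hypothetical: serde derives are omitted to keep the block dependency-free, ReceiptGenerator is a stand-in, and receipt_generator_from / receipt_store_from are illustrative names for the code run() would need.

```rust
use std::sync::Mutex;

/// Hypothetical shape of ToolReceiptsConfig. The real struct would also
/// derive serde's Serialize/Deserialize with `#[serde(default)]`; both flags
/// default to false, matching "disabled by default".
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct ToolReceiptsConfig {
    pub enabled: bool,
    pub show_in_response: bool,
}

/// Stand-in for the real ReceiptGenerator, which holds an ephemeral key.
#[derive(Debug)]
pub struct ReceiptGenerator;

impl ReceiptGenerator {
    pub fn new() -> Self {
        ReceiptGenerator
    }
}

/// Hypothetical activation step for run(): only `enabled = true` yields a
/// live generator; otherwise callers keep passing `None`.
pub fn receipt_generator_from(cfg: &ToolReceiptsConfig) -> Option<ReceiptGenerator> {
    cfg.enabled.then(ReceiptGenerator::new)
}

/// The collected-receipts store is only needed when the footer is shown.
pub fn receipt_store_from(cfg: &ToolReceiptsConfig) -> Option<Mutex<Vec<String>>> {
    (cfg.enabled && cfg.show_in_response).then(|| Mutex::new(Vec::new()))
}
```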

2. Branch still uses merge commits — Jordan's blocking item was not addressed

Jordan's review flagged this as blocking and asked for a clean rebase. The author's response was to add two more merge commits (de952d3, 201d633). The branch history now contains four merge commits total.

The consequence is visible in the diff: loop_.rs shows 1,411 lines of test additions (parse_tool_calls robustness, GLM-style parsing, scrub_credentials edge cases, history management, etc.) that have no connection to tool receipts. These are master-branch changes absorbed via merge upstream/master. A reviewer cannot distinguish what is the feature from what is the ambient diff noise, and the stated blast radius in the PR template understates what is actually changing.

Please rebase cleanly onto current master. Once the rebase is done, Jordan's dismissed review needs an explicit re-approval from him before merge — a dismissed review that was specifically blocking does not self-resolve.

🟡 Conditional

show_in_response display logic is missing

The config docs and tool-receipts.md both describe a footer that appears in user-visible channel responses when show_in_response = true. The collected_receipts parameter is wired through run_tool_call_loop and receipts are pushed into the store, but nothing reads the store afterwards to format and emit a response footer. The display path from collected_receipts to the channel output is absent.

This does not need to block merge on its own — the collection infrastructure is there and the config activation work (blocking item 1) will naturally surface where the display code needs to live. But please file and assign a follow-up issue before merge so it doesn't get lost. Without this, users who set show_in_response = true get no visible output and no indication that the setting did nothing.

Process note (not a code blocker — maintainer action)

The PR is labelled risk: medium. The diff touches crates/zeroclaw-runtime/src/security/leak_detector.rs and the runtime agent loop — both of which fall under the risk: high tier per AGENTS.md and reviewer playbook §2. Please coordinate with a maintainer to update the label before the next review pass.

theonlyhennygod · 2026-04-12T18:59:10Z

Agent Triage Note — PR #5168

Skipped — high-risk path. This PR modifies crates/zeroclaw-runtime/src/security/leak_detector.rs and crates/zeroclaw-runtime/src/agent/**, which are classified as high-risk per AGENTS.md. Requires human maintainer review.

Current status: @JordanTheJet reviewed with 1 blocking item (rebase needed) and 3 suggestions. Author addressed all items. @WareWolf-MoonWall then reviewed with 2 blocking items: (1) config activation path is entirely absent — the feature cannot be enabled, and (2) branch still uses merge commits. One conditional: show_in_response display logic is missing. CI Lint is currently failing.

Key concern: The feature is documented but unreachable — no ToolReceiptsConfig exists in the config schema, and all call sites pass None. This needs resolution before merge.

No further agent action taken.

singlerider · 2026-04-12T23:11:31Z

@WareWolf-MoonWall @JordanTheJet Addressing the config activation blocker:

This PR is scoped as mechanism only — the HMAC receipt generation, verification, and injection pipeline. Config activation (ToolReceiptsConfig struct, enabled flag, wiring into the config schema) is a deliberate follow-up.

Rationale: the receipt mechanism needs to land first so the config activation PR has something to point at. Shipping both in one PR would make this size: XL and harder to review. The mechanism is inert until activation is wired — no runtime behavior change, no risk.

Follow-up will add:

ToolReceiptsConfig struct with enabled and show_in_response fields
Wiring in AgentConfig::default() and config schema
Config validation for receipt settings

Also merged upstream/master to fix CI (broken imports from pre-workspace-split code).

WareWolf-MoonWall

Review: `feat(agent): HMAC tool execution receipts for hallucination detection`

Verdict: changes requested

✅ Standing commendation

The ReceiptGenerator module itself remains production-quality. Ephemeral key
via ring::SystemRandom, constant-time mac.verify_slice(), #[cfg(test)]-gated
with_key(), adversarial test coverage across tampered result/name/args,
wrong key, fabricated receipt, cross-tool rejection — this is exactly how a
security primitive should be written and tested. The leak detector exemption is
also correct: OnceLock regex, a scoped prefix pattern that won't accidentally
exempt real credentials, and a dedicated test. Good work on Jordan's four
suggestions as well — all addressed cleanly.

🔴 Blocking

1. Author's scope-reframe on config activation requires PR updates before it
can be accepted

The latest comment proposes landing the mechanism now and deferring config
activation (ToolReceiptsConfig, schema wiring, the run() code that constructs
a live ReceiptGenerator) to a follow-up. The argument — mechanism first so the
activation PR has something to point at — is reasonable and I'm not opposed to
the split in principle.

But the PR as written claims more than it delivers, and that has to be corrected
before the split becomes acceptable:

The PR description "Config/env changes?" section lists agent.tool_receipts.enabled
and agent.tool_receipts.show_in_response as new keys added by this PR. They
are not. If a user reads the merged PR description and tries to configure those
keys, they will hit either a parse error or silent ignore. Update this section
to say "config activation deferred to follow-up" and link the follow-up issue.

docs/security/tool-receipts.md (per Jordan's original review: "includes how
it works, what it detects, viewing instructions") presumably documents the
config keys and how to enable the feature. A user reading that doc after this
PR merges cannot actually enable anything. Either add a clear "not yet
activatable — follow-up pending" callout at the top of the doc, or defer the
doc to land with the activation PR. Docs that describe unreachable behaviour
create user-facing confusion.

Closes #4830: if #4830 covers the full feature end-to-end, it should not
be closed by mechanism-only code. Either remove the Closes keyword here and
add it to the follow-up, or narrow the scope of #4830 to mechanism-only and
open a new issue for activation.

Once those three corrections are made, the mechanism-only split is acceptable.

2. Merge commits and 1,400+ lines of unrelated diff — still unaddressed

Jordan flagged this as blocking in his original review. My dismissal carried the
same point. The latest author comment does not mention it. The branch currently
contains four merge commits from pulling upstream master rather than rebasing,
and the consequence is visible: the diff includes 1,411 lines of test additions
(parse_tool_calls Qwen regression tests, strip_tool_result_blocks,
extract_json_values, history management, GLM parsing, constants bounds checks,
etc.) that have no connection to tool receipts. These are master-branch changes
absorbed through the merges.

A reviewer cannot distinguish what is the receipt feature from what is ambient
noise. The stated blast radius ("tool execution pipeline, system prompt, leak
detector, config schema") understates what is actually present in the diff. And
Jordan's dismissed CHANGES_REQUESTED does not self-resolve — please re-request
his review once the rebase is done.

Please rebase cleanly onto current master. The squash-merge policy means the
merge commits won't persist in history, but the build must be green and the diff
must reflect only this feature.

🟡 Conditional

show_in_response display path is unimplemented — needs a tracked follow-up

collected_receipts is threaded through run_tool_call_loop and receipts are
pushed into the store, but nothing reads the store to emit a response footer.
The docs and PR description describe this as a user-visible feature. If the
mechanism-first split is accepted and config activation goes into a follow-up,
this naturally defers there too — but please open and assign the follow-up issue
before this merges, and link it in the PR description alongside the config
activation follow-up. Two deferred items need two tracked issues (or one issue
that explicitly covers both). Without tracking, "show_in_response = true" will
produce no visible output and no diagnostic that the setting did nothing.

Process note (not a code blocker — maintainer action)

The PR is labelled risk: medium. The diff touches
crates/zeroclaw-runtime/src/security/leak_detector.rs and the runtime agent
loop, both of which fall under risk: high per AGENTS.md. Please coordinate
with a maintainer to update the label before the next review pass.

The core mechanism is ready. Two corrections stand between this and merge: a
clean rebase so the diff shows only the receipt feature, and PR/doc updates that
accurately describe the mechanism-only scope. Looking forward to the next
revision.

…scope

- Add append_receipt_footer() that reads collected_receipts and appends a formatted footer to the agent response when receipts were generated
- Wire footer into all three return paths in run_tool_call_loop
- Add 4 tests: empty receipts, None store, single receipt, multiple receipts
- Update docs/security/tool-receipts.md: remove Phase 1 framing, note config activation as pending follow-up
- PR description: remove config keys that don't exist yet, change Closes zeroclaw-labs#4830 to Related (already closed)

singlerider · 2026-04-13T06:04:52Z

@WareWolf-MoonWall @JordanTheJet Addressing all review findings:

Blocker 1 (scope/docs mismatch) — resolved:

PR description updated: removed agent.tool_receipts.enabled and show_in_response from "Config/env changes?" — config activation is a follow-up
Closes #4830 → Related #4830 ([Feature]: HMAC tool execution receipts for hallucination detection #4830 is already closed)
docs/security/tool-receipts.md: removed "Phase 1" / "Limitations" framing, added note that config activation is pending
Rollback plan updated to remove references to config keys that don't exist yet

Blocker 2 (rebase) — declined per project convention:
Per the same rationale established in #5517: this PR will be squash-merged, which eliminates merge commits from master history. Rebasing is churn for the same result. The inflated diff (loop_.rs showing as +7,584) is a GitHub rendering artifact from merge commits absorbing the workspace split — the squash commit will contain only the actual receipt delta.

Blocker 3 (duplicate file) — resolved:
Deleted src/agent/tool_receipts.rs (one-line re-export stub). The crate path crates/zeroclaw-runtime/src/agent/tool_receipts.rs is canonical. New modules should not add root stubs.

Conditional (show_in_response) — implemented, not deferred:

Added append_receipt_footer() function that reads collected_receipts and appends a formatted footer to the response
Wired into all three return paths in run_tool_call_loop
4 tests: empty receipts (no footer), None store (no footer), single receipt (correct format), multiple receipts (all appear)
No follow-up issue needed — the feature is complete
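A minimal sketch of a function with the behaviour those four tests describe, also short-circuiting on a poisoned mutex as a later review notes. The signature is an assumption; the "---" separator and "Tool receipts:" heading follow the later review's description, while the blank line before the separator and the "- " bullet per receipt are guesses.

```rust
use std::sync::Mutex;

/// Hypothetical sketch of append_receipt_footer (not the PR's exact code).
pub(crate) fn append_receipt_footer(
    response: String,
    store: Option<&Mutex<Vec<String>>>,
) -> String {
    // No store: receipts are not being collected, return unchanged.
    let Some(store) = store else {
        return response;
    };
    // Poisoned lock: skip the footer; receipts remain inline in tool results.
    let Ok(mut guard) = store.lock() else {
        return response;
    };
    // Drain so receipts from this turn are not repeated in a later response.
    let receipts = std::mem::take(&mut *guard);
    if receipts.is_empty() {
        return response;
    }
    let mut out = response;
    out.push_str("\n\n---\nTool receipts:");
    for receipt in &receipts {
        out.push_str("\n- ");
        out.push_str(receipt);
    }
    out
}
```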

Process (risk label) — risk: medium is correct:
The review claimed risk: high because the PR "touches security/leak_detector.rs and agent loop." Both claims are based on GitHub's inflated diff from merge commits:

leak_detector.rs does not exist on master — this PR creates it as a new file, not modifying existing security code
loop_.rs and tool_execution.rs show as new files for the same reason — actual changes are additive (threading Option<&ReceiptGenerator> through execution functions)
Correct risk: medium — new additive security module, no modification of existing security boundaries

CI: Merged upstream/master to resolve workspace-split import errors. All receipt code compiles.

…space imports

Tests for parse_tool_calls, parse_glm, extract_json_values, etc. already live in zeroclaw-tool-call-parser. The copies in loop_.rs were absorbed through merge commits and referenced pre-workspace-split import paths. Also fix crate::config::* and crate::providers::* imports in remaining tests to use zeroclaw_config::schema::* and zeroclaw_api::provider::*.

singlerider · 2026-04-14T03:25:49Z

@WareWolf-MoonWall @JordanTheJet All findings addressed:

🔴 Blocker 1 (scope/docs mismatch) — previously resolved:
PR description, docs, and issue linkage were updated in earlier commits per the accepted mechanism-only scope.

🔴 Blocker 2 (merge commits / inflated diff) — resolved (af716b0):
Removed 834 lines of stale tool-call-parser test duplicates that were absorbed through merge commits during the workspace split. These tests already exist in crates/zeroclaw-tool-call-parser/src/lib.rs. Fixed remaining test imports from crate::config::* / crate::providers::* to zeroclaw_config::schema::* / zeroclaw_api::provider::*. This was the source of the 83 compile errors failing CI.

🟡 Conditional (show_in_response) — previously resolved:
Implemented directly with append_receipt_footer() and 4 tests rather than deferring.

Also merged upstream/master — clean, no conflicts.

fmt + clippy clean, full test suite passing.

WareWolf-MoonWall

Agent Review — PR #5168

Disclaimer: This PR touches crates/zeroclaw-runtime/src/security/** and src/agent/** — high-risk paths per AGENTS.md. This review is advisory; human maintainer approval is required before merge.

Comprehension Summary

What: Adds an HMAC-SHA256 receipt system (tool_receipts.rs) for tool-execution hallucination detection. An ephemeral 256-bit key (via ring::SystemRandom) is generated per session. After each successful tool execution, execute_one_tool computes HMAC-SHA256(key, tool_name | args | result | timestamp) over the scrubbed output, appends [receipt: zc-receipt-{ts}-{hash}] to the tool result, and stashes the receipt in a session-scoped Mutex<Vec<String>>. append_receipt_footer drains the store and appends a ---\nTool receipts: block to the agent response when show_in_response is enabled. The leak detector is updated to strip zc-receipt- tokens before high-entropy scanning. All call sites pass None for both receipt params (mechanism is fully inert until config activation lands in a follow-up PR).

Why: Autonomous agents making consequential decisions need ground truth about tool execution. Fabricated results are currently indistinguishable from real ones (arXiv:2603.10060).

Blast radius: tool_receipts.rs (new), tool_execution.rs (receipt field + generation), loop_.rs (parameter threading + append_receipt_footer), security/leak_detector.rs (exemption regex), delegate.rs and channels/orchestrator/mod.rs (null params). Zero runtime impact on existing deployments — all paths guarded by Option.

CI

20/20 checks passing: fmt, clippy, strict delta lint, test (all features), 32-bit check, security audit, docs quality, cross-platform builds (Linux, macOS arm64, Windows). ✅

Security Assessment

Positives:

Ephemeral key via ring::SystemRandom (CSPRNG) — never persisted, logged, or sent to the LLM ✓
mac.verify_slice() for constant-time comparison — timing side-channel closed ✓
scrub_credentials() runs before HMAC input, preventing credential correlation through receipt verification ✓
#[cfg(test)]-gated with_key() constructor — deterministic keys cannot be used in production ✓
Leak detector exemption regex zc-receipt-\d+-[A-Za-z0-9_-]+ is correctly scoped and tested ✓
No new crate dependencies — hmac, sha2, ring, base64 already in the dependency tree ✓

[suggestion] HMAC input uses | as a field separator without length-prefixing (tool_name.as_bytes() | b"|" | args.to_string().as_bytes() | b"|" | result.as_bytes() | b"|" | timestamp). If a tool_name or serialised args value ever contains a literal |, two different inputs could produce identical HMAC messages. For example, tool "shell|{}" / args {} / result "ok" and tool "shell" / args {} / result "{}|ok" both concatenate to shell|{}|{}|ok before the timestamp. Not immediately exploitable (the LLM does not know the key and cannot compute the MAC), but for defense-in-depth, consider using a length-prefixed encoding or a \x00 separator (which cannot appear in valid tool names or JSON). Low priority for Phase 1; worth addressing before active verification is exposed.
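The collision can be reproduced with plain byte strings, and a length-prefixed layout removes it. In this sketch naive_message and length_prefixed_message are hypothetical helpers; the naive layout approximates the quoted concatenation with the timestamp rendered as decimal text (an assumption). No MAC is computed, because identical messages necessarily produce identical MACs under the same key.

```rust
/// Approximation of the reviewed layout: fields joined with '|'.
fn naive_message(tool_name: &str, args_json: &str, result: &str, timestamp: u64) -> Vec<u8> {
    format!("{tool_name}|{args_json}|{result}|{timestamp}").into_bytes()
}

/// Length-prefixed alternative: each variable-length field is preceded by
/// its byte length (big-endian u64), so field boundaries cannot shift.
fn length_prefixed_message(tool_name: &str, args_json: &str, result: &str, timestamp: u64) -> Vec<u8> {
    let mut msg = Vec::new();
    for field in [tool_name, args_json, result] {
        msg.extend_from_slice(&(field.len() as u64).to_be_bytes());
        msg.extend_from_slice(field.as_bytes());
    }
    // The timestamp is fixed-width here, so it needs no length prefix.
    msg.extend_from_slice(&timestamp.to_be_bytes());
    msg
}
```

With length prefixes, "shell|{}" and "shell" get different length headers, so the two messages differ even though their raw field bytes concatenate identically.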

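[suggestion — low priority] serde_json::Value::to_string() for HMAC input: serde_json's object map is a BTreeMap (keys sorted) by default, or an insertion-ordered IndexMap when the preserve_order feature is enabled. Either way, serialising the same Value twice within one process yields the same string, so in-process generation and verification are consistent. However, if args are ever re-serialised through a different path (e.g., deserialized from a log written by another serializer, or built with different feature flags, then reverified), key reordering or formatting differences could cause spurious HMAC failures. Not an issue while verification is passive-only, but worth documenting in a TODO comment for when active verification lands.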

Code Review

tool_receipts.rs — Clean, well-structured. ReceiptGenerator is Clone, stateless after construction, and threadsafe. parse_receipt correctly handles the base64url hash containing embedded - characters by taking everything after the first - following the stripped prefix. 13 unit tests: determinism, format parsing, verify success, tampered result/name/args, wrong key, fabricated receipt, malformed receipt, cross-tool rejection, generate_now, random key uniqueness. ✅
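The parsing rule described above can be sketched with str::split_once, which splits at the first '-' only, so any '-' inside the base64url MAC stays in the second half. This assumes the format zc-receipt-<digits>-<base64url> implied by the leak-detector regex and treats the numeric segment as the timestamp; ParsedReceipt and its field names are hypothetical, and the real parse_receipt signature may differ.

```rust
/// Hypothetical parsed form of a receipt token.
#[derive(Debug, PartialEq)]
struct ParsedReceipt<'a> {
    timestamp: u64,
    mac_b64: &'a str,
}

const PREFIX: &str = "zc-receipt-";

fn parse_receipt(token: &str) -> Option<ParsedReceipt<'_>> {
    let rest = token.strip_prefix(PREFIX)?;
    // split_once splits at the FIRST '-', so '-' characters inside the
    // base64url MAC are kept in the second half.
    let (ts, mac) = rest.split_once('-')?;
    if ts.is_empty() || !ts.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // base64url alphabet: A-Z, a-z, 0-9, '-', '_'
    let is_b64url = |b: u8| b.is_ascii_alphanumeric() || b == b'-' || b == b'_';
    if mac.is_empty() || !mac.bytes().all(is_b64url) {
        return None;
    }
    Some(ParsedReceipt { timestamp: ts.parse::<u64>().ok()?, mac_b64: mac })
}

fn main() {
    let r = parse_receipt("zc-receipt-1712000000-ab-c_D9").unwrap();
    println!("{} {}", r.timestamp, r.mac_b64); // 1712000000 ab-c_D9
    println!("{:?}", parse_receipt("zc-receipt-notanumber-abc")); // None
}
```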

tool_execution.rs — Receipt generation correctly occurs only on successful executions. call_arguments.clone() is needed because Tool::execute consumes the value while we also need args for the HMAC. The receipt: None additions to all failure/error branches are complete — I counted 4 error paths, all updated. ✅

loop_.rs — append_receipt_footer correctly short-circuits on None store, empty store, and poisoned mutex. Three return Ok(...) paths in run_tool_call_loop all pass through append_receipt_footer. The receipt inline-append logic uses if let Some(ref receipt) = outcome.receipt && let Ok(mut v) = store.lock() — the if-let chain is correct; poisoned mutex silently drops the collection push (acceptable, the receipt is still appended inline to the tool result). ✅
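A std-only sketch of the short-circuit behaviour described above, for illustration only: the signatures, the record_receipt helper, and the footer text are hypothetical, not the code in loop_.rs. It uses nested if let and let-else instead of the let-chain syntax quoted above, so it compiles on editions without let chains.

```rust
use std::sync::Mutex;

/// Hypothetical: push a successful tool's receipt into the shared store.
fn record_receipt(store: Option<&Mutex<Vec<String>>>, receipt: Option<&str>) {
    if let (Some(store), Some(receipt)) = (store, receipt) {
        if let Ok(mut v) = store.lock() {
            v.push(receipt.to_string());
        }
        // On a poisoned mutex the push is silently dropped; the receipt has
        // already been appended inline to the tool result.
    }
}

/// Hypothetical stand-in for append_receipt_footer.
fn append_receipt_footer(response: String, store: Option<&Mutex<Vec<String>>>) -> String {
    // No store: receipts are not being collected, return unchanged.
    let Some(store) = store else { return response };
    // Poisoned mutex: skip the footer instead of panicking.
    let Ok(receipts) = store.lock() else { return response };
    // Empty store: no successful tool executions to report.
    if receipts.is_empty() {
        return response;
    }
    let mut out = response;
    out.push_str("\n\nTool receipts:");
    for r in receipts.iter() {
        out.push_str("\n- ");
        out.push_str(r);
    }
    out
}

fn main() {
    let store = Mutex::new(Vec::<String>::new());
    println!("{}", append_receipt_footer("done".to_string(), Some(&store)));
    record_receipt(Some(&store), Some("zc-receipt-1712000000-abc"));
    println!("{}", append_receipt_footer("done".to_string(), Some(&store)));
}
```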

[note] append_receipt_footer is declared pub. Since no cross-crate consumer exists (all callers are within zeroclaw-runtime), pub(crate) would be more appropriate.

security/leak_detector.rs — OnceLock regex, additive strip pass before entropy scan, dedicated test tool_receipts_not_redacted_as_high_entropy. ✅
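To illustrate the additive strip pass, here is a std-only approximation of removing matches of zc-receipt-\d+-[A-Za-z0-9_-]+ before any entropy check runs. The real code uses a regex compiled once in a OnceLock; this hand-written matcher is a stand-in that treats \d as ASCII digits only (the regex crate's \d is Unicode-aware by default), and strip_receipts is a hypothetical name.

```rust
/// Hypothetical std-only stand-in for replacing every match of
/// `zc-receipt-\d+-[A-Za-z0-9_-]+` with the empty string.
fn strip_receipts(text: &str) -> String {
    const PREFIX: &str = "zc-receipt-";
    let is_tail = |b: &u8| b.is_ascii_alphanumeric() || *b == b'_' || *b == b'-';
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(pos) = rest.find(PREFIX) {
        out.push_str(&rest[..pos]);
        let after = &rest[pos + PREFIX.len()..];
        // \d+ (ASCII only in this sketch)
        let digits = after.bytes().take_while(u8::is_ascii_digit).count();
        // '-' followed by [A-Za-z0-9_-]+
        let tail = if digits > 0 && after.as_bytes().get(digits) == Some(&b'-') {
            after[digits + 1..].bytes().take_while(is_tail).count()
        } else {
            0
        };
        if tail > 0 {
            // Whole token matched: drop it and continue after it.
            rest = &after[digits + 1 + tail..];
        } else {
            // Not a receipt: keep the leading 'z' and rescan from the next byte.
            out.push_str(&rest[pos..pos + 1]);
            rest = &rest[pos + 1..];
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    let text = "saved ok zc-receipt-1712000000-Ab_9-xY done";
    // Receipt tokens are removed before the high-entropy scan sees the text.
    println!("{}", strip_receipts(text));
}
```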

tools/delegate.rs / channels/orchestrator/mod.rs — Null params correctly threaded. ✅

🔴 Blocking

config-reference.md still documents [agent.tool_receipts] as activatable without a caveat

@WareWolf-MoonWall's previous review explicitly asked: "Update this section to say 'config activation deferred to follow-up' and link the follow-up issue." The "Config activation pending" note was added to tool-receipts.md (under Current Limitations, fourth bullet). It was not added to config-reference.md.

As the doc currently stands, config-reference.md shows:

[agent.tool_receipts]
enabled = true
show_in_response = false

with prose "When enabled, every successful tool execution produces a cryptographic receipt" and no indication that this does nothing yet. A user following the config reference sets enabled = true, gets a silent no-op, and has no indication why.

tool-receipts.md is linked ("See tool-receipts.md for full documentation"), but a reader following that link first reaches step 4 of "How it works", which describes system-prompt injection that is also not implemented in this PR (see below), before getting to the limitations section.

Ask: Add a brief warning callout directly in config-reference.md under the tool_receipts section, e.g.:

Note: Config activation is not yet wired. Setting these keys currently has no effect. Config-driven activation is tracked as a follow-up.

Optionally also move the "Config activation pending" bullet to the top of the Current limitations section in tool-receipts.md (it is currently the fourth bullet, making it easy to miss).

🟡 Conditional

"How it works" step 4 describes system-prompt injection not present in this PR

tool-receipts.md step 4: "The system prompt instructs the LLM to preserve receipts verbatim when referencing tool results." There is no system-prompt modification in this diff. The "Inline in LLM responses" section makes the same claim. If system-prompt wiring is intended for the activation follow-up, please update the limitations section to explicitly list it alongside config activation. If it is not planned for the follow-up, remove or move step 4 to a "Planned" section to avoid user confusion.

Process note (maintainer action — not a code blocker)

The PR is labelled risk: medium. I confirmed via gh api that crates/zeroclaw-runtime/src/security/leak_detector.rs exists on master with blob SHA 5eb4d46d21. The diff index 5eb4d46d21..461929a721 corroborates this — the file is being modified, not created. The author's claim that "leak_detector.rs does not exist on master — this PR creates it as a new file" is factually incorrect. This PR modifies existing files in crates/zeroclaw-runtime/src/security/ and src/agent/, both explicitly risk: high per AGENTS.md ("High risk: crates/zeroclaw-runtime/src/** (especially src/security/)"). Please update the label to risk: high before merge.

Residual scope note (not a code blocker)

620+ lines of new tests in loop_.rs cover functions unrelated to tool receipts: build_native_assistant_history with reasoning_content, glob_match, filter_tool_specs_for_turn, estimate_history_tokens, filter_by_allowed_tools, cost_tracking_*. The author correctly removed 834 lines of stale duplicate parser tests. The remaining additions appear to be new coverage for previously untested functions in loop_.rs — valuable, but per "one concern per PR" policy these belong in a dedicated test PR. Not blocking; recording for PR discipline.

Tests

13 unit tests in tool_receipts.rs (generation, verification, adversarial scenarios) ✅
4 unit tests for append_receipt_footer (empty store, None store, single, multiple) ✅
1 leak detector test (tool_receipts_not_redacted_as_high_entropy) ✅
All existing test call sites updated with None / None defaults ✅
Missing (acknowledged, deferred): integration test exercising full pipeline (generate → append → leak detector passthrough → show_in_response footer)

Verdict

One blocker: config-reference.md needs the activation-pending caveat that tool-receipts.md already has. One conditional: clarify system-prompt step 4 status in the limitations section. Two maintainer actions: risk label update, and tracking the system-prompt wiring in the follow-up issue.

The cryptographic core is production-quality. Once the doc accuracy items are addressed, this is ready for maintainer merge.

Thank you @singlerider — the mechanism is exactly right. Looking forward to the activation follow-up.

…entation
- config-reference.md: add warning callout that tool_receipts keys have no effect yet
- tool-receipts.md: mark step 4 (system-prompt injection) as planned/not implemented
- tool-receipts.md: mark "Inline in LLM responses" as planned/not implemented
- tool-receipts.md: move config activation bullet to top of limitations section

singlerider · 2026-04-14T06:56:36Z

@WareWolf-MoonWall All findings addressed (a3bd53d):

🔴 Blocker — config-reference.md activation caveat:
✅ Added warning callout directly in the tool_receipts section noting that setting these keys currently has no effect.

🟡 Conditional — system-prompt step 4:
✅ Step 4 in "How it works" now marked as (Planned — not yet implemented) with note that it's tracked alongside config activation. "Inline in LLM responses" section updated similarly. Config activation bullet moved to top of "Current limitations" section.

Process — risk label:
✅ Updated from risk: medium to risk: high (both label and PR body).

fmt + clippy clean, full test suite passing.

WareWolf-MoonWall

Three rounds of blocking reviews. All of them addressed. This is the approval pass — but one conditional is still outstanding and cannot be skipped.

✅ Commendation

The ReceiptGenerator core has been consistent across every revision of this PR and it remains production-quality: ephemeral key via ring::SystemRandom, constant-time mac.verify_slice(), #[cfg(test)]-gated with_key(), adversarial test coverage across tampered result/name/args, wrong key, fabricated receipt, cross-tool rejection. The leak detector exemption — OnceLock regex, a tightly scoped zc-receipt-\d+-[A-Za-z0-9_-]+ pattern that won't accidentally exempt real credentials, and a dedicated test — is also correctly implemented. This is how a security primitive should be written.

The append_receipt_footer implementation deserves a specific commendation. Four tests — empty store short-circuits cleanly, None store returns unchanged, single receipt formats correctly, multiple receipts all appear — are exactly the right set. The if-let chain handling a poisoned mutex by silently dropping the collection push (rather than panicking or returning an error) is also the right call: the receipt is already appended inline to the tool result, so the footer is a display-layer concern and a lock failure there should not disrupt the agent loop.

The docs are now accurate. The caveat in config-reference.md ("Config activation is not yet wired. Setting these keys currently has no effect.") is honest and placed where a user would look first. I confirmed in the local schema: AgentConfig does not carry #[serde(deny_unknown_fields)], so a user who sets [agent.tool_receipts] enabled = true gets a silent no-op rather than a parse error — the caveat describes the actual behavior correctly. tool-receipts.md marks step 4 and the inline-response section as (Planned — not yet implemented) and leads the limitations section with the config activation note. The mechanism-only scope is unambiguous.

Addressing every item across three CHANGES_REQUESTED reviews — including removing the 834 lines of stale test duplicates that were obscuring the actual diff — reflects exactly the kind of ownership RFC #5615 describes: the PR got harder before it got easier, and the author kept working it.

🟡 Conditional — config activation follow-up must be a numbered issue with an assignee before merge

The PR contains the text "Config-driven activation and system-prompt injection are tracked as a follow-up" in tool-receipts.md and "config activation deferred to follow-up" in the PR description. Neither references a specific issue.

RFC #5615 §5 on the Conditional bucket: "A conditional deferral without an assignee is not a deferral — it is a wish. Tracked issues with no owner tend to stay open indefinitely."

The accepted split — mechanism first, activation in a follow-up — is a reasonable scope decision. The mechanism is complete, the docs describe what does and does not yet work, and the activation follow-up is a natural next unit of work. All of that is fine. What is not fine is leaving the follow-up as an unanchored text promise. "Config activation pending" without an issue number and an owner means no one is tracking when it was supposed to land, no one is accountable for it, and a user reading the docs in six months cannot tell whether it ever shipped.

Before merge: open a follow-up issue covering (1) ToolReceiptsConfig struct in the config schema, (2) wiring in AgentConfig so ReceiptGenerator is constructed when enabled = true and passed through to run_tool_call_loop, and (3) system-prompt injection. Assign it. Link it in the PR description and in the "Current limitations" section of tool-receipts.md. That is the commitment, not the intention.
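A minimal sketch of items (1) and (2), assuming a ToolReceiptsConfig carrying the two keys shown in config-reference.md (enabled, show_in_response) with both defaulting to false. ReceiptGenerator here is a unit-struct placeholder, new_ephemeral, ReceiptWiring, and wire_receipts are hypothetical, and tying the collected-receipt store to show_in_response is an assumption about how the footer would be gated. A real version would also derive serde::Deserialize with #[serde(default)], omitted so the sketch needs only std.

```rust
use std::sync::Mutex;

/// Hypothetical schema for [agent.tool_receipts]; both keys default to false.
#[derive(Debug, Clone, Default)]
struct ToolReceiptsConfig {
    enabled: bool,
    show_in_response: bool,
}

/// Placeholder for the real ReceiptGenerator, which owns an ephemeral HMAC key.
struct ReceiptGenerator;

impl ReceiptGenerator {
    /// Hypothetical constructor; the real one draws its key from a CSPRNG.
    fn new_ephemeral() -> Self {
        ReceiptGenerator
    }
}

/// What run_tool_call_loop would receive (both None when disabled).
struct ReceiptWiring {
    generator: Option<ReceiptGenerator>,
    collected: Option<Mutex<Vec<String>>>,
}

fn wire_receipts(cfg: &ToolReceiptsConfig) -> ReceiptWiring {
    if !cfg.enabled {
        // Disabled: every receipt code path keeps seeing None, as today.
        return ReceiptWiring { generator: None, collected: None };
    }
    ReceiptWiring {
        generator: Some(ReceiptGenerator::new_ephemeral()),
        // Only collect receipts for the footer when the user asked to see them.
        collected: cfg.show_in_response.then(|| Mutex::new(Vec::new())),
    }
}

fn main() {
    let w = wire_receipts(&ToolReceiptsConfig { enabled: true, show_in_response: false });
    println!("generator: {}, footer store: {}", w.generator.is_some(), w.collected.is_some());
}
```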

Suggestions (post-merge or in activation follow-up)

append_receipt_footer visibility. The function is declared pub. No cross-crate consumer exists — all callers are within zeroclaw-runtime. pub(crate) would be more appropriate. RFC §4.2: "pub is a contract." A public function is a promise to every caller who might implement against it. There is no caller outside this crate who needs this, and pub(crate) makes that explicit. Not a blocker for this PR; worth fixing in the activation follow-up.

HMAC input field separator. The HMAC input is tool_name, args.to_string(), result, and timestamp joined with b"|" separators. A tool name, serialised args value, or result containing a literal | would produce an ambiguous concatenation, where two different inputs map to the same HMAC message. Not currently exploitable (the LLM has no key and cannot compute the MAC), but before active verification lands and especially before this is exposed at any network boundary, consider length-prefixed encoding or a \x00 separator. Low priority for Phase 1; do not defer past the activation PR.

Unrelated loop_.rs tests. 620+ lines of new tests in loop_.rs cover functions unrelated to tool receipts: build_native_assistant_history, glob_match, filter_tool_specs_for_turn, estimate_history_tokens, filter_by_allowed_tools, cost tracking. These are valuable tests for code that previously had none — but they belong in a test coverage PR, not here. The diff was eventually cleaned up from the 1,411-line peak, and these additions are additive (no regressions), so not blocking. Recording for PR discipline.

Process note — Jordan's formal review

Jordan reviewed this PR thoroughly at the original commit, raised a blocking item and three suggestions, and received complete responses from the author. He is still listed as a requested reviewer and has not given a formal APPROVED verdict. Given that this PR touches crates/zeroclaw-runtime/src/security/** and the runtime agent loop — both risk: high paths per AGENTS.md — his explicit approval before merge is worth getting, not bypassing. A COMMENTED review that was addressed is not the same as an APPROVED one from a code owner.

Verdict

The mechanism is correct. The cryptographic design is sound. The docs accurately describe what is and is not yet implemented. CI is 20/20 green. One conditional blocks merge: open and assign the config activation follow-up issue, then link it. Once that is done, this is ready for maintainer merge.

- Reformat append_receipt_footer calls to multi-line style
- Fix indentation on receipt_generator/collected_receipts args in CLI path
- Add missing receipt_generator None arg to execute_one_tool test call

singlerider mentioned this pull request Apr 2, 2026

Matrix channel: friction tracker #4657

Closed

20 tasks

singlerider force-pushed the feat/tool-receipts branch 3 times, most recently from 20bcc7c to bfad4e3 April 2, 2026 02:32

github-actions Bot removed the channel:discord label Apr 2, 2026

singlerider force-pushed the feat/tool-receipts branch 4 times, most recently from 7895533 to afcd714 April 2, 2026 06:50

singlerider marked this pull request as ready for review April 2, 2026 06:55

singlerider requested review from JordanTheJet and theonlyhennygod as code owners April 2, 2026 06:55

singlerider closed this Apr 2, 2026

singlerider force-pushed the feat/tool-receipts branch from afcd714 to bd22a8f April 2, 2026 21:10

singlerider reopened this Apr 2, 2026

singlerider force-pushed the feat/tool-receipts branch 2 times, most recently from 0d02153 to 48e6efd April 3, 2026 03:10

singlerider mentioned this pull request Apr 3, 2026

feat(agent): HMAC tool execution receipts for hallucination detection #4943

Closed

6 tasks

JordanTheJet reviewed Apr 3, 2026

JordanTheJet added this to ZeroClaw Project Board Apr 4, 2026

github-project-automation Bot moved this to Backlog in ZeroClaw Project Board Apr 4, 2026

github-actions Bot mentioned this pull request Apr 4, 2026

🦞 OpenClaw Ecosystem Daily Report 2026-04-04 gsscsd/big_model_radar#131

Open

github-actions Bot added channel:signal and channel:twitter labels Apr 12, 2026

singlerider added 3 commits April 12, 2026 17:52

Merge upstream/master into feat/tool-receipts

de952d3

fix(agent): replace duplicate tool_receipts.rs with re-export stub

5afd452

Merge upstream/master into feat/tool-receipts

201d633

WareWolf-MoonWall previously requested changes Apr 12, 2026

Merge remote-tracking branch 'upstream/master' into feat/tool-receipts

34431b5

WareWolf-MoonWall previously requested changes Apr 13, 2026

github-actions Bot mentioned this pull request Apr 13, 2026

🦞 OpenClaw Ecosystem Daily Report 2026-04-13 gsscsd/big_model_radar#178

Open

singlerider added 3 commits April 13, 2026 15:59

fix: remove duplicate src/agent/tool_receipts.rs stub

fd10431

Merge remote-tracking branch 'upstream/master' into feat/tool-receipts

ef211a4

singlerider added 2 commits April 14, 2026 13:09

Merge remote-tracking branch 'upstream/master' into feat/tool-receipts

fe03ebb

WareWolf-MoonWall previously requested changes Apr 14, 2026

singlerider mentioned this pull request Apr 14, 2026

[Feature]: HMAC tool execution receipts for hallucination detection #4830

Closed

2 tasks

WareWolf-MoonWall approved these changes Apr 14, 2026

singlerider added 2 commits April 20, 2026 13:22

Merge remote-tracking branch 'upstream/master' into feat/tool-receipts

23b4a9a

fix(agent): fmt and clippy fixes post-merge with upstream/master

92c00e2

This was referenced Apr 28, 2026

[Feature]: Re-activate HMAC tool receipts — wiring stripped before #5168 merged, docs already describe the activated shape #6182

Closed

feat(agent,config): activate HMAC tool receipts — wiring stripped from #5168 #6214

Merged

This was referenced Apr 30, 2026

feat(security): reactivate HMAC tool receipts — wire crypto core into runtime vasanth53/zeroclaw#1

Closed

feat(security): reactivate HMAC tool receipts — wire crypto core into runtime #6240

Closed

feat(agent): HMAC tool execution receipts for hallucination detection #5168

singlerider merged 15 commits into zeroclaw-labs:master from singlerider:feat/tool-receipts

4 participants

Conversation

singlerider commented Apr 2, 2026 • edited

Summary

Label Snapshot (required)

Change Metadata

Linked Issue

Supersede Attribution (required when Supersedes # is used)

Validation Evidence (required)

Security Impact (required)

Privacy and Data Hygiene (required)

Compatibility / Migration

i18n Follow-Through (required when docs or user-facing wording changes)

Human Verification (required)

Side Effects / Blast Radius (required)

Rollback Plan (required)

Risks and Mitigations

singlerider commented Apr 3, 2026

JordanTheJet left a comment

Agent Review — PR #5168

Comprehension Summary

Security Assessment

Performance Assessment

Code Review

[blocking] Branch hygiene — needs rebase

[suggestion] Unrelated .gitignore change

[question] _force_xml_tools parameter

Regression Analysis

Documentation

Tests

Verdict

WareWolf-MoonWall left a comment

Review: feat(agent): HMAC tool execution receipts for hallucination detection

✅ Commendation

🔴 Blocking

🟡 Conditional

Process note (not a code blocker — maintainer action)

theonlyhennygod commented Apr 12, 2026

Agent Triage Note — PR #5168

singlerider commented Apr 12, 2026

WareWolf-MoonWall left a comment

Review: feat(agent): HMAC tool execution receipts for hallucination detection

✅ Standing commendation

🔴 Blocking

🟡 Conditional

Process note (not a code blocker — maintainer action)

singlerider commented Apr 13, 2026

singlerider commented Apr 14, 2026

WareWolf-MoonWall left a comment

Agent Review — PR #5168

Comprehension Summary

CI

Security Assessment

Code Review

🔴 Blocking

🟡 Conditional

Process note (maintainer action — not a code blocker)

Residual scope note (not a code blocker)

Tests

Verdict

singlerider commented Apr 14, 2026

singlerider commented Apr 14, 2026

WareWolf-MoonWall left a comment