feat(agent): HMAC tool execution receipts for hallucination detection#5168
feat(agent): HMAC tool execution receipts for hallucination detection#5168singlerider merged 15 commits intozeroclaw-labs:masterfrom
Conversation
20bcc7c to
bfad4e3
Compare
7895533 to
afcd714
Compare
afcd714 to
bd22a8f
Compare
0d02153 to
48e6efd
Compare
|
I build on Linux. My tree is affected by the |
JordanTheJet
left a comment
There was a problem hiding this comment.
Agent Review β PR #5168
Disclaimer: This PR touches high-risk paths (
src/security/**,src/tools/**) and requires human maintainer approval. This review is advisory to assist the maintainer.
Comprehension Summary
What: Adds an HMAC-SHA256 receipt system for tool executions. An ephemeral per-session 256-bit key (via ring::SystemRandom) generates cryptographic receipts appended to tool results, enabling verification that tool calls actually executed vs. LLM hallucination. Touches src/agent/tool_receipts.rs (new module), src/agent/tool_execution.rs, src/agent/loop_.rs, src/channels/mod.rs, src/config/schema.rs, src/security/leak_detector.rs, src/tools/delegate.rs, and docs.
Why: Based on arXiv:2603.10060 β autonomous agents making consequential decisions need ground truth about tool execution. Fabricated results are currently indistinguishable from real ones. Closes #4830, supersedes #4943/#4831 (same author).
Blast radius: Tool execution pipeline (receipt appended to results), system prompt (instructions when enabled), leak detector (receipt exemption from entropy scan), config schema (new agent.tool_receipts section). Disabled by default β zero impact on existing deployments.
Security Assessment
Positive:
- Ephemeral key via
ring::SystemRandom(CSPRNG) β never persisted, logged, or exposed to LLM - HMAC-SHA256 is a standard, well-vetted MAC construction
- Receipt generation occurs on scrubbed output (
scrub_credentials()runs before HMAC input), preventing credential correlation through receipt verification - Leak detector correctly exempts
zc-receipt-tokens from high-entropy redaction, with a specific regex pattern that won't accidentally exempt real credentials - No new crate dependencies β
hmac,sha2,ring,base64are all already in the dependency tree #[cfg(test)]gating onwith_key()constructor prevents misuse of deterministic keys in production
[suggestion] Timing side-channel in verify() (tool_receipts.rs:87):
expected_hash == provided_hash uses standard string comparison, not constant-time comparison. In the current threat model (LLM is the attacker, no timing oracle available), this is not exploitable. However, for defense-in-depth and to prevent future misuse if verification is ever exposed at a network boundary, consider using hmac::Mac::verify_slice() which provides constant-time comparison natively. Low priority β noting for completeness.
No security weakening identified. The feature is additive and opt-in.
Performance Assessment
- Receipt generation adds <1ms per tool call (HMAC computation is negligible)
- Memory: one 32-byte key per session + ~64 chars per receipt in tool result strings
- Binary size: minimal β no new crates, small code footprint
- No performance impact identified for disabled state (all receipt paths are behind
Optionchecks)
Code Review
src/agent/tool_receipts.rs β Clean, well-structured module. 12 unit tests cover: determinism, format parsing, verification success, tampered results/names/args, wrong keys, fabricated receipts, malformed receipts, cross-tool rejection, generate_now, and random key uniqueness. Good coverage of adversarial scenarios.
[suggestion] Minor allocation in parse_receipt() (tool_receipts.rs:111):
let rest = receipt.strip_prefix(&format!("{RECEIPT_PREFIX}-"))?;This allocates a String on every call via format!(). Could use the literal "zc-receipt-" directly since RECEIPT_PREFIX is "zc-receipt":
let rest = receipt.strip_prefix("zc-receipt-")?;Trivial, but this runs on every tool result when receipts are enabled.
src/agent/tool_execution.rs β Integration is clean. Receipt generation only on successful executions. receipt_generator passed as Option β proper opt-in pattern. The ToolExecutionOutcome.receipt field is well-documented.
src/security/leak_detector.rs β Receipt exemption regex zc-receipt-\d+-[A-Za-z0-9_-]+ is correctly scoped. Test tool_receipts_not_redacted_as_high_entropy verifies the exemption. Existing leak detection behavior is preserved (the new regex strip is an additional pass before entropy scanning).
src/config/schema.rs β ToolReceiptsConfig with serde(default) on both fields. Clean defaults (both false).
src/channels/mod.rs β System prompt injection (line ~5880) is clear and correctly gated on config.agent.tool_receipts.enabled. The show_in_response footer formatting is clean. Good use of unwrap_or_else(|e| e.into_inner()) for poisoned mutex recovery.
src/tools/delegate.rs β Subagent call site passes None for receipt params with TODO comments for future threading. Appropriate for Phase 1.
[blocking] Branch hygiene β needs rebase
The branch contains merge commits (master merged into feature branch) rather than a clean rebase. The local build produces compile errors in test files that weren't fully updated for newer struct fields introduced on master since the branch was created. While CI passed at PR creation time, the branch history is messy:
7146ee16 fix: resolve merge collision compile errors from master integration
2ea4af85 merge: resolve conflicts between tool-receipts and master
be747242 fix(web): resolve merge collision errors in web dashboard TypeScript
50425247 fix: resolve merge collision errors and bump version to 0.6.6
The build currently fails with errors like:
run_tool_call_looptakes 27 arguments but 26 supplied (2 test call sites inloop_.rs~line 9551, 9635)- Missing
receipt_generatorandshow_receipts_in_responsefields in testChannelRuntimeContextinitializer (~line 9344)
Action: Please rebase cleanly onto current master and ensure all tests compile and pass. The squash-merge policy means the merge commits won't persist, but the build needs to be green.
[suggestion] Unrelated .gitignore change
The addition of apps/tauri/gen/schemas/ to .gitignore is not related to tool receipts. Per project policy ("one concern per PR"), this should ideally be a separate commit. Not blocking, but noting for PR discipline.
[question] _force_xml_tools parameter
The run_tool_call_loop function signature gained a _force_xml_tools: bool parameter (underscore-prefixed, unused in the function body). The comment references [agent] tool_dispatcher = "xml" config. Is this intended for a future PR, or was it meant to be wired up in this one? If unused, removing it would reduce the cognitive overhead of the already-27-parameter function signature.
Regression Analysis
| Area | Risk | Assessment |
|---|---|---|
| Tool execution flow | Low | Receipt generation is append-only to existing output. No change when disabled. |
| Leak detector | Low | New regex strip pass is additive. Existing patterns still match. |
| System prompt | Low | Receipt instructions only appended when feature enabled. |
| Config loading | None | New struct fields have serde(default) β backward compatible. |
| Existing tests | None | All existing test call sites updated with None/false defaults. |
Documentation
Thorough. docs/security/tool-receipts.md includes: how it works, what it detects, what it doesn't prevent, viewing instructions, security properties, and Phase 1 limitations. Config reference updated. Security README index updated.
Tests
- 12 unit tests in
tool_receipts.rscovering generation, verification, adversarial scenarios - 1 leak detector test (
tool_receipts_not_redacted_as_high_entropy) - All existing test call sites updated for new parameters
- Missing: No integration test exercising the full pipeline (receipt generation β append to tool result β leak detector passthrough β show_in_response). Understandable for Phase 1, but worth noting for Phase 2.
Verdict
Needs author action before maintainer merge:
- [blocking] Rebase onto current
masterand fix the compile errors in test call sites. Ensurecargo testpasses locally. - [suggestion] Consider constant-time HMAC verification (use
Mac::verify_slice()instead of string==). - [suggestion] Remove
_force_xml_toolsdead parameter if not used, or wire it up if intended. - [suggestion] Use string literal instead of
format!()inparse_receipt().
Thank you @singlerider for this well-designed feature. The cryptographic design is sound, the opt-in approach is correct, the documentation is thorough, and the test coverage for the core module is excellent. The main blocker is the branch needing a clean rebase against current master.
WareWolf-MoonWall
left a comment
There was a problem hiding this comment.
Review: feat(agent): HMAC tool execution receipts for hallucination detection
Verdict: changes requested
β Commendation
The cryptographic design of ReceiptGenerator is sound. Ephemeral key generation via ring::SystemRandom, HMAC-SHA256 with mac.verify_slice() for constant-time comparison, #[cfg(test)]-gated with_key() to prevent deterministic keys escaping into production β each of these decisions reflects careful, defence-in-depth thinking. The 12 unit tests cover the adversarial cases that matter most (tampered result, tampered name, tampered args, wrong key, fabricated receipt, malformed receipt, cross-tool rejection), which is exactly the kind of test set that makes a security primitive trustworthy. This module is production-quality as written.
Addressing Jordan's four inline suggestions in a single, cleanly-described commit (eeae0a1) β constant-time HMAC, removing the dead _force_xml_tools parameter, using a string literal in parse_receipt(), and stripping the unrelated .gitignore entry into its own commit β demonstrates good responsiveness to feedback and keeps the change history readable.
The leak detector exemption is correctly implemented: OnceLock for compile-once regex, a scoped prefix pattern zc-receipt-\d+-[A-Za-z0-9_-]+ that won't accidentally exempt real credentials, and a dedicated test proving the exemption works. This is the right pattern and it was done carefully.
Placing the implementation in crates/zeroclaw-runtime/src/agent/tool_receipts.rs with a thin re-export stub at src/agent/tool_receipts.rs is the correct approach under the workspace migration β the crate gets the real code, the monolith gets a one-liner. This is RFC #5574 in practice.
π΄ Blocking
1. Config activation path is entirely absent β the feature cannot be enabled
The PR documents two config keys (agent.tool_receipts.enabled, agent.tool_receipts.show_in_response) and describes their behaviour. But the diff contains no ToolReceiptsConfig struct, no addition to the config schema, and no code that reads these fields to construct a ReceiptGenerator. Every single call site in the diff passes None, None:
None, // receipt_generator
None, // collected_receipts
There is no path by which a user setting agent.tool_receipts.enabled = true produces any effect. Depending on schema validation settings, this either causes a parse error on startup or is silently ignored. Either way, the documented feature is inert.
This looks like the config schema change and the channel system-prompt injection that Jordan reviewed in the original src/-based implementation were lost during the migration to crates/zeroclaw-runtime/. The ReceiptGenerator and its wiring points arrived in the crates workspace; the config struct and the run() code that would read it and pass a live generator did not.
The principle from AGENTS.md and RFC #5653: do not ship user-visible config keys that do nothing. The fix is to port ToolReceiptsConfig to crates/zeroclaw-config/src/schema.rs and add the config-reading code in run() that creates a ReceiptGenerator when enabled = true and passes it through to run_tool_call_loop. Until that is present, the feature is documented but unreachable.
2. Branch still uses merge commits β Jordan's blocking item was not addressed
Jordan's review flagged this as blocking and asked for a clean rebase. The author's response was to add two more merge commits (de952d3, 201d633). The branch history now contains four merge commits total.
The consequence is visible in the diff: loop_.rs shows 1,411 lines of test additions (parse_tool_calls robustness, GLM-style parsing, scrub_credentials edge cases, history management, etc.) that have no connection to tool receipts. These are master-branch changes absorbed via merge upstream/master. A reviewer cannot distinguish what is the feature from what is the ambient diff noise, and the stated blast radius in the PR template understates what is actually changing.
Please rebase cleanly onto current master. Once the rebase is done, Jordan's dismissed review needs an explicit re-approval from him before merge β a dismissed review that was specifically blocking does not self-resolve.
π‘ Conditional
show_in_response display logic is missing
The config docs and tool-receipts.md both describe a footer that appears in user-visible channel responses when show_in_response = true. The collected_receipts parameter is wired through run_tool_call_loop and receipts are pushed into the store, but nothing reads the store afterwards to format and emit a response footer. The display path from collected_receipts to the channel output is absent.
This does not need to block merge on its own β the collection infrastructure is there and the config activation work (blocking item 1) will naturally surface where the display code needs to live. But please file and assign a follow-up issue before merge so it doesn't get lost. Without this, users who set show_in_response = true get no visible output and no indication that the setting did nothing.
Process note (not a code blocker β maintainer action)
The PR is labelled risk: medium. The diff touches crates/zeroclaw-runtime/src/security/leak_detector.rs and the runtime agent loop β both of which fall under the risk: high tier per AGENTS.md and reviewer playbook Β§2. Please coordinate with a maintainer to update the label before the next review pass.
Agent Triage Note β PR #5168Skipped β high-risk path. This PR modifies Current status: @JordanTheJet reviewed with 1 blocking item (rebase needed) and 3 suggestions. Author addressed all items. @WareWolf-MoonWall then reviewed with 2 blocking items: (1) config activation path is entirely absent β the feature cannot be enabled, and (2) branch still uses merge commits. One conditional: Key concern: The feature is documented but unreachable β no No further agent action taken. |
|
@WareWolf-MoonWall @JordanTheJet Addressing the config activation blocker: This PR is scoped as mechanism only β the HMAC receipt generation, verification, and injection pipeline. Config activation ( Rationale: the receipt mechanism needs to land first so the config activation PR has something to point at. Shipping both in one PR would make this Follow-up will add:
Also merged upstream/master to fix CI (broken imports from pre-workspace-split code). |
WareWolf-MoonWall
left a comment
There was a problem hiding this comment.
Review: feat(agent): HMAC tool execution receipts for hallucination detection
Verdict: changes requested
β Standing commendation
The ReceiptGenerator module itself remains production-quality. Ephemeral key
via ring::SystemRandom, constant-time mac.verify_slice(), #[cfg(test)]-
gated with_key(), adversarial test coverage across tampered result/name/args,
wrong key, fabricated receipt, cross-tool rejection β this is exactly how a
security primitive should be written and tested. The leak detector exemption is
also correct: OnceLock regex, a scoped prefix pattern that won't accidentally
exempt real credentials, and a dedicated test. Good work on Jordan's four
suggestions as well β all addressed cleanly.
π΄ Blocking
1. Author's scope-reframe on config activation requires PR updates before it
can be accepted
The latest comment proposes landing the mechanism now and deferring config
activation (ToolReceiptsConfig, schema wiring, the run() code that constructs
a live ReceiptGenerator) to a follow-up. The argument β mechanism first so the
activation PR has something to point at β is reasonable and I'm not opposed to
the split in principle.
But the PR as written claims more than it delivers, and that has to be corrected
before the split becomes acceptable:
-
The PR description "Config/env changes?" section lists
agent.tool_receipts.enabled
andagent.tool_receipts.show_in_responseas new keys added by this PR. They
are not. If a user reads the merged PR description and tries to configure those
keys, they will hit either a parse error or silent ignore. Update this section
to say "config activation deferred to follow-up" and link the follow-up issue. -
docs/security/tool-receipts.md(per Jordan's original review: "includes how
it works, what it detects, viewing instructions") presumably documents the
config keys and how to enable the feature. A user reading that doc after this
PR merges cannot actually enable anything. Either add a clear "not yet
activatable β follow-up pending" callout at the top of the doc, or defer the
doc to land with the activation PR. Docs that describe unreachable behaviour
create user-facing confusion. -
Closes #4830β if #4830 covers the full feature end-to-end, it should not
be closed by mechanism-only code. Either remove theCloseskeyword here and
add it to the follow-up, or narrow the scope of #4830 to mechanism-only and
open a new issue for activation.
Once those three corrections are made, the mechanism-only split is acceptable.
2. Merge commits and 1,400+ lines of unrelated diff β still unaddressed
Jordan flagged this as blocking in his original review. My dismissal carried the
same point. The latest author comment does not mention it. The branch currently
contains four merge commits from pulling upstream master rather than rebasing,
and the consequence is visible: the diff includes 1,411 lines of test additions
(parse_tool_calls Qwen regression tests, strip_tool_result_blocks,
extract_json_values, history management, GLM parsing, constants bounds checks,
etc.) that have no connection to tool receipts. These are master-branch changes
absorbed through the merges.
A reviewer cannot distinguish what is the receipt feature from what is ambient
noise. The stated blast radius ("tool execution pipeline, system prompt, leak
detector, config schema") understates what is actually present in the diff. And
Jordan's dismissed CHANGES_REQUESTED does not self-resolve β please re-request
his review once the rebase is done.
Please rebase cleanly onto current master. The squash-merge policy means the
merge commits won't persist in history, but the build must be green and the diff
must reflect only this feature.
π‘ Conditional
show_in_response display path is unimplemented β needs a tracked follow-up
collected_receipts is threaded through run_tool_call_loop and receipts are
pushed into the store, but nothing reads the store to emit a response footer.
The docs and PR description describe this as a user-visible feature. If the
mechanism-first split is accepted and config activation goes into a follow-up,
this naturally defers there too β but please open and assign the follow-up issue
before this merges, and link it in the PR description alongside the config
activation follow-up. Two deferred items need two tracked issues (or one issue
that explicitly covers both). Without tracking, "show_in_response = true" will
produce no visible output and no diagnostic that the setting did nothing.
Process note (not a code blocker β maintainer action)
The PR is labelled risk: medium. The diff touches
crates/zeroclaw-runtime/src/security/leak_detector.rs and the runtime agent
loop, both of which fall under risk: high per AGENTS.md. Please coordinate
with a maintainer to update the label before the next review pass.
The core mechanism is ready. Two corrections stand between this and merge: a
clean rebase so the diff shows only the receipt feature, and PR/doc updates that
accurately describe the mechanism-only scope. Looking forward to the next
revision.
β¦scope - Add append_receipt_footer() that reads collected_receipts and appends a formatted footer to the agent response when receipts were generated - Wire footer into all three return paths in run_tool_call_loop - Add 4 tests: empty receipts, None store, single receipt, multiple receipts - Update docs/security/tool-receipts.md: remove Phase 1 framing, note config activation as pending follow-up - PR description: remove config keys that don't exist yet, change Closes zeroclaw-labs#4830 to Related (already closed)
|
@WareWolf-MoonWall @JordanTheJet Addressing all review findings: Blocker 1 (scope/docs mismatch) β resolved:
Blocker 2 (rebase) β declined per project convention: Blocker 3 (duplicate file) β resolved: Conditional (show_in_response) β implemented, not deferred:
Process (risk label) β
CI: Merged upstream/master to resolve workspace-split import errors. All receipt code compiles. |
β¦space imports Tests for parse_tool_calls, parse_glm, extract_json_values, etc. already live in zeroclaw-tool-call-parser. The copies in loop_.rs were absorbed through merge commits and referenced pre-workspace-split import paths. Also fix crate::config::* and crate::providers::* imports in remaining tests to use zeroclaw_config::schema::* and zeroclaw_api::provider::*.
|
@WareWolf-MoonWall @JordanTheJet All findings addressed: π΄ Blocker 1 (scope/docs mismatch) β previously resolved: π΄ Blocker 2 (merge commits / inflated diff) β resolved (af716b0): π‘ Conditional (show_in_response) β previously resolved: Also merged upstream/master β clean, no conflicts. fmt + clippy clean, full test suite passing. |
WareWolf-MoonWall
left a comment
There was a problem hiding this comment.
Agent Review β PR #5168
Disclaimer: This PR touches
crates/zeroclaw-runtime/src/security/**andsrc/agent/**β high-risk paths perAGENTS.md. This review is advisory; human maintainer approval is required before merge.
Comprehension Summary
What: Adds an HMAC-SHA256 receipt system (tool_receipts.rs) for tool-execution hallucination detection. An ephemeral 256-bit key (via ring::SystemRandom) is generated per session. After each successful tool execution, execute_one_tool computes HMAC-SHA256(key, tool_name | args | result | timestamp) over the scrubbed output, appends [receipt: zc-receipt-{ts}-{hash}] to the tool result, and stashes the receipt in a session-scoped Mutex<Vec<String>>. append_receipt_footer drains the store and appends a ---\nTool receipts: block to the agent response when show_in_response is enabled. The leak detector is updated to strip zc-receipt- tokens before high-entropy scanning. All call sites pass None for both receipt params (mechanism is fully inert until config activation lands in a follow-up PR).
Why: Autonomous agents making consequential decisions need ground truth about tool execution. Fabricated results are currently indistinguishable from real ones (arXiv:2603.10060).
Blast radius: tool_receipts.rs (new), tool_execution.rs (receipt field + generation), loop_.rs (parameter threading + append_receipt_footer), security/leak_detector.rs (exemption regex), delegate.rs and channels/orchestrator/mod.rs (null params). Zero runtime impact on existing deployments β all paths guarded by Option.
CI
20/20 checks passing: fmt, clippy, strict delta lint, test (all features), 32-bit check, security audit, docs quality, cross-platform builds (Linux, macOS arm64, Windows). β
Security Assessment
Positives:
- Ephemeral key via
ring::SystemRandom(CSPRNG) β never persisted, logged, or sent to the LLM β mac.verify_slice()for constant-time comparison β timing side-channel closed βscrub_credentials()runs before HMAC input, preventing credential correlation through receipt verification β#[cfg(test)]-gatedwith_key()constructor β deterministic keys cannot be used in production β- Leak detector exemption regex
zc-receipt-\d+-[A-Za-z0-9_-]+is correctly scoped and tested β - No new crate dependencies β
hmac,sha2,ring,base64already in the dependency tree β
[suggestion] HMAC input uses | as a field separator without length-prefixing (tool_name.as_bytes() | b"|" | args.to_string().as_bytes() | b"|" | result.as_bytes() | b"|" | timestamp). If a tool_name or serialised args value ever contains a literal |, two different inputs could produce identical HMAC messages β for example, tool "shell|extra" / args {} produces the same concatenation as tool "shell" / args "|extra{}". Not immediately exploitable (the LLM does not know the key and cannot compute the MAC), but for defense-in-depth, consider using a length-prefixed encoding or a \x00 separator (which cannot appear in valid tool names or JSON). Low priority for Phase 1; worth addressing before active verification is exposed.
[suggestion β low priority] serde_json::Value::to_string() for HMAC input: object key ordering in serde_json maps is insertion-order-stable for the lifetime of a Value, so in-process generation and verification are consistent. However, if args are ever re-serialised (e.g., deserialized from a log then reverified), key reordering could cause spurious HMAC failures. Not an issue while verification is passive-only, but worth documenting in a TODO comment for when active verification lands.
Code Review
tool_receipts.rs β Clean, well-structured. ReceiptGenerator is Clone, stateless after construction, and threadsafe. parse_receipt correctly handles the base64url hash containing embedded - characters by taking everything after the first - following the stripped prefix. 13 unit tests: determinism, format parsing, verify success, tampered result/name/args, wrong key, fabricated receipt, malformed receipt, cross-tool rejection, generate_now, random key uniqueness. β
tool_execution.rs β Receipt generation correctly occurs only on successful executions. call_arguments.clone() is needed because Tool::execute consumes the value while we also need args for the HMAC. The receipt: None additions to all failure/error branches are complete β I counted 4 error paths, all updated. β
loop_.rs β append_receipt_footer correctly short-circuits on None store, empty store, and poisoned mutex. Three return Ok(...) paths in run_tool_call_loop all pass through append_receipt_footer. The receipt inline-append logic uses if let Some(ref receipt) = outcome.receipt && let Ok(mut v) = store.lock() β the if-let chain is correct; poisoned mutex silently drops the collection push (acceptable, the receipt is still appended inline to the tool result). β
[note] append_receipt_footer is declared pub. Since no cross-crate consumer exists (all callers are within zeroclaw-runtime), pub(crate) would be more appropriate.
security/leak_detector.rs β OnceLock regex, additive strip pass before entropy scan, dedicated test tool_receipts_not_redacted_as_high_entropy. β
tools/delegate.rs / channels/orchestrator/mod.rs β Null params correctly threaded. β
π΄ Blocking
config-reference.md still documents [agent.tool_receipts] as activatable without a caveat
@WareWolf-MoonWall's previous review explicitly asked: "Update this section to say 'config activation deferred to follow-up' and link the follow-up issue." The "Config activation pending" note was added to tool-receipts.md (under Current Limitations, fourth bullet). It was not added to config-reference.md.
As the doc currently stands, config-reference.md shows:
[agent.tool_receipts]
enabled = true
show_in_response = falsewith prose "When enabled, every successful tool execution produces a cryptographic receipt" and no indication that this does nothing yet. A user following the config reference sets enabled = true, gets a silent no-op, and has no indication why.
tool-receipts.md is linked ("See tool-receipts.md for full documentation"), but that link leads to step 4 of "How it works" describing system-prompt injection that is also not implemented in this PR (see below), before reaching the limitations section.
Ask: Add a brief warning callout directly in config-reference.md under the tool_receipts section, e.g.:
Note: Config activation is not yet wired. Setting these keys currently has no effect. Config-driven activation is tracked as a follow-up.
Optionally also move the "Config activation pending" bullet to the top of the Current limitations section in tool-receipts.md (it is currently the fourth bullet, making it easy to miss).
π‘ Conditional
"How it works" step 4 describes system-prompt injection not present in this PR
tool-receipts.md step 4: "The system prompt instructs the LLM to preserve receipts verbatim when referencing tool results." There is no system-prompt modification in this diff. The "Inline in LLM responses" section makes the same claim. If system-prompt wiring is intended for the activation follow-up, please update the limitations section to explicitly list it alongside config activation. If it is not planned for the follow-up, remove or move step 4 to a "Planned" section to avoid user confusion.
Process note (maintainer action β not a code blocker)
The PR is labelled risk: medium. I confirmed via gh api that crates/zeroclaw-runtime/src/security/leak_detector.rs exists on master with blob SHA 5eb4d46d21. The diff index 5eb4d46d21..461929a721 corroborates this β the file is being modified, not created. The author's claim that "leak_detector.rs does not exist on master β this PR creates it as a new file" is factually incorrect. This PR modifies existing files in crates/zeroclaw-runtime/src/security/ and src/agent/, both explicitly risk: high per AGENTS.md ("High risk: crates/zeroclaw-runtime/src/** (especially src/security/)"). Please update the label to risk: high before merge.
Residual scope note (not a code blocker)
620+ lines of new tests in loop_.rs cover functions unrelated to tool receipts: build_native_assistant_history with reasoning_content, glob_match, filter_tool_specs_for_turn, estimate_history_tokens, filter_by_allowed_tools, cost_tracking_*. The author correctly removed 834 lines of stale duplicate parser tests. The remaining additions appear to be new coverage for previously untested functions in loop_.rs β valuable, but per "one concern per PR" policy these belong in a dedicated test PR. Not blocking; recording for PR discipline.
Tests
- 13 unit tests in
tool_receipts.rs(generation, verification, adversarial scenarios) β - 4 unit tests for
append_receipt_footer(empty store, None store, single, multiple) β - 1 leak detector test (
tool_receipts_not_redacted_as_high_entropy) β - All existing test call sites updated with
None/Nonedefaults β - Missing (acknowledged, deferred): integration test exercising full pipeline (generate β append β leak detector passthrough β show_in_response footer)
Verdict
One blocker: config-reference.md needs the activation-pending caveat that tool-receipts.md already has. One conditional: clarify system-prompt step 4 status in the limitations section. Two maintainer actions: risk label update, and tracking the system-prompt wiring in the follow-up issue.
The cryptographic core is production-quality. Once the doc accuracy items are addressed, this is ready for maintainer merge.
Thank you @singlerider β the mechanism is exactly right. Looking forward to the activation follow-up.
β¦entation - config-reference.md: add warning callout that tool_receipts keys have no effect yet - tool-receipts.md: mark step 4 (system-prompt injection) as planned/not implemented - tool-receipts.md: mark "Inline in LLM responses" as planned/not implemented - tool-receipts.md: move config activation bullet to top of limitations section
|
@WareWolf-MoonWall All findings addressed (a3bd53d): π΄ Blocker β config-reference.md activation caveat: π‘ Conditional β system-prompt step 4: Process β risk label: fmt + clippy clean, full test suite passing. |
1 similar comment
|
@WareWolf-MoonWall All findings addressed (a3bd53d): π΄ Blocker β config-reference.md activation caveat: π‘ Conditional β system-prompt step 4: Process β risk label: fmt + clippy clean, full test suite passing. |
WareWolf-MoonWall
left a comment
There was a problem hiding this comment.
Three rounds of blocking reviews. All of them addressed. This is the approval pass β but one conditional is still outstanding and cannot be skipped.
β Commendation
The ReceiptGenerator core has been consistent across every revision of this PR and it remains production-quality: ephemeral key via ring::SystemRandom, constant-time mac.verify_slice(), #[cfg(test)]-gated with_key(), adversarial test coverage across tampered result/name/args, wrong key, fabricated receipt, cross-tool rejection. The leak detector exemption β OnceLock regex, a tightly scoped zc-receipt-\d+-[A-Za-z0-9_-]+ pattern that won't accidentally exempt real credentials, and a dedicated test β is also correctly implemented. This is how a security primitive should be written.
The append_receipt_footer implementation deserves a specific commendation. Four tests β empty store short-circuits cleanly, None store returns unchanged, single receipt formats correctly, multiple receipts all appear β are exactly the right set. The if-let chain handling a poisoned mutex by silently dropping the collection push (rather than panicking or returning an error) is also the right call: the receipt is already appended inline to the tool result, so the footer is a display-layer concern and a lock failure there should not disrupt the agent loop.
The docs are now accurate. The caveat in config-reference.md ("Config activation is not yet wired. Setting these keys currently has no effect.") is honest and placed where a user would look first. I confirmed in the local schema: AgentConfig does not carry #[serde(deny_unknown_fields)], so a user who sets [agent.tool_receipts] enabled = true gets a silent no-op rather than a parse error β the caveat describes the actual behavior correctly. tool-receipts.md marks step 4 and the inline-response section as (Planned β not yet implemented) and leads the limitations section with the config activation note. The mechanism-only scope is unambiguous.
Addressing every item across three CHANGES_REQUESTED reviews β including removing the 834 lines of stale test duplicates that were obscuring the actual diff β reflects exactly the kind of ownership RFC #5615 describes: the PR got harder before it got easier, and the author kept working it.
π‘ Conditional β config activation follow-up must be a numbered issue with an assignee before merge
The PR thread contains the text "Config-driven activation and system-prompt injection are tracked as a follow-up" in tool-receipts.md and "config activation deferred to follow-up" in the PR description. Neither of these references a specific issue.
RFC #5615 Β§5 on the Conditional bucket: "A conditional deferral without an assignee is not a deferral β it is a wish. Tracked issues with no owner tend to stay open indefinitely."
The accepted split β mechanism first, activation in a follow-up β is a reasonable scope decision. The mechanism is complete, the docs describe what does and does not yet work, and the activation follow-up is a natural next unit of work. All of that is fine. What is not fine is leaving the follow-up as an unanchored text promise. "Config activation pending" without an issue number and an owner means no one is tracking when it was supposed to land, no one is accountable for it, and a user reading the docs in six months cannot tell whether it ever shipped.
Before merge: open a follow-up issue covering (1) ToolReceiptsConfig struct in the config schema, (2) wiring in AgentConfig so ReceiptGenerator is constructed when enabled = true and passed through to run_tool_call_loop, and (3) system-prompt injection. Assign it. Link it in the PR description and in the "Current limitations" section of tool-receipts.md. That is the commitment, not the intention.
Suggestions (post-merge or in activation follow-up)
append_receipt_footer visibility. The function is declared pub. No cross-crate consumer exists β all callers are within zeroclaw-runtime. pub(crate) would be more appropriate. RFC Β§4.2: "pub is a contract." A public function is a promise to every caller who might implement against it. There is no caller outside this crate who needs this, and pub(crate) makes that explicit. Not a blocker for this PR; worth fixing in the activation follow-up.
HMAC input field separator. The HMAC input is tool_name | b"|" | args.to_string() | b"|" | result | b"|" | timestamp. A tool name or serialised args value containing a literal | would produce an ambiguous concatenation β two different inputs map to the same HMAC message. Not currently exploitable (the LLM has no key and cannot compute the MAC), but before active verification lands and especially before this is exposed at any network boundary, consider length-prefixed encoding or a \x00 separator. Low priority for Phase 1; do not defer past the activation PR.
Unrelated loop_.rs tests. 620+ lines of new tests in loop_.rs cover functions unrelated to tool receipts: build_native_assistant_history, glob_match, filter_tool_specs_for_turn, estimate_history_tokens, filter_by_allowed_tools, cost tracking. These are valuable tests for code that previously had none β but they belong in a test coverage PR, not here. The diff was eventually cleaned up from the 1,411-line peak, and these additions are additive (no regressions), so not blocking. Recording for PR discipline.
Process note β Jordan's formal review
Jordan reviewed this PR thoroughly at the original commit, raised a blocking item and three suggestions, and received complete responses from the author. He is still listed as a requested reviewer and has not given a formal APPROVED verdict. Given that this PR touches crates/zeroclaw-runtime/src/security/** and the runtime agent loop β both risk: high paths per AGENTS.md β his explicit approval before merge is worth getting, not bypassing. A COMMENTED review that was addressed is not the same as an APPROVED one from a code owner.
Verdict
The mechanism is correct. The cryptographic design is sound. The docs accurately describe what is and is not yet implemented. CI is 20/20 green. One conditional blocks merge: open and assign the config activation follow-up issue, then link it. Once that is done, this is ready for maintainer merge.
- Reformat append_receipt_footer calls to multi-line style - Fix indentation on receipt_generator/collected_receipts args in CLI path - Add missing receipt_generator None arg to execute_one_tool test call
Summary
masterBased on https://arxiv.org/abs/2603.10060.
Label Snapshot (required)
risk: highsize: Magent,security,config,docsagent: tool_execution,security: leak_detectordistinguished contributorChange Metadata
featuresecurityLinked Issue
Supersede Attribution (required when Supersedes # is used)
Validation Evidence (required)
cargo fmt --all -- --check cargo clippy --all-targets -- -D warnings cargo testcargo checkpasses; 12 unit tests including adversarial verification (tampered results, wrong keys, fabricated receipts); leak detector exemption testSecurity Impact (required)
Privacy and Data Hygiene (required)
passCompatibility / Migration
agent.tool_receipts.enabled,agent.tool_receipts.show_in_response) deferred to follow-up. Receipt generation is controlled programmatically viaReceiptGeneratorAPI.i18n Follow-Through (required when docs or user-facing wording changes)
Human Verification (required)
Side Effects / Blast Radius (required)
Rollback Plan (required)
git revert <commit>Risks and Mitigations