feat: add review-agent-governance plugin (second inhabitant of governance category)#495
Conversation
wshobson
left a comment
There was a problem hiding this comment.
Thanks for the contribution @tomjwxf — the problem framing (hallucinated review-bot actions) is real and the Cedar coverage of the review surface is thorough. A few things to address before this can land:
1. marketplace.json unicode regression (blocking). The diff rewrites literal UTF-8 characters (—, →, á) as ASCII escapes (\u2014, \u2192, \u00e1) on other, unrelated plugin entries (HADS, conductor, qa-orchestra, reverse-engineering, protect-mcp). The current file uses literal UTF-8 — please only append the new review-agent-governance entry and leave the surrounding descriptions untouched. This is almost certainly from a json.dump(...) round-trip; ensure_ascii=False fixes it, or just append the entry by hand.
2. approve-review argument handling (likely bug). commands/approve-review.md uses REASON="${1:-}", but slash-command bodies in this repo use $ARGUMENTS (see e.g. plugins/codebase-cleanup/commands/deps-audit.md after #490). With $1, /approve-review "long reason with spaces" captures only the first word.
3. Approval records are unsigned. The README at line 94 claims "The chain records exactly which actions were human-gated and when," but the approval entries under ./review-receipts/approvals/*.json are plain JSON — they don't flow through protect-mcp sign, so @veritasacta/verify won't cover them. Either sign the approval records too, or explicitly clarify in the README that the signed chain only covers tool-call receipts and the approval log is trust-in-operator.
Non-blocking:
- Document that when the approval flag is present, PreToolUse short-circuits and no Cedar/
policy_digestfield is attached to the resulting PostToolUse receipt. Auditors should expect that. $REASONis embedded in a JSON body without escaping; a reason containing"or\will break the JSON. Low severity.- Keywords-array reformatting for
protect-mcp(single-line → multi-line) is cosmetic churn — please revert.
Also note that #496 touches the same marketplace.json and will conflict. Whichever you want to land first is fine, but they'll need to be rebased.
Happy to re-review once the blockers are addressed.
Three blocking items + one non-blocking clarification from the review: 1. marketplace.json unicode regression FIXED. Reset the file to upstream HEAD, then inserted ONLY the review-agent-governance entry with a string-based append that preserves every existing UTF-8 character byte-for-byte. No entries other than the new one are modified. `grep -c '\\u' marketplace.json` returns 0 escape sequences. 2. approve-review.md $1 → $ARGUMENTS. The marketplace slash-command convention (per plugins/codebase-cleanup/commands/deps-audit.md from PR wshobson#490) is $ARGUMENTS, which captures the full argument including spaces. $1 only captured the first word. Also JSON-escape the reason before embedding in the approval record (via python3 json.dumps) so quotes, backslashes, and newlines do not break the JSON body. This resolves the non-blocking JSON-escape note too. 3. README honesty on the approval log. Previously claimed the chain "records exactly which actions were human-gated and when," which was overstating: approval log entries under ./review-receipts/approvals/ are plain JSON, not signed. Rewrote that paragraph to explicitly separate the signed PostToolUse chain (covered by @veritasacta/verify) from the operator-trust approval log. Points users at protect-mcp sign directly if they need signed approval records for regulated environments. 4. Added an explicit note on what the signed chain covers when the approval flag is present: PreToolUse short-circuits without calling Cedar, so the downstream PostToolUse receipt has decision:allow but no policy_digest. Auditors walking the chain should expect this. Resolves the "document the short-circuit" non-blocking item. Not addressed (pending Seth's follow-up): - marketplace.json conflict with wshobson#496 will be resolved by rebase order (whichever merges first; the other rebases) Tests: - python3 -m json.tool validates marketplace.json, plugin.json, hooks.json - grep -c '\\u' on marketplace.json = 0
|
Thanks for the close read @wshobson — all three blockers land, and the non-blocking points were right too. Pushed 91c1b20 addressing them: 1. marketplace.json unicode regression — fixed. You were exactly right on the root cause ( 2. 3. Approval records are unsigned — clarified in README. You were right that the previous wording overstated the cryptographic coverage. Rewrote the relevant paragraph to explicitly say:
This is the honest framing. I considered signing approval records as part of this fix but that changes the semantic model of what the approval command does (right now it is a session marker; signing it would make it a receipt in its own right), and that feels like a v0.2 design question rather than a v0.1 fix. 4. Approval-short-circuit documentation — added. New paragraph in the README explicitly notes that when the approval flag is present, On the #496 conflict: yes, both PRs touch the same On version pinning (raised on #494): good suggestion; I will open a small follow-up PR that pins Re-requesting review. |
1. marketplace.json unicode regression FIXED. Same fix as wshobson#495: reset the file to upstream HEAD and did a string-based append of only the new signed-audit-trails entry. No other entries are touched. Verified with grep on \\u escape sequences returning 0. 2. refs.arewm.com/agent-commit/v0.2 link verified. curl -sIL returns HTTP 200; the page is live and correct. Non-blocking thought (standalone plugin vs skill under protect-mcp) acknowledged; keeping as standalone for v0.1 since the user-evaluation path ('read before commit to infrastructure') works better without requiring protect-mcp install first. Happy to restructure if the maintainer prefers the nested-skill shape. Strict-form --fail-on-missing-policy true production example deferred to a follow-up SKILL revision. Not in this fix since it would be new content rather than a bug fix.
Closes the version-pinning suggestion from @wshobson on wshobson#494. The tamper-detection test in plugins/protect-mcp/test/run-tests.sh previously called `npx protect-mcp@latest` and `npx @veritasacta/verify` with no version constraint, meaning an upstream npm publish could flip the test green or red without any repo-side signal. Pinning eliminates that. Changes ─────── - plugins/protect-mcp/hooks/hooks.json: 2 x protect-mcp@latest -> protect-mcp@0.5.5 (PreToolUse evaluate + PostToolUse sign) - plugins/protect-mcp/test/run-tests.sh: 6 x protect-mcp@latest -> protect-mcp@0.5.5 3 x @veritasacta/verify -> @veritasacta/verify@0.3.0 (the four PreToolUse test invocations, keygen, sign, plus the two verify calls in tests 7 and 8) - Header comment at the top of run-tests.sh now mentions the pinned @veritasacta/verify version for clarity. What is NOT pinned ────────────────── README.md and SKILL.md references remain as `npx protect-mcp@latest` and `npx @veritasacta/verify`. Those are documentation of the pattern a user should use in their own project, and "latest" is the right advice for that audience. The test infrastructure is the only executed path where pinning matters for reproducibility. How to bump ─────────── When you want to update (e.g., protect-mcp publishes 0.6.0 with a breaking change to --input handling), update both files together: perl -i -pe 's/protect-mcp\@0\.5\.5/protect-mcp\@0.6.0/g' \ plugins/protect-mcp/hooks/hooks.json \ plugins/protect-mcp/test/run-tests.sh Then re-run ./plugins/protect-mcp/test/run-tests.sh to confirm the tamper-detection guard still passes before merging. This PR does NOT touch review-agent-governance or signed-audit-trails because those plugins are still in review (wshobson#495, wshobson#496). Once they land, a follow-up PR will pin them too. Tests ───── - python3 -m json.tool hooks.json passes - bash -n run-tests.sh passes - No other files touched; no marketplace.json changes (avoids conflict with wshobson#495 and wshobson#496)
|
Thanks for this — the plugin is in good shape overall. Before merging, two items to address: 1. Pin Both hook commands use 2. Rebase on main (blocking) Now that #496 merged, this PR conflicts at the Non-blocking observation The Happy to merge once the pin lands and the rebase is clean. |
91c1b20 to
9da068d
Compare
Three blocking items + one non-blocking clarification from the review: 1. marketplace.json unicode regression FIXED. Reset the file to upstream HEAD, then inserted ONLY the review-agent-governance entry with a string-based append that preserves every existing UTF-8 character byte-for-byte. No entries other than the new one are modified. `grep -c '\\u' marketplace.json` returns 0 escape sequences. 2. approve-review.md $1 → $ARGUMENTS. The marketplace slash-command convention (per plugins/codebase-cleanup/commands/deps-audit.md from PR wshobson#490) is $ARGUMENTS, which captures the full argument including spaces. $1 only captured the first word. Also JSON-escape the reason before embedding in the approval record (via python3 json.dumps) so quotes, backslashes, and newlines do not break the JSON body. This resolves the non-blocking JSON-escape note too. 3. README honesty on the approval log. Previously claimed the chain "records exactly which actions were human-gated and when," which was overstating: approval log entries under ./review-receipts/approvals/ are plain JSON, not signed. Rewrote that paragraph to explicitly separate the signed PostToolUse chain (covered by @veritasacta/verify) from the operator-trust approval log. Points users at protect-mcp sign directly if they need signed approval records for regulated environments. 4. Added an explicit note on what the signed chain covers when the approval flag is present: PreToolUse short-circuits without calling Cedar, so the downstream PostToolUse receipt has decision:allow but no policy_digest. Auditors walking the chain should expect this. Resolves the "document the short-circuit" non-blocking item. Not addressed (pending Seth's follow-up): - marketplace.json conflict with wshobson#496 will be resolved by rebase order (whichever merges first; the other rebases) Tests: - python3 -m json.tool validates marketplace.json, plugin.json, hooks.json - grep -c '\\u' on marketplace.json = 0
|
@wshobson no worries, both blockers addressed in 9da068d (force-pushed).
The README and SKILL.md references to @veritasacta/verify stay unversioned where they document the CLI as a general pattern ("run npx @veritasacta/verify against your receipts"), since those are documentation-pattern rather than runtime invocations. If you'd prefer them pinned too, happy to follow up. On the non-blocking observation about governance category, agreed 100%. If future contributors add governance plugins that choose something other than the protect-mcp / Cedar / Ed25519 stack, the category description at least does not prescribe a specific stack. I canrevise the marketplace.json category description to make that welcomeness explicit if a non-ScopeBlind governance submission lands. Follow-up PR if useful - Re-requesting review. Thanks for feedback! |
9da068d to
75aca45
Compare
Three blocking items + one non-blocking clarification from the review: 1. marketplace.json unicode regression FIXED. Reset the file to upstream HEAD, then inserted ONLY the review-agent-governance entry with a string-based append that preserves every existing UTF-8 character byte-for-byte. No entries other than the new one are modified. `grep -c '\\u' marketplace.json` returns 0 escape sequences. 2. approve-review.md $1 → $ARGUMENTS. The marketplace slash-command convention (per plugins/codebase-cleanup/commands/deps-audit.md from PR wshobson#490) is $ARGUMENTS, which captures the full argument including spaces. $1 only captured the first word. Also JSON-escape the reason before embedding in the approval record (via python3 json.dumps) so quotes, backslashes, and newlines do not break the JSON body. This resolves the non-blocking JSON-escape note too. 3. README honesty on the approval log. Previously claimed the chain "records exactly which actions were human-gated and when," which was overstating: approval log entries under ./review-receipts/approvals/ are plain JSON, not signed. Rewrote that paragraph to explicitly separate the signed PostToolUse chain (covered by @veritasacta/verify) from the operator-trust approval log. Points users at protect-mcp sign directly if they need signed approval records for regulated environments. 4. Added an explicit note on what the signed chain covers when the approval flag is present: PreToolUse short-circuits without calling Cedar, so the downstream PostToolUse receipt has decision:allow but no policy_digest. Auditors walking the chain should expect this. Resolves the "document the short-circuit" non-blocking item. Not addressed (pending Seth's follow-up): - marketplace.json conflict with wshobson#496 will be resolved by rebase order (whichever merges first; the other rebases) Tests: - python3 -m json.tool validates marketplace.json, plugin.json, hooks.json - grep -c '\\u' on marketplace.json = 0
|
Rebased onto current main, marketplace.json conflict with #496 resolved. Kept the signed-audit-trails entry and added review-agent-governance after it with single-line keywords format to match repo convention. Tweaked the description to "Joins protect-mcp and signed-audit-trails in the governance category" rather than the original "Second inhabitant" since #496 landed first. Ready for re-review when you have a moment. |
wshobson
left a comment
There was a problem hiding this comment.
Thanks for the follow-ups — the three blockers from my earlier review are all addressed (marketplace.json unicode, $ARGUMENTS handling, and the README clarification on unsigned approval records). Nicely done.
One new blocker that surfaced from #503: hooks/hooks.json here uses "hook": {...} (singular) inside each matcher, but current Claude Code expects "hooks": [...] (array). It's the same schema bug #503 fixes in protect-mcp, and the plugin will fail to load on current CC until it's converted. Easiest path: wait for #503 to land and rebase, or just preemptively switch both PreToolUse and PostToolUse blocks to the array form here.
Minor: the PR body says "9 rules; 4 forbid + 1 permit" but policies/review-agent-governance.cedar actually has 5 forbid + 1 permit = 6. The README's verbal coverage is correct, only the numeric description is off.
Happy to land this once the schema is fixed.
…lass)
Second inhabitant of the `governance` category. Addresses the failure mode
where an AI agent posts PR reviews, comments, merges, or edits CI config
without a human gate. Directly inspired by the Hermes-style incident
pattern where review-bot hallucinations produce account-linked damage.
Components
──────────
- plugin.json, README.md, skills/review-agent-setup/SKILL.md
- agents/review-policy-author.md (sonnet) — Cedar author specialized in
review-surface gating across GitHub / GitLab / protected branches /
CI paths / notification surfaces
- commands/approve-review.md — opens an approval window via ./.review-approved
flag file, records the reason in ./review-receipts/approvals/
- commands/list-pending.md — walks the receipt chain to show recent denials
(the set of actions the agent tried that were blocked)
- hooks/hooks.json — PreToolUse gate + PostToolUse sign
- policies/review-agent-governance.cedar — default Cedar policy with five
forbid rules covering gh/glab review actions, protected-branch pushes,
CI config paths, and WebFetch POSTs to hooks.slack.com / api.github.com
Behavior
────────
By default, the plugin forbids:
- gh pr review|comment|merge|close|edit, gh issue comment|close|edit,
gh release create|edit, gh api repos
- GitLab and Bitbucket equivalents
- git push to main|master|release|production
- Writes to .github/workflows/, .gitlab-ci.yml, .circleci/config.yml,
.github/CODEOWNERS
- WebFetch POSTs to api.github.com / api.gitlab.com / hooks.slack.com /
discord.com
Non-review actions pass through unchanged. Composes with protect-mcp for
general policy enforcement; configure separate receipt directories to
keep the chains distinct.
Approval pattern
────────────────
Human opens an approval window by creating ./.review-approved (or via
/approve-review "<reason>"). The PreToolUse hook short-circuits to
permit while the flag is present. Every action, approved or denied,
still produces an Ed25519 receipt, so the chain records exactly what
happened and under what approval.
Marketplace entry
─────────────────
Added under category: "governance" with seven discovery keywords. The
governance category now has two inhabitants (protect-mcp + this one),
which turns it from a vanity category into a real shelf.
Standards
─────────
- Ed25519 (RFC 8032), JCS (RFC 8785), Cedar (AWS)
- IETF draft-farley-acta-signed-receipts
- Uses protect-mcp as its evaluation/signing runtime
Three blocking items + one non-blocking clarification from the review: 1. marketplace.json unicode regression FIXED. Reset the file to upstream HEAD, then inserted ONLY the review-agent-governance entry with a string-based append that preserves every existing UTF-8 character byte-for-byte. No entries other than the new one are modified. `grep -c '\\u' marketplace.json` returns 0 escape sequences. 2. approve-review.md $1 → $ARGUMENTS. The marketplace slash-command convention (per plugins/codebase-cleanup/commands/deps-audit.md from PR wshobson#490) is $ARGUMENTS, which captures the full argument including spaces. $1 only captured the first word. Also JSON-escape the reason before embedding in the approval record (via python3 json.dumps) so quotes, backslashes, and newlines do not break the JSON body. This resolves the non-blocking JSON-escape note too. 3. README honesty on the approval log. Previously claimed the chain "records exactly which actions were human-gated and when," which was overstating: approval log entries under ./review-receipts/approvals/ are plain JSON, not signed. Rewrote that paragraph to explicitly separate the signed PostToolUse chain (covered by @veritasacta/verify) from the operator-trust approval log. Points users at protect-mcp sign directly if they need signed approval records for regulated environments. 4. Added an explicit note on what the signed chain covers when the approval flag is present: PreToolUse short-circuits without calling Cedar, so the downstream PostToolUse receipt has decision:allow but no policy_digest. Auditors walking the chain should expect this. Resolves the "document the short-circuit" non-blocking item. Not addressed (pending Seth's follow-up): - marketplace.json conflict with wshobson#496 will be resolved by rebase order (whichever merges first; the other rebases) Tests: - python3 -m json.tool validates marketplace.json, plugin.json, hooks.json - grep -c '\\u' on marketplace.json = 0
Resolves wshobson's blocker: npx protect-mcp@latest reintroduces the reproducibility gap closed by wshobson#497 for the protect-mcp plugin. Pins both PreToolUse and PostToolUse hook invocations to 0.5.5.
Migrates plugins/review-agent-governance/hooks/hooks.json from the
deprecated singular "hook": {...} shape to the current
"hooks": [{...}] array shape, matching the schema fix that landed
in wshobson#503 for plugins/protect-mcp/hooks/hooks.json.
Same conversion applied to both PreToolUse and PostToolUse blocks:
Before: { "matcher": ".*", "hook": {"type": "command", ...} }
After: { "matcher": ".*", "hooks": [{"type": "command", ...}] }
Verified the resulting structure matches the post-wshobson#503 shape used by
plugins/protect-mcp/hooks/hooks.json byte-for-byte at the schema level.
Without this fix, current Claude Code reports
"path: hooks.PreToolUse.0.hooks - expected array, received undefined"
on plugin load.
75aca45 to
02e77c5
Compare
|
Thanks @wshobson, both items addressed in 02e77c5 (force-pushed). 1. hooks.json schema fix. Converted both 2. PR body rule-count correction. You're right, the policy file has Also rebased onto current main as part of this push, so #503, #496, and CI is re-running on the force-push; will leave the change set as-is for Follow-ups (no action needed on this PR; flagging in case any of them
Happy to land any of the three above as separate PRs after this one |
Update the "Composing with protect-mcp" example to use the current hooks array schema and pin protect-mcp@0.5.5, matching the runtime hooks.json. Prevents users from copy-pasting a config that current Claude Code refuses to load.
wshobson
left a comment
There was a problem hiding this comment.
Re-reviewed after the rebase + schema conversion — all prior blockers resolved. CI is green, hooks.json matches the current array schema, protect-mcp@0.5.5 is pinned, marketplace.json appends cleanly.
One small doc drift I went ahead and fixed for you in 1ca21d8 (pushed to your branch via maintainer edits): the "Composing with protect-mcp" example block in skills/review-agent-setup/SKILL.md was still using the deprecated "hook": {...} singular schema and npx protect-mcp@latest (unpinned). A user copy-pasting that into their own hooks.json would have hit exactly the schema-load failure #503 fixed for the runtime side. Now matches the array form and the @0.5.5 pin.
Approving conditional on the eval/JSON checks passing on the new commit. Will merge once they're green.
The PR-body rule-count nit (5 forbid + 1 permit, not 9) is fine to clean up post-merge as you suggested.
Hi @wshobson, follow-up contribution to the
governancecategory youcreated when merging #484. This one targets a specific class of failure:
AI agents posting review-surface actions (PR reviews, comments, merges,
CI edits) without a human gate.
Context for why this plugin
Review bots posting hallucinated review comments under real accounts is
a pattern common enough to name. The damage is immediate and visible:
reviews show up under the account, merges happen that should not, CI
workflow files get rewritten. Most of these incidents would have been
prevented by a simple pattern: require a human-signal file to exist
before the agent is allowed to take review-surface actions.
This plugin implements that pattern with Cedar + Ed25519 receipts, so
the audit trail is cryptographic and the policy is declarative.
What's in the PR
Plus one entry added to
.claude-plugin/marketplace.jsonundercategory: "governance"(the category you created in #484). This turnsthe governance category from a vanity shelf into a real one with two
complementary plugins.
What it forbids by default
The Cedar policy forbids (unless
./.review-approvedis present):gh pr review|comment|merge|close|edit,gh issue comment|close|edit,gh release create|edit,gh api reposglab mr ...)git pushtomain,master,release,production.github/workflows/,.github/CODEOWNERS,.gitlab-ci.yml,.circleci/config.ymlWebFetchPOSTs toapi.github.com,api.gitlab.com,hooks.slack.com,discord.comNon-review actions pass through unchanged. Installs cleanly alongside
protect-mcp(different receipt directory:./review-receipts/vs./receipts/).The approval flow
Flag file:
touch ./.review-approved # agent makes the approved action rm ./.review-approvedSlash command:
Creates the flag with the reason embedded, logs an approval receipt.
List denials:
Walks the receipt chain and shows recent
decision: denyentries sothe human can see what the agent tried that was blocked.
Every action (approved or denied) produces an Ed25519-signed receipt.
The chain is verifiable offline via
npx @veritasacta/verify ./review-receipts/*.json.Why this is the right second inhabitant for the category
Different class than protect-mcp.
protect-mcpis general-purpose:any tool call, any policy. This plugin is focused: review-surface
actions, with a known-good default policy and an explicit human-gate
pattern. Users who need both install both; users who only need
review gating install just this.
Solves a common real pain. The review-bot failure class is
something many teams have felt. A plugin that makes the fix one-line
is genuinely useful.
Cements the
governanceshelf. One plugin in a category reads asa vanity placement. Two that solve different problems reads as a
real category.
Verification
python3 -m json.toolvalidatesplugin.json,hooks.json, andthe updated
marketplace.jsonprotect-mcp(categorygovernance, keywords array, URL author)Notes
protect-mcpnpm package as PR feat: add protect-mcp plugin — Cedar policies + signed receipts (closes #471) #484,so users who installed that one already have the runtime. No new
dependencies.
./review-receipts/rather than./receipts/to avoid collision with protect-mcp when both areinstalled. Users can override via
REVIEW_GOVERNANCE_RECEIPTS.future v0.2 could add scoped approvals (e.g., "approved for next N
review actions only"), but v0.1 is useful as-is.
Happy to iterate on scope, naming, or the default policy coverage.
Thanks for the governance category in #484 — made this contribution
possible as a proper shelf-mate rather than a category-of-one.
cc @wshobson