Skip to content

feat: add review-agent-governance plugin (second inhabitant of governance category)#495

Merged
wshobson merged 5 commits intowshobson:mainfrom
tomjwxf:feat/review-agent-governance-plugin
Apr 28, 2026
Merged

feat: add review-agent-governance plugin (second inhabitant of governance category)#495
wshobson merged 5 commits intowshobson:mainfrom
tomjwxf:feat/review-agent-governance-plugin

Conversation

@tomjwxf
Copy link
Copy Markdown
Contributor

@tomjwxf tomjwxf commented Apr 17, 2026

Hi @wshobson, follow-up contribution to the governance category you
created when merging #484. This one targets a specific class of failure:
AI agents posting review-surface actions (PR reviews, comments, merges,
CI edits) without a human gate.

Context for why this plugin

Review bots posting hallucinated review comments under real accounts is
a pattern common enough to name. The damage is immediate and visible:
reviews show up under the account, merges happen that should not, CI
workflow files get rewritten. Most of these incidents would have been
prevented by a simple pattern: require a human-signal file to exist
before the agent is allowed to take review-surface actions.

This plugin implements that pattern with Cedar + Ed25519 receipts, so
the audit trail is cryptographic and the policy is declarative.

What's in the PR

plugins/review-agent-governance/
├── .claude-plugin/plugin.json
├── README.md                                       # motivation + full usage
├── skills/review-agent-setup/SKILL.md              # step-by-step setup
├── agents/review-policy-author.md      (sonnet)    # Cedar author for review policies
├── commands/
│   ├── approve-review.md                           # /approve-review "<reason>"
│   └── list-pending.md                             # /list-pending [--last N]
├── hooks/hooks.json                                # PreToolUse gate + PostToolUse sign
└── policies/review-agent-governance.cedar          # default policy (5 forbid rules)

Plus one entry added to .claude-plugin/marketplace.json under
category: "governance" (the category you created in #484). This turns
the governance category from a vanity shelf into a real one with two
complementary plugins.

What it forbids by default

The Cedar policy forbids (unless ./.review-approved is present):

  • gh pr review|comment|merge|close|edit, gh issue comment|close|edit,
    gh release create|edit, gh api repos
  • GitLab / Bitbucket equivalents (glab mr ...)
  • git push to main, master, release, production
  • Writes to .github/workflows/, .github/CODEOWNERS, .gitlab-ci.yml,
    .circleci/config.yml
  • WebFetch POSTs to api.github.com, api.gitlab.com, hooks.slack.com,
    discord.com

Non-review actions pass through unchanged. Installs cleanly alongside
protect-mcp (different receipt directory: ./review-receipts/ vs ./receipts/).

The approval flow

Flag file:

touch ./.review-approved
# agent makes the approved action
rm ./.review-approved

Slash command:

/approve-review "Approving LGTM on PR #42"

Creates the flag with the reason embedded, logs an approval receipt.

List denials:

/list-pending

Walks the receipt chain and shows recent decision: deny entries so
the human can see what the agent tried that was blocked.

Every action (approved or denied) produces an Ed25519-signed receipt.
The chain is verifiable offline via npx @veritasacta/verify ./review-receipts/*.json.

Why this is the right second inhabitant for the category

  1. Different class than protect-mcp. protect-mcp is general-purpose:
    any tool call, any policy. This plugin is focused: review-surface
    actions, with a known-good default policy and an explicit human-gate
    pattern. Users who need both install both; users who only need
    review gating install just this.

  2. Solves a common real pain. The review-bot failure class is
    something many teams have felt. A plugin that makes the fix one-line
    is genuinely useful.

  3. Cements the governance shelf. One plugin in a category reads as
    a vanity placement. Two that solve different problems reads as a
    real category.

Verification

  • python3 -m json.tool validates plugin.json, hooks.json, and
    the updated marketplace.json
  • Cedar policy is well-formed (9 rules; 4 forbid + 1 permit)
  • README and SKILL document the approval flow end-to-end
  • Slash commands are self-contained and runnable via Bash
  • Marketplace entry uses the same shape and naming convention as
    protect-mcp (category governance, keywords array, URL author)

Notes

  • The plugin depends on the same protect-mcp npm package as PR feat: add protect-mcp plugin — Cedar policies + signed receipts (closes #471) #484,
    so users who installed that one already have the runtime. No new
    dependencies.
  • Default receipts directory is ./review-receipts/ rather than
    ./receipts/ to avoid collision with protect-mcp when both are
    installed. Users can override via REVIEW_GOVERNANCE_RECEIPTS.
  • The file-flag approval pattern is deliberately simple for v0.1. A
    future v0.2 could add scoped approvals (e.g., "approved for next N
    review actions only"), but v0.1 is useful as-is.

Happy to iterate on scope, naming, or the default policy coverage.
Thanks for the governance category in #484 — made this contribution
possible as a proper shelf-mate rather than a category-of-one.

cc @wshobson

Copy link
Copy Markdown
Owner

@wshobson wshobson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @tomjwxf — the problem framing (hallucinated review-bot actions) is real and the Cedar coverage of the review surface is thorough. A few things to address before this can land:

1. marketplace.json unicode regression (blocking). The diff rewrites literal UTF-8 characters (, , á) as ASCII escapes (\u2014, \u2192, \u00e1) on other, unrelated plugin entries (HADS, conductor, qa-orchestra, reverse-engineering, protect-mcp). The current file uses literal UTF-8 — please only append the new review-agent-governance entry and leave the surrounding descriptions untouched. This is almost certainly from a json.dump(...) round-trip; ensure_ascii=False fixes it, or just append the entry by hand.

2. approve-review argument handling (likely bug). commands/approve-review.md uses REASON="${1:-}", but slash-command bodies in this repo use $ARGUMENTS (see e.g. plugins/codebase-cleanup/commands/deps-audit.md after #490). With $1, /approve-review "long reason with spaces" captures only the first word.

3. Approval records are unsigned. The README at line 94 claims "The chain records exactly which actions were human-gated and when," but the approval entries under ./review-receipts/approvals/*.json are plain JSON — they don't flow through protect-mcp sign, so @veritasacta/verify won't cover them. Either sign the approval records too, or explicitly clarify in the README that the signed chain only covers tool-call receipts and the approval log is trust-in-operator.

Non-blocking:

  • Document that when the approval flag is present, PreToolUse short-circuits and no Cedar/policy_digest field is attached to the resulting PostToolUse receipt. Auditors should expect that.
  • $REASON is embedded in a JSON body without escaping; a reason containing " or \ will break the JSON. Low severity.
  • Keywords-array reformatting for protect-mcp (single-line → multi-line) is cosmetic churn — please revert.

Also note that #496 touches the same marketplace.json and will conflict. Whichever you want to land first is fine, but they'll need to be rebased.

Happy to re-review once the blockers are addressed.

tomjwxf pushed a commit to tomjwxf/wshobson-agents that referenced this pull request Apr 17, 2026
Three blocking items + one non-blocking clarification from the review:

1. marketplace.json unicode regression FIXED. Reset the file to upstream
   HEAD, then inserted ONLY the review-agent-governance entry with a
   string-based append that preserves every existing UTF-8 character
   byte-for-byte. No entries other than the new one are modified.
   `grep -c '\\u' marketplace.json` returns 0 escape sequences.

2. approve-review.md $1 → $ARGUMENTS. The marketplace slash-command
   convention (per plugins/codebase-cleanup/commands/deps-audit.md from
   PR wshobson#490) is $ARGUMENTS, which captures the full argument including
   spaces. $1 only captured the first word. Also JSON-escape the reason
   before embedding in the approval record (via python3 json.dumps) so
   quotes, backslashes, and newlines do not break the JSON body. This
   resolves the non-blocking JSON-escape note too.

3. README honesty on the approval log. Previously claimed the chain
   "records exactly which actions were human-gated and when," which was
   overstating: approval log entries under ./review-receipts/approvals/
   are plain JSON, not signed. Rewrote that paragraph to explicitly
   separate the signed PostToolUse chain (covered by
   @veritasacta/verify) from the operator-trust approval log. Points
   users at protect-mcp sign directly if they need signed approval
   records for regulated environments.

4. Added an explicit note on what the signed chain covers when the
   approval flag is present: PreToolUse short-circuits without calling
   Cedar, so the downstream PostToolUse receipt has decision:allow but
   no policy_digest. Auditors walking the chain should expect this.
   Resolves the "document the short-circuit" non-blocking item.

Not addressed (pending Seth's follow-up):
- marketplace.json conflict with wshobson#496 will be resolved by rebase order
  (whichever merges first; the other rebases)

Tests:
- python3 -m json.tool validates marketplace.json, plugin.json, hooks.json
- grep -c '\\u' on marketplace.json = 0
@tomjwxf
Copy link
Copy Markdown
Contributor Author

tomjwxf commented Apr 17, 2026

Thanks for the close read @wshobson — all three blockers land, and the non-blocking points were right too. Pushed 91c1b20 addressing them:

1. marketplace.json unicode regression — fixed. You were exactly right on the root cause (json.dump without ensure_ascii=False). Rather than rely on that flag, I reset marketplace.json to upstream HEAD and did a string-based insertion that only appends the new review-agent-governance entry. No other entries touched. Verified with grep -c "\\u" marketplace.json = 0. The protect-mcp keywords-array reformatting is also reverted as part of the reset.

2. approve-review arguments — fixed. Switched REASON="${1:-}" to REASON="$ARGUMENTS" to match the marketplace slash-command convention (saw the deps-audit example from #490). Also added a JSON-escaping pass via python3 -c "json.dumps(...)" before embedding the reason in the approval-record body, which resolves the non-blocking note about " / \ breaking JSON.

3. Approval records are unsigned — clarified in README. You were right that the previous wording overstated the cryptographic coverage. Rewrote the relevant paragraph to explicitly say:

  • Entries under ./review-receipts/approvals/*.json are plain JSON, not signed. They do not flow through protect-mcp sign. The approval log is operator-trust.
  • The signed tamper-evident chain is the PostToolUse tool-call receipts under ./review-receipts/*.json. Those are what @veritasacta/verify covers.
  • For regulated environments that need signed approval records too, pointed users at calling npx protect-mcp@latest sign directly.

This is the honest framing. I considered signing approval records as part of this fix but that changes the semantic model of what the approval command does (right now it is a session marker; signing it would make it a receipt in its own right), and that feels like a v0.2 design question rather than a v0.1 fix.

4. Approval-short-circuit documentation — added. New paragraph in the README explicitly notes that when the approval flag is present, PreToolUse exits 0 without running Cedar, so the downstream PostToolUse receipt has decision: allow but no policy_digest field. Auditors should expect that.

On the #496 conflict: yes, both PRs touch the same marketplace.json section. Happy to land whichever you prefer first and rebase the other; with the string-based insertion approach now, rebasing is mechanical.

On version pinning (raised on #494): good suggestion; I will open a small follow-up PR that pins protect-mcp@<specific-version> across all three governance plugins so the tamper-detection guard is reproducible. Will do that after these two are rebased cleanly.

Re-requesting review.

tomjwxf pushed a commit to tomjwxf/wshobson-agents that referenced this pull request Apr 17, 2026
1. marketplace.json unicode regression FIXED. Same fix as wshobson#495: reset
   the file to upstream HEAD and did a string-based append of only the
   new signed-audit-trails entry. No other entries are touched. Verified
   with grep on \\u escape sequences returning 0.

2. refs.arewm.com/agent-commit/v0.2 link verified. curl -sIL returns
   HTTP 200; the page is live and correct.

Non-blocking thought (standalone plugin vs skill under protect-mcp)
acknowledged; keeping as standalone for v0.1 since the user-evaluation
path ('read before commit to infrastructure') works better without
requiring protect-mcp install first. Happy to restructure if the
maintainer prefers the nested-skill shape.

Strict-form --fail-on-missing-policy true production example deferred
to a follow-up SKILL revision. Not in this fix since it would be new
content rather than a bug fix.
tomjwxf pushed a commit to tomjwxf/wshobson-agents that referenced this pull request Apr 18, 2026
Closes the version-pinning suggestion from @wshobson on wshobson#494. The
tamper-detection test in plugins/protect-mcp/test/run-tests.sh
previously called `npx protect-mcp@latest` and `npx @veritasacta/verify`
with no version constraint, meaning an upstream npm publish could
flip the test green or red without any repo-side signal. Pinning
eliminates that.

Changes
───────
- plugins/protect-mcp/hooks/hooks.json: 2 x
    protect-mcp@latest -> protect-mcp@0.5.5
  (PreToolUse evaluate + PostToolUse sign)

- plugins/protect-mcp/test/run-tests.sh:
    6 x protect-mcp@latest -> protect-mcp@0.5.5
    3 x @veritasacta/verify    -> @veritasacta/verify@0.3.0
  (the four PreToolUse test invocations, keygen, sign, plus the two
  verify calls in tests 7 and 8)

- Header comment at the top of run-tests.sh now mentions the pinned
  @veritasacta/verify version for clarity.

What is NOT pinned
──────────────────
README.md and SKILL.md references remain as `npx protect-mcp@latest`
and `npx @veritasacta/verify`. Those are documentation of the pattern
a user should use in their own project, and "latest" is the right
advice for that audience. The test infrastructure is the only
executed path where pinning matters for reproducibility.

How to bump
───────────
When you want to update (e.g., protect-mcp publishes 0.6.0 with a
breaking change to --input handling), update both files together:

  perl -i -pe 's/protect-mcp\@0\.5\.5/protect-mcp\@0.6.0/g' \
    plugins/protect-mcp/hooks/hooks.json \
    plugins/protect-mcp/test/run-tests.sh

Then re-run ./plugins/protect-mcp/test/run-tests.sh to confirm the
tamper-detection guard still passes before merging.

This PR does NOT touch review-agent-governance or signed-audit-trails
because those plugins are still in review (wshobson#495, wshobson#496). Once they
land, a follow-up PR will pin them too.

Tests
─────
- python3 -m json.tool hooks.json passes
- bash -n run-tests.sh passes
- No other files touched; no marketplace.json changes (avoids conflict
  with wshobson#495 and wshobson#496)
@wshobson
Copy link
Copy Markdown
Owner

Thanks for this — the plugin is in good shape overall. Before merging, two items to address:

1. Pin protect-mcp in hooks/hooks.json (blocking)

Both hook commands use npx protect-mcp@latest, which reintroduces the exact reproducibility gap that #497 just closed for the protect-mcp plugin itself. Please pin to protect-mcp@0.5.5 (matching #497) in both the PreToolUse and PostToolUse hooks in plugins/review-agent-governance/hooks/hooks.json. Also worth pinning the @veritasacta/verify references in README.md / SKILL.md that appear in executed example snippets — at your discretion on which are documentation-pattern vs. runtime.

2. Rebase on main (blocking)

Now that #496 merged, this PR conflicts at the marketplace.json insertion point. Please rebase and resolve — same string-append approach you used on the #495 fix commit should work.

Non-blocking observation

The governance category is now populated entirely by your plugins (protect-mcp, signed-audit-trails, and this one). That's fine given the category was explicitly created for this class of tooling, but worth naming: if future contributors want to add governance-adjacent plugins that don't use the protect-mcp / Cedar / Ed25519 stack, we'll want the category description to remain welcoming to those. No action needed from you on this PR.

Happy to merge once the pin lands and the rebase is clean.

@tomjwxf tomjwxf force-pushed the feat/review-agent-governance-plugin branch from 91c1b20 to 9da068d Compare April 19, 2026 00:10
tomjwxf pushed a commit to tomjwxf/wshobson-agents that referenced this pull request Apr 19, 2026
Three blocking items + one non-blocking clarification from the review:

1. marketplace.json unicode regression FIXED. Reset the file to upstream
   HEAD, then inserted ONLY the review-agent-governance entry with a
   string-based append that preserves every existing UTF-8 character
   byte-for-byte. No entries other than the new one are modified.
   `grep -c '\\u' marketplace.json` returns 0 escape sequences.

2. approve-review.md $1 → $ARGUMENTS. The marketplace slash-command
   convention (per plugins/codebase-cleanup/commands/deps-audit.md from
   PR wshobson#490) is $ARGUMENTS, which captures the full argument including
   spaces. $1 only captured the first word. Also JSON-escape the reason
   before embedding in the approval record (via python3 json.dumps) so
   quotes, backslashes, and newlines do not break the JSON body. This
   resolves the non-blocking JSON-escape note too.

3. README honesty on the approval log. Previously claimed the chain
   "records exactly which actions were human-gated and when," which was
   overstating: approval log entries under ./review-receipts/approvals/
   are plain JSON, not signed. Rewrote that paragraph to explicitly
   separate the signed PostToolUse chain (covered by
   @veritasacta/verify) from the operator-trust approval log. Points
   users at protect-mcp sign directly if they need signed approval
   records for regulated environments.

4. Added an explicit note on what the signed chain covers when the
   approval flag is present: PreToolUse short-circuits without calling
   Cedar, so the downstream PostToolUse receipt has decision:allow but
   no policy_digest. Auditors walking the chain should expect this.
   Resolves the "document the short-circuit" non-blocking item.

Not addressed (pending Seth's follow-up):
- marketplace.json conflict with wshobson#496 will be resolved by rebase order
  (whichever merges first; the other rebases)

Tests:
- python3 -m json.tool validates marketplace.json, plugin.json, hooks.json
- grep -c '\\u' on marketplace.json = 0
@tomjwxf
Copy link
Copy Markdown
Contributor Author

tomjwxf commented Apr 19, 2026

@wshobson no worries, both blockers addressed in 9da068d (force-pushed).

  1. protect-mcp@0.5.5 pinned in both hooks. Matching chore(protect-mcp): pin protect-mcp@0.5.5 and @veritasacta/verify@0.3.0 #497's pin.
  2. Rebased on current main. Clean merge, no conflicts at marketplace.json
    since the string-append approach preserves the rest of the file.

The README and SKILL.md references to @veritasacta/verify stay unversioned where they document the CLI as a general pattern ("run npx @veritasacta/verify against your receipts"), since those are documentation-pattern rather than runtime invocations. If you'd prefer them pinned too, happy to follow up.

On the non-blocking observation about governance category, agreed 100%. If future contributors add governance plugins that choose something other than the protect-mcp / Cedar / Ed25519 stack, the category description at least does not prescribe a specific stack. I canrevise the marketplace.json category description to make that welcomeness explicit if a non-ScopeBlind governance submission lands.

Follow-up PR if useful - Re-requesting review. Thanks for feedback!

@tomjwxf tomjwxf force-pushed the feat/review-agent-governance-plugin branch from 9da068d to 75aca45 Compare April 20, 2026 17:26
tomjwxf pushed a commit to tomjwxf/wshobson-agents that referenced this pull request Apr 20, 2026
Three blocking items + one non-blocking clarification from the review:

1. marketplace.json unicode regression FIXED. Reset the file to upstream
   HEAD, then inserted ONLY the review-agent-governance entry with a
   string-based append that preserves every existing UTF-8 character
   byte-for-byte. No entries other than the new one are modified.
   `grep -c '\\u' marketplace.json` returns 0 escape sequences.

2. approve-review.md $1 → $ARGUMENTS. The marketplace slash-command
   convention (per plugins/codebase-cleanup/commands/deps-audit.md from
   PR wshobson#490) is $ARGUMENTS, which captures the full argument including
   spaces. $1 only captured the first word. Also JSON-escape the reason
   before embedding in the approval record (via python3 json.dumps) so
   quotes, backslashes, and newlines do not break the JSON body. This
   resolves the non-blocking JSON-escape note too.

3. README honesty on the approval log. Previously claimed the chain
   "records exactly which actions were human-gated and when," which was
   overstating: approval log entries under ./review-receipts/approvals/
   are plain JSON, not signed. Rewrote that paragraph to explicitly
   separate the signed PostToolUse chain (covered by
   @veritasacta/verify) from the operator-trust approval log. Points
   users at protect-mcp sign directly if they need signed approval
   records for regulated environments.

4. Added an explicit note on what the signed chain covers when the
   approval flag is present: PreToolUse short-circuits without calling
   Cedar, so the downstream PostToolUse receipt has decision:allow but
   no policy_digest. Auditors walking the chain should expect this.
   Resolves the "document the short-circuit" non-blocking item.

Not addressed (pending Seth's follow-up):
- marketplace.json conflict with wshobson#496 will be resolved by rebase order
  (whichever merges first; the other rebases)

Tests:
- python3 -m json.tool validates marketplace.json, plugin.json, hooks.json
- grep -c '\\u' on marketplace.json = 0
@tomjwxf
Copy link
Copy Markdown
Contributor Author

tomjwxf commented Apr 20, 2026

Rebased onto current main, marketplace.json conflict with #496 resolved. Kept the signed-audit-trails entry and added review-agent-governance after it with single-line keywords format to match repo convention. Tweaked the description to "Joins protect-mcp and signed-audit-trails in the governance category" rather than the original "Second inhabitant" since #496 landed first. Ready for re-review when you have a moment.

Copy link
Copy Markdown
Owner

@wshobson wshobson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the follow-ups — the three blockers from my earlier review are all addressed (marketplace.json unicode, $ARGUMENTS handling, and the README clarification on unsigned approval records). Nicely done.

One new blocker that surfaced from #503: hooks/hooks.json here uses "hook": {...} (singular) inside each matcher, but current Claude Code expects "hooks": [...] (array). It's the same schema bug #503 fixes in protect-mcp, and the plugin will fail to load on current CC until it's converted. Easiest path: wait for #503 to land and rebase, or just preemptively switch both PreToolUse and PostToolUse blocks to the array form here.

Minor: the PR body says "9 rules; 4 forbid + 1 permit" but policies/review-agent-governance.cedar actually has 5 forbid + 1 permit = 6. The README's verbal coverage is correct, only the numeric description is off.

Happy to land this once the schema is fixed.

ZeroXLauren and others added 4 commits April 26, 2026 19:52
…lass)

Second inhabitant of the `governance` category. Addresses the failure mode
where an AI agent posts PR reviews, comments, merges, or edits CI config
without a human gate. Directly inspired by the Hermes-style incident
pattern where review-bot hallucinations produce account-linked damage.

Components
──────────
- plugin.json, README.md, skills/review-agent-setup/SKILL.md
- agents/review-policy-author.md (sonnet) — Cedar author specialized in
  review-surface gating across GitHub / GitLab / protected branches /
  CI paths / notification surfaces
- commands/approve-review.md — opens an approval window via ./.review-approved
  flag file, records the reason in ./review-receipts/approvals/
- commands/list-pending.md — walks the receipt chain to show recent denials
  (the set of actions the agent tried that were blocked)
- hooks/hooks.json — PreToolUse gate + PostToolUse sign
- policies/review-agent-governance.cedar — default Cedar policy with five
  forbid rules covering gh/glab review actions, protected-branch pushes,
  CI config paths, and WebFetch POSTs to hooks.slack.com / api.github.com

Behavior
────────
By default, the plugin forbids:
  - gh pr review|comment|merge|close|edit, gh issue comment|close|edit,
    gh release create|edit, gh api repos
  - GitLab and Bitbucket equivalents
  - git push to main|master|release|production
  - Writes to .github/workflows/, .gitlab-ci.yml, .circleci/config.yml,
    .github/CODEOWNERS
  - WebFetch POSTs to api.github.com / api.gitlab.com / hooks.slack.com /
    discord.com

Non-review actions pass through unchanged. Composes with protect-mcp for
general policy enforcement; configure separate receipt directories to
keep the chains distinct.

Approval pattern
────────────────
Human opens an approval window by creating ./.review-approved (or via
/approve-review "<reason>"). The PreToolUse hook short-circuits to
permit while the flag is present. Every action, approved or denied,
still produces an Ed25519 receipt, so the chain records exactly what
happened and under what approval.

Marketplace entry
─────────────────
Added under category: "governance" with seven discovery keywords. The
governance category now has two inhabitants (protect-mcp + this one),
which turns it from a vanity category into a real shelf.

Standards
─────────
- Ed25519 (RFC 8032), JCS (RFC 8785), Cedar (AWS)
- IETF draft-farley-acta-signed-receipts
- Uses protect-mcp as its evaluation/signing runtime
Three blocking items + one non-blocking clarification from the review:

1. marketplace.json unicode regression FIXED. Reset the file to upstream
   HEAD, then inserted ONLY the review-agent-governance entry with a
   string-based append that preserves every existing UTF-8 character
   byte-for-byte. No entries other than the new one are modified.
   `grep -c '\\u' marketplace.json` returns 0 escape sequences.

2. approve-review.md $1 → $ARGUMENTS. The marketplace slash-command
   convention (per plugins/codebase-cleanup/commands/deps-audit.md from
   PR wshobson#490) is $ARGUMENTS, which captures the full argument including
   spaces. $1 only captured the first word. Also JSON-escape the reason
   before embedding in the approval record (via python3 json.dumps) so
   quotes, backslashes, and newlines do not break the JSON body. This
   resolves the non-blocking JSON-escape note too.

3. README honesty on the approval log. Previously claimed the chain
   "records exactly which actions were human-gated and when," which was
   overstating: approval log entries under ./review-receipts/approvals/
   are plain JSON, not signed. Rewrote that paragraph to explicitly
   separate the signed PostToolUse chain (covered by
   @veritasacta/verify) from the operator-trust approval log. Points
   users at protect-mcp sign directly if they need signed approval
   records for regulated environments.

4. Added an explicit note on what the signed chain covers when the
   approval flag is present: PreToolUse short-circuits without calling
   Cedar, so the downstream PostToolUse receipt has decision:allow but
   no policy_digest. Auditors walking the chain should expect this.
   Resolves the "document the short-circuit" non-blocking item.

Not addressed (pending Seth's follow-up):
- marketplace.json conflict with wshobson#496 will be resolved by rebase order
  (whichever merges first; the other rebases)

Tests:
- python3 -m json.tool validates marketplace.json, plugin.json, hooks.json
- grep -c '\\u' on marketplace.json = 0
Resolves wshobson's blocker: npx protect-mcp@latest reintroduces the
reproducibility gap closed by wshobson#497 for the protect-mcp plugin. Pins
both PreToolUse and PostToolUse hook invocations to 0.5.5.
Migrates plugins/review-agent-governance/hooks/hooks.json from the
deprecated singular "hook": {...} shape to the current
"hooks": [{...}] array shape, matching the schema fix that landed
in wshobson#503 for plugins/protect-mcp/hooks/hooks.json.

Same conversion applied to both PreToolUse and PostToolUse blocks:

  Before: { "matcher": ".*", "hook": {"type": "command", ...} }
  After:  { "matcher": ".*", "hooks": [{"type": "command", ...}] }

Verified the resulting structure matches the post-wshobson#503 shape used by
plugins/protect-mcp/hooks/hooks.json byte-for-byte at the schema level.
Without this fix, current Claude Code reports
"path: hooks.PreToolUse.0.hooks - expected array, received undefined"
on plugin load.
@tomjwxf tomjwxf force-pushed the feat/review-agent-governance-plugin branch from 75aca45 to 02e77c5 Compare April 26, 2026 23:54
@tomjwxf
Copy link
Copy Markdown
Contributor Author

tomjwxf commented Apr 26, 2026

Thanks @wshobson, both items addressed in 02e77c5 (force-pushed).

1. hooks.json schema fix. Converted both PreToolUse and PostToolUse
blocks from the deprecated singular "hook": {...} shape to the current
"hooks": [{...}] array shape, matching the conversion #503 just landed
in plugins/protect-mcp/hooks/hooks.json. Confirmed the resulting
structure is byte-identical at the schema level to the post-#503
protect-mcp file (same key shape, same nesting depth, just the per-plugin
command strings differ). The if [ -f ... ] short-circuit and the
@0.5.5 pin both survived the conversion intact.

2. PR body rule-count correction. You're right, the policy file has
5 forbid + 1 permit = 6 rules, not the "9 rules; 4 forbid + 1 permit"
the original PR description claimed. The README's verbal coverage list
matches the actual rules, so this is purely a description-vs-file mismatch
in the PR body. I'll correct the body inline once we're closer to merge
so I'm not editing it under your in-flight review.

Also rebased onto current main as part of this push, so #503, #496, and
the recent supply-chain hardening from #499 are all in the base.
mergeStateStatus is back to clean.

CI is re-running on the force-push; will leave the change set as-is for
your re-review.


Follow-ups (no action needed on this PR; flagging in case any of them
are useful)
:

  • Welcoming category description. Picking up the non-blocking note
    about the category remaining open to non-Cedar / non-Ed25519
    governance-adjacent plugins: rather than wait for a non-ScopeBlind
    submission to trigger the change, I'm happy to send a small follow-up
    PR that softens the category description in marketplace.json to read
    as "cryptographic or procedural governance for AI agents (policy
    enforcement, signed audit trails, human-gate patterns, and related
    accountability tooling)" rather than the current more stack-specific
    framing. That preempts the situation where someone with a different
    stack reads the category as ScopeBlind-coded and goes elsewhere.

  • v0.2 scoped approvals. As mentioned in the original PR body, the
    current ./.review-approved flag pattern is intentionally simple for
    v0.1. v0.2 could add scoped approvals (/approve-review --tool gh --count 1 for a one-shot, or --until 5m for a time window) so the
    approval semantics get richer without changing the receipt format.
    Separate PR if you'd want it.

  • Pin @veritasacta/verify references in docs. The README and
    SKILL.md examples currently mention npx @veritasacta/verify without
    a pin. Given how strict you've been (correctly) about runtime pinning
    in chore(protect-mcp): pin protect-mcp@0.5.5 and @veritasacta/verify@0.3.0 #497 and on this PR, would you prefer those documentation snippets
    pinned to @veritasacta/verify@0.3.0 too, even though they're
    pattern-illustrative rather than executed by the plugin runtime? I
    initially treated them as documentation-pattern (where unversioned
    reads better as "this is the CLI you reach for") but if you'd prefer
    them pinned for consistency with the runtime story, that's a one-liner
    follow-up.

Happy to land any of the three above as separate PRs after this one
merges. None blocks the current change.

Update the "Composing with protect-mcp" example to use the current
hooks array schema and pin protect-mcp@0.5.5, matching the runtime
hooks.json. Prevents users from copy-pasting a config that current
Claude Code refuses to load.
Copy link
Copy Markdown
Owner

@wshobson wshobson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed after the rebase + schema conversion — all prior blockers resolved. CI is green, hooks.json matches the current array schema, protect-mcp@0.5.5 is pinned, marketplace.json appends cleanly.

One small doc drift I went ahead and fixed for you in 1ca21d8 (pushed to your branch via maintainer edits): the "Composing with protect-mcp" example block in skills/review-agent-setup/SKILL.md was still using the deprecated "hook": {...} singular schema and npx protect-mcp@latest (unpinned). A user copy-pasting that into their own hooks.json would have hit exactly the schema-load failure #503 fixed for the runtime side. Now matches the array form and the @0.5.5 pin.

Approving conditional on the eval/JSON checks passing on the new commit. Will merge once they're green.

The PR-body rule-count nit (5 forbid + 1 permit, not 9) is fine to clean up post-merge as you suggested.

@wshobson wshobson merged commit c15b108 into wshobson:main Apr 28, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants