feat(hooks): persistent predicate counter backend (successor to #3573)#3635
feat(hooks): persistent predicate counter backend (successor to #3573)#3635zmanian wants to merge 15 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a design document for a persistent predicate counter, outlining the PredicateStateBackend trait and its implementations for Postgres and libSQL to ensure rate-limit state survives process restarts. The review feedback highlights several areas for refinement: resolving the discrepancy regarding run_id storage in the schema, adopting chrono::DateTime<Utc> for consistency with the existing codebase, addressing libSQL-specific data type requirements for decimal precision, and clarifying the consistency guarantees when balancing cross-process synchronization with batched write performance.
| must NOT double-count. The backend stores `(timestamp, run_id, | ||
| event_id)` so duplicate-event detection works at replay time. |
There was a problem hiding this comment.
The design doc mentions that the backend stores run_id, but this field is missing from the PredicateStateBackend trait methods and the proposed SQL schema. To maintain clarity between the current implementation state and future or planned states, either add run_id to the trait signatures and database tables if it is intended for deduplication/auditing, or remove the reference from the documentation.
References
- In documentation, clearly distinguish between the current implementation state and future or planned states.
There was a problem hiding this comment.
Addressed in c9c5fb0 / b4d8a35 / 5b8b8b… across the doc revisions. run_id is intentionally NOT carried by the trait or the schema — see lines 72-78 of the doc:
The sliding-window state is per-tenant + per-hook-id; replay refusal is driven by
event_id(which isRuntimeEventIdfromironclaw_events, itself already keyed to the current run's emission). An earlier draft mentioned storingrun_idalongside; that's redundant given the event id's uniqueness contract and was removed in this revision.
The 'state' for clarity is: run_id is not stored; the design has shifted to a per-tenant + per-hook key with event_id as the dedup axis.
| pub trait PredicateStateBackend: Send + Sync { | ||
| async fn record_invocation( | ||
| &self, | ||
| key: &InvocationKey, // (tenant, hook_id, capability) | ||
| timestamp: SystemTime, | ||
| event_id: RuntimeEventId, | ||
| ) -> Result<(), PredicateBackendError>; | ||
|
|
||
| async fn count_in_window( | ||
| &self, | ||
| key: &InvocationKey, | ||
| window: Duration, | ||
| ) -> Result<u32, PredicateBackendError>; | ||
|
|
||
| async fn record_value( | ||
| &self, | ||
| key: &ValueKey, // (tenant, hook_id, capability, field) | ||
| timestamp: SystemTime, | ||
| value: Decimal, | ||
| event_id: RuntimeEventId, | ||
| ) -> Result<(), PredicateBackendError>; | ||
|
|
||
| async fn sum_in_window( | ||
| &self, | ||
| key: &ValueKey, | ||
| window: Duration, | ||
| ) -> Result<Decimal, PredicateBackendError>; | ||
|
|
||
| async fn evict_older_than(&self, cutoff: SystemTime) -> Result<u64, PredicateBackendError>; | ||
| } |
There was a problem hiding this comment.
The trait uses std::time::SystemTime, but the existing database layer in this repository (e.g., src/db/mod.rs) consistently uses chrono::DateTime<Utc>. Using DateTime<Utc> would be more idiomatic for this project and simplify integration with the existing persistence and migration logic.
pub trait PredicateStateBackend: Send + Sync {
async fn record_invocation(
&self,
key: &InvocationKey, // (tenant, hook_id, capability)
timestamp: DateTime<Utc>,
event_id: RuntimeEventId,
) -> Result<(), PredicateBackendError>;
async fn count_in_window(
&self,
key: &InvocationKey,
window: Duration,
) -> Result<u32, PredicateBackendError>;
async fn record_value(
&self,
key: &ValueKey, // (tenant, hook_id, capability, field)
timestamp: DateTime<Utc>,
value: Decimal,
event_id: RuntimeEventId,
) -> Result<(), PredicateBackendError>;
async fn sum_in_window(
&self,
key: &ValueKey,
window: Duration,
) -> Result<Decimal, PredicateBackendError>;
async fn evict_older_than(&self, cutoff: DateTime<Utc>) -> Result<u64, PredicateBackendError>;
}There was a problem hiding this comment.
Addressed in doc revision. The current trait sketch (line 41-49) uses DateTime<Utc> for the durable backend, and the rationale is called out explicitly on lines 63-70:
the in-memory backend that already shipped in PR #3635 uses
Instantbecause it's process-local and monotonic. Durable backends serialize across processes, so they must usechrono::DateTime<Utc>to match the rest of the project (src/db/mod.rs,ironclaw_events). The trait will acceptDateTime<Utc>and the existing in-memory backend will gain a thin shim mappingInstant-driven callers to a fixed reference point.
| capability text NOT NULL, | ||
| field text NOT NULL, | ||
| occurred_at timestamptz NOT NULL, | ||
| value numeric NOT NULL, |
There was a problem hiding this comment.
For the libSQL implementation, the value column should be TEXT to preserve rust_decimal precision, as noted in src/db/CLAUDE.md. The current Postgres schema uses numeric, which is correct for PG, but the design should acknowledge the dialect difference for libSQL to ensure parity across both backends.
There was a problem hiding this comment.
Addressed in doc revision (lines 120-128). The libSQL impl is now explicitly called out as having two material differences from the Postgres impl mandated by src/db/CLAUDE.md:
valuecolumn isTEXT NOT NULL(not numeric), because libSQL's integer/real types can't preserverust_decimalprecision; the backend serializes viaDecimal::to_string()/from_str()at the row boundary.occurred_atis stored as ISO-8601TEXT(libSQL convention).
| - Performance: every predicate evaluation becomes a DB round-trip. | ||
| Plan: batched writes via the dispatcher's tick boundary, with | ||
| reads cached for the current dispatch. |
There was a problem hiding this comment.
There is a potential conflict between "Cross-process consistency" (Requirement 1) and the performance plan of "batched writes". If writes are buffered until a dispatcher tick, a count_in_window call from another process might return stale data until the batch is committed. The design should clarify if eventual consistency is acceptable for these counters, or if synchronous writes are required for strict rate-limiting across multiple host processes.
There was a problem hiding this comment.
Addressed in doc revision (lines 149-163). You correctly flagged that batched writes conflict with the cross-process consistency requirement; the doc now resolves it explicitly:
Performance: every predicate evaluation becomes a DB round-trip. An earlier draft suggested batching writes at the dispatcher's tick boundary. That conflicts with requirement #1 (cross-process consistency): a deferred write from host A wouldn't be visible to host B's read until the next tick, so two concurrent hosts could each see 'under cap' simultaneously and both proceed past
max(gemini's review on the prior draft). Resolution: the v1 production backend keeps writes synchronous (read-your-own-writes within the call); the in-process cache stays per-dispatch-only. A future optimization could batch reads — collapse N predicate evaluations in one dispatch into a single batch SELECT — but never writes.
Successor PR from #3573. Current sliding-window state is in-memory and resets on restart. Adds a PredicateStateBackend trait + Postgres/libSQL impls for cross-process and restart-survival semantics.
6a6e0e0 to
d4dd47a
Compare
…ory impl Addresses codex review's three Critical findings on PR #3635: 1. Backend wiring: the trait is now registered (lib.rs:25-26) and PredicateEvaluator delegates to Arc<dyn PredicateStateBackend> via with_backend(...). Default constructor preserves the in-memory behavior so all 154 existing tests pass unchanged. 2. Atomic record-and-read: each record_invocation / record_value call performs the write AND returns the resulting in-window count/sum under a single mutex (in-memory) / transaction (durable backends). Splitting into separate record + read would let two hosts each see 'under cap' and both proceed, drifting past max. 3. Replay refusal: each record call carries a PredicateEventId. Re-emitting the same event_id is a no-op against the count. In-memory backend implements via a per-key bounded set (RECENT_EVENT_ID_CAP = 256); durable backends will use INSERT … ON CONFLICT DO NOTHING. Trait surface (predicate_state.rs): - PredicateEventId(String): opaque dedup key - PredicateBackendError: thiserror enum for fallible durable backends; in-memory backend never returns Err - PredicateStateBackend trait with Result return types - InMemoryPredicateStateBackend default impl - MAX_HISTORY_KEYS const re-exported via evaluator for back-compat Evaluator changes (evaluator.rs): - holds Arc<dyn PredicateStateBackend> (no more inline maps) - evictions_observed() reads through to backend - synth_event_id() generates per-call-unique ids via a process-local atomic counter so tests with identical (hook, ctx, now) still produce distinct ids - LRU helpers + HistoryKey/ValueHistoryKey types moved into predicate_state.rs (as InvocationKey/ValueKey) Tests: - 6 new predicate_state tests: - in_memory_invocation_counts_within_window - in_memory_invocation_trims_outside_window - in_memory_value_sums_within_window - in_memory_tenant_isolation (regression on threat-model C2) - in_memory_duplicate_event_id_is_a_noop_for_invocations - in_memory_duplicate_event_id_is_a_noop_for_values - 160 unit tests pass total. Reborn hooks_integration unchanged at 19 scenarios. Clippy/fmt/no-panics clean. Sync trait + Instant timestamps documented as a v1 choice; durable backends (Postgres, libSQL) will need an async companion trait using SystemTime — tracked in the scope doc as the next slice. Scope doc: crates/ironclaw_hooks/docs/successors/03-persistent-counter.md
Codex review addressed (commit d5f68de)All three Critical findings resolved:
Schema recommendation (event_id PK too broad): noted in the scope doc — the implementation slice for the Postgres backend will use Sync trait + Instant timestamps: kept for v1 to minimize call-site churn; durable backends will use a SystemTime-based async companion trait, tracked in the scope doc. Tests: 160 unit total (+6 new in
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d5f68de302
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| while bucket.recent_ids.len() > RECENT_EVENT_ID_CAP { | ||
| bucket.recent_ids.pop_front(); | ||
| } |
There was a problem hiding this comment.
Keep replay dedup IDs as long as window entries can survive
Capping recent_ids at 256 causes false non-idempotence under normal high-throughput keys: after >256 distinct events in the same window, replaying an older event_id (whose original timestamp is still in entries) is treated as new and increments again. That violates the backend’s replay-refusal contract and can inflate counts/sums enough to trigger incorrect deny/pause decisions. The dedup retention needs to track active-window entries (or equivalent durable seen-set semantics), not a fixed unrelated cap.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 9f9513c — dedup memory is now intrinsic to in-window entries. Each entry stores (timestamp, event_id), dedup is any(in-window entry has this id). The fixed-size 256 ring is gone, so dedup memory equals the in-window entry set — no silent loss possible under any throughput. New regression test dedup_memory_covers_full_window_under_high_throughput pushes 512 distinct events then replays event-0; pre-fix this would have re-counted, post-fix it's a no-op.
| .filter_map(|(k, v)| v.entries.front().map(|ts| (k.clone(), *ts))) | ||
| .min_by_key(|(_, ts)| *ts) | ||
| .map(|(k, _)| k); |
There was a problem hiding this comment.
Evict empty buckets when enforcing MAX_HISTORY_KEYS
The LRU victim search ignores buckets whose entries deque is empty, so once duplicate replays trim entries and skip re-insert, those keys become non-evictable. If enough such keys accumulate, record_* can hit len() >= MAX_HISTORY_KEYS, fail to find a victim, and still insert a new key, letting the map grow past the intended cap and breaking the D5 memory bound. Include empty buckets in eviction (or remove empty buckets immediately after trim/no-op).
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 9f9513c. Two-part fix: (1) record_* now drops empty buckets eagerly via history.remove(key) (mostly defense-in-depth — under the new dedup design the record path can't actually leave a bucket empty). (2) evict_lru_* preferentially evicts empty buckets first, only falling back to the oldest-timestamp scan if no empty bucket exists. Regression test lru_evicts_empty_buckets_first crafts an empty bucket alongside a live one and asserts the empty one is the eviction victim.
Addresses codex P1 review on PR #3635: P1 #1 — replay dedup loss under high-throughput keys The prior design used a fixed-size (256) recent_ids ring per bucket decoupled from entries. Under any workload with >256 distinct events in the same window, the first event's id aged out of the ring while its timestamp entry was still live, so a replay silently re-counted. Fix: dedup memory is now intrinsic to entries. Each entry stores (timestamp, event_id), and the dedup check is 'does any in-window entry have this id?'. Dedup memory is therefore exactly the in-window entry set — no fixed cap, no silent loss. P1 #2 — zombie buckets clogging LRU Two-part fix: 1. record_* drops empty buckets eagerly via history.remove(key). This is mostly defense-in-depth — under the new dedup design, the record path can't actually leave a bucket empty (proved in the test rationale comment). 2. evict_lru_* now preferentially targets empty buckets first (find any v.entries.is_empty()), only falling back to the oldest-timestamp scan if no empty bucket exists. Filter-out behavior is gone, so any empty bucket that somehow survives becomes the next eviction victim instead of a permanent zombie. Test changes (+2 new, -0 removed): - dedup_memory_covers_full_window_under_high_throughput: pushes 512 distinct events into one bucket, then replays event-0. Pre-fix this would have counted again (silent dedup loss); post-fix the replay is a no-op. - lru_evicts_empty_buckets_first: crafts an empty bucket alongside a live one, runs LRU eviction, asserts the empty one is evicted and the live one retained. Tests: 162 unit total (+2 new). Clippy/fmt/no-panics clean.
Four medium-priority doc nits from gemini-code-assist on the crates/ironclaw_hooks/docs/successors/03-persistent-counter.md scope: 1. run_id in the trait: removed. The trait dedupes on event_id (RuntimeEventId is already run-scoped), not run_id. Replaces the earlier 'backend stores (timestamp, run_id, event_id)' claim. 2. SystemTime vs chrono::DateTime<Utc>: switched to DateTime<Utc> to match project convention (src/db/mod.rs, ironclaw_events). The in-memory backend keeps Instant for monotonic process-local semantics; durable backends require DateTime<Utc> for cross- process serialization. Documented as a clock note. 3. libSQL TEXT column for rust_decimal: per src/db/CLAUDE.md, libSQL can't preserve Decimal precision with numeric/real types. LibSqlPredicateStateBackend serializes value as TEXT via Decimal::to_string() / from_str(). Postgres impl keeps numeric (correct for PG). Documented as the two LibSql-specific schema differences. 4. Batched-writes vs cross-process consistency tension: gemini was right that deferring writes to the tick boundary breaks requirement #1 (two hosts would each see 'under cap' simultaneously). v1 production backend keeps writes synchronous; future optimization batches reads (not writes).
SummaryReviewed PR #3635 only. Base PR adds persistent predicate counter backend plumbing. In-memory backend looks concurrency-safe in process, but caller/public contracts do not yet support replay-safe durable semantics. Merge stance: blocking correctness finding plus public API concern. Findings
Security/data-flow notes
Correctness/invariant notes
Missing tests
|
henrypark133
left a comment
There was a problem hiding this comment.
What looks good:
- The backend trait cleanly separates predicate evaluation from state storage.
- The in-memory backend keeps record-and-read atomic within the process.
- The prior fixed-size dedup and empty-bucket LRU issues look addressed in the backend itself.
Findings:
- High -
crates/ironclaw_hooks/src/evaluator.rs:145:PredicateEvaluatorsynthesizes a freshPredicateEventIdon every evaluation by mixing in a process-local counter, so the production installed-hook path never forwards a stable event identity intoPredicateStateBackend. Why it matters: replay dedup is presented as load-bearing, but real retries/replays throughPredicateBackedBeforeCapabilityHook -> PredicateEvaluatoralways get a new ID and are counted again. Expected fix direction: thread a stable runtime/event identity through the hook context or add an evaluator entrypoint that accepts one, with caller-level tests proving duplicate event IDs are no-ops.
Summary:
- Recommended verdict: Request changes
- Prior feedback status: backend-internal Codex P1s appear fixed, but this end-to-end replay-dedup gap remains.
- Residual risk: durable Postgres/libSQL persistence is deferred, so cross-process and restart guarantees remain future work.
…y dedup) henrypark133 HIGH on PR #3635 + serrrfirat HIGH #1: the `PredicateBackedBeforeCapabilityHook -> PredicateEvaluator` path always synthesized a fresh `event_id` per evaluation by mixing in a process-local atomic counter, so the same logical invocation retried/replayed always got a different id. The backend's UNIQUE constraint on `event_id` — the load-bearing dedup contract — never engaged on the real production path. Replay dedup was effectively "documented but unused." Plumb a stable per-invocation identity through the public hook surface: - `BeforeCapabilityHookContext` gains a `caller_event_id: Option<PredicateEventId>` field. Middleware that threads through from the calling layer's runtime event identity populates `Some(...)`; older / in-memory-only callers pass `None` and degrade to the current synth path (no behavior change). - New builder method `with_caller_event_id(...)`. - `PredicateEvaluator` resolves the id through a new `resolve_event_id` helper: prefer `ctx.caller_event_id`, fall back to `synth_event_id`. Both `record_invocation` and `record_value` paths use it. - Backend dedup behavior is unchanged — it was already correct on `event_id`. The bug was the caller path never supplying a stable id. Tests (caller-boundary, henrypark133's required regression): - `duplicate_caller_event_id_is_deduped_in_invocation_count`: two evaluations with the same `caller_event_id` count as one invocation; a third with a different id counts as two; a fourth crosses the cap. Sanity branch confirms the no-id synth path still exhibits "every call counts" semantics. This is the API contract slice. Wiring the middleware to actually supply a stable id (e.g. derived from the originating `RuntimeEventId` once that runs through the BeforeCapability path) is the follow-up that lights up the durable backend's end-to-end replay-safety promise.
…at MED on PR #3635) serrrfirat MED: the `predicate_state` module exposed `PredicateStateBackend` as `pub`, but the trait's `now: Instant` parameter is process-local and not serializable. Any external durable backend impl built against the current trait would have to be rewritten when the durable contract lands with `chrono::DateTime<Utc>` (see successor doc 03-persistent-counter.md). Hold the public surface back until that contract is stable so we don't ship a public API we know we'll break. Demoted to `pub(crate)`: - `PredicateStateBackend` (trait) - `InvocationKey`, `ValueKey` (key types — backend ABI only) - `PredicateBackendError` (error type, with `#[allow(dead_code)]` on the `Unavailable` variant since the in-memory backend is infallible and no durable backend exists yet) - `InMemoryPredicateStateBackend` (the only impl) - `PredicateEvaluator::with_backend` (with `#[allow(dead_code)]` — reserved for future internal injection paths) Kept `pub`: - `PredicateEventId` — it appears on the public hook surface via `BeforeCapabilityHookContext::caller_event_id` (from the #3635 HIGH fix). Hook authors who want stable replay-dedup ids construct one. No behavior change. All 163 hooks lib tests + 19 reborn integration tests still pass.
Code Review — PR #3635 (persistent predicate counter backend)Verdict: REQUEST CHANGES — above-average quality with a credible extensibility story, but several correctness and safety issues need addressing before merge. OverviewExtracts sliding-window counter bookkeeping from IssuesMust fix before merge:
Important before durable backend PRs:
Non-blocking nits:
Missing test coverage
Security notes
|
Five items from the 5-15 review: **#1 (must-fix) O(n) dedup scan** The previous `bucket.entries.iter().any(...)` linear scan held the outer history mutex while walking thousands of in-window entries at high throughput. Add a companion `HashSet<PredicateEventId>` per bucket (`InvocationBucket.dedup_ids` / `ValueBucket.dedup_ids`), maintained alongside the deque via `pop_front`/`push_back` helpers. O(1) dedup, same correctness, same memory bound (one set entry per in-window entry — no fixed ring). **#2 (must-fix) Mutex poison cascade** `.expect("predicate history mutex poisoned")` propagated a panic to every subsequent caller. Replace with `match self.invocation_history.lock() { Ok(g) => g, Err(p) => p.into_inner() }` so a poisoning thread doesn't take down all subsequent evaluations. **#3 (must-fix) `caller_event_id` format validation** `with_caller_event_id` now rejects empty strings and ids containing NUL bytes. Failed validation logs a `tracing::warn!` and leaves `caller_event_id == None` so the synth path takes over — operator sees the warning, predicate dedup still works. Also: `PredicateEventId(pub String)` → `PredicateEventId(String)` with `new()` / `as_str()` (henrypark133 nit #9). Inner field is no longer in-place mutable from outside the crate. **#4 (must-fix) `with_backend` is `#[cfg(test)]`** Previously `#[allow(dead_code)]` — reachable from release builds and inviting future callers to inject backends through an unstable seam. Gated to `cfg(test)`. **#5 (important) `evict_older_than` trait stub** Default-impl no-op added to `PredicateStateBackend` so the trait signature is locked before the first durable-backend PR. Trait-object callers won't break when durable impls override it. **Bonus** (henrypark133 missing-coverage #1): `in_memory_record_invocation_is_atomic_under_concurrent_writers` — 32 threads each record a distinct event id; final count must equal 32, proving the atomic record-and-read contract holds under contention. **Bonus** (henrypark133 nit #10): The third stable id in `duplicate_caller_event_id_is_deduped_in_invocation_count` was 62 chars; bumped to 64 to match the synth output format.
serrrfirat
left a comment
There was a problem hiding this comment.
Findings:
-
High -
crates/ironclaw_hooks/docs/successors/03-persistent-counter.md:91,:103: the proposed durable schema usesevent_id uuid PRIMARY KEY, but the backend contract dedupes only within the same counter key.caller_event_idis per capability invocation, not per predicate hook. If two predicate-backed hooks observe the same invocation, the second hook's insert conflicts on the globalevent_idand does not count for its own(tenant, hook_id, capability[, field])bucket. That undercounts independent hooks and can let a later hook's cap fail open. Use composite uniqueness scoped to the counter key, e.g.(tenant_id, hook_id, capability, event_id)and(tenant_id, hook_id, capability, field, event_id), and align thepredicate_state.rsreplay-refusal docs with that scope. -
Medium -
crates/ironclaw_hooks/src/points/capability.rs:54,:105-114:caller_event_idis a public field, so the validation inwith_caller_event_id()is bypassable. Callers can construct a context and then assignSome(PredicateEventId::new(""))or a NUL-containing value directly;PredicateEventId::new()is intentionally permissive, and the evaluator trusts the field. Durable backends can still receive invalid IDs or accidentally dedupe unrelated events under an empty ID. Make the field private and require the validated setter, or makePredicateEventIdconstruction validated with an internal unchecked constructor for synthesized IDs.
I did not run tests; this was a review-only pass against head 7c83c3d14db6403fe3e549a32cf987a8c9b96aa3.
| hook_id bytea NOT NULL, | ||
| capability text NOT NULL, | ||
| occurred_at timestamptz NOT NULL, | ||
| event_id uuid PRIMARY KEY -- for replay dedup |
There was a problem hiding this comment.
The schema dedupes event_id globally, but the backend contract dedupes only within a counter key. Because caller_event_id is per capability invocation, two predicate-backed hooks observing the same invocation would collide here and the second hook would not count its own bucket. Please scope uniqueness to (tenant_id, hook_id, capability, event_id) for invocations and (tenant_id, hook_id, capability, field, event_id) for values, then align the replay-refusal docs with that scope.
There was a problem hiding this comment.
Addressed in b4d8a355f.
Schema PKs changed to composite uniqueness scoped to the counter key:
hook_invocation_events:PRIMARY KEY (tenant_id, hook_id, capability, event_id)hook_value_events:PRIMARY KEY (tenant_id, hook_id, capability, field, event_id)
The trait's replay-refusal docs in predicate_state.rs were rewritten to spell out the per-key scope explicitly and to specify the corresponding INSERT … ON CONFLICT (tenant, hook, capability[, field], event_id) DO NOTHING shape that durable backends must use. Two predicate hooks observing the same invocation now record into distinct buckets — no first-writer-wins undercounting.
| /// dedup degrades to "every evaluation counts" semantics — appropriate | ||
| /// for the in-memory backend without replay, but **not** for durable | ||
| /// backends. Durable callers MUST supply a value. | ||
| pub caller_event_id: Option<crate::predicate_state::PredicateEventId>, |
There was a problem hiding this comment.
This field is public, so callers can bypass the validation in with_caller_event_id() by assigning Some(PredicateEventId::new("")) or a NUL-containing value after construction. Since the evaluator trusts this field and PredicateEventId::new() is permissive, durable backends can still receive invalid IDs. Make the field private and force the validated setter, or validate construction at the PredicateEventId boundary with an internal unchecked constructor for synthesized IDs.
There was a problem hiding this comment.
Addressed in b4d8a355f.
Moved validation INTO the type boundary: PredicateEventId::new(...) -> Result<Self, PredicateEventIdError> validates non-empty + NUL-free at construction. The public field on BeforeCapabilityHookContext can no longer carry an invalid id because constructing any PredicateEventId value goes through the validating path.
For the evaluator's internal synth path and tests that mint ids from known-good shapes (e.g. blake3 hex digests), PredicateEventId::new_unchecked(...) is the explicit escape hatch — its name makes the bypass visible to reviewers.
with_caller_event_id dropped its now-redundant runtime check (the type already enforces it). Existing call sites switched to .expect("fixture passes validation") for the test fixtures and new_unchecked for the synth path.
Tests:
predicate_event_id_rejects_emptypredicate_event_id_rejects_nul_bytespredicate_event_id_accepts_typical_hex_digest
**MEDIUM — `caller_event_id` validation bypass**
`with_caller_event_id` validated for empty/NUL but the field on
`BeforeCapabilityHookContext` is `pub`, so callers could direct-
assign `Some(PredicateEventId::new("..."))` with `new()` permissive
and bypass the setter entirely. Move validation INTO the type
boundary:
- `PredicateEventId::new(...) -> Result<Self, PredicateEventIdError>`
validates non-empty + NUL-free at construction. Any value that
reaches a downstream backend now satisfies the format invariant by
construction.
- `PredicateEventId::new_unchecked(...)` for internal synth paths and
tests that mint ids from known-good shapes (hex digests).
- `with_caller_event_id` drops its now-redundant runtime check; the
type already enforces it.
- Internal synth in `evaluator.rs` switches to `new_unchecked` (64-char
hex output is always valid by construction).
Tests:
- `predicate_event_id_rejects_empty`
- `predicate_event_id_rejects_nul_bytes`
- `predicate_event_id_accepts_typical_hex_digest`
**HIGH — durable schema: dedup scope mismatch**
The successor doc's Postgres schema declared `event_id uuid PRIMARY KEY`
(globally unique), but the trait's replay-refusal contract dedupes
within the counter `key`. `caller_event_id` is per capability
invocation — two predicate-backed hooks observing the same invocation
share an id. A global PK lets the first hook's INSERT win and silently
undercounts the second hook's bucket.
- `docs/successors/03-persistent-counter.md`: PK changes to composite
`(tenant_id, hook_id, capability, event_id)` for invocations and
`(tenant_id, hook_id, capability, field, event_id)` for values,
matching the trait's per-key dedup scope.
- `predicate_state.rs` trait doc: replay-refusal section rewritten to
spell out the per-key scope and the corresponding
`INSERT … ON CONFLICT (tenant, hook, capability[, field], event_id)
DO NOTHING` shape durable backends should use.
henrypark133 / serrrfirat blocker B4 on PR #3635: the `caller_event_id` threading through `BeforeCapabilityHookContext` partially shipped earlier (commit b4d8a35), but the trust-boundary documentation explaining the host-assigned invariant was still missing. Add rustdoc to `PredicateEventId` and the `PredicateStateBackend` trait clarifying that: - the id MUST be minted by trusted host code from authoritative sources (dispatcher RuntimeEventId, host-side hash, arguments digest) - it MUST NOT pass through unchanged from any tenant-controlled surface (capability arguments, manifest fields, WASM memory, HTTP bodies) - the format invariants in `PredicateEventId::new` (non-empty, NUL-free) are a durability contract for SQL backends, NOT a trust check - a tenant-supplied id can either undercount itself into infinity by replaying a fixed id, or poison adjacent buckets if scoping is ever weakened Doc-only; no behavior change.
henrypark133 HIGH blocker B1 on PR #3635: replay dedup must engage at the caller boundary — `PredicateBackedBeforeCapabilityHook::evaluate` is the production path the dispatcher invokes for installed predicate hooks. A unit test on `PredicateEvaluator::evaluate_at` alone is insufficient regression coverage (repo CLAUDE.md rule "Test through the caller, not just the helper"): the wrapper hook reads `BeforeCapabilityHookContext::caller_event_id` and threads it down to the backend, so the regression test must drive the wrapper itself. The threading work already shipped in commit e6df47d (`caller_event_id` field on the public hook context + evaluator preferring it over the synth path). This commit adds the missing end-to-end test: 1. Two `PredicateBackedBeforeCapabilityHook::evaluate` calls with the same `caller_event_id` and a `RateOrValueCap { max: 1 }` predicate — the second call must stay under cap (dedupe engages at the wrapper boundary, not be re-counted into a deny). 2. A third call with a DISTINCT `caller_event_id` crosses the cap — proving dedup is replay-scoped (same id → no-op), not blanket- suppress (any id → no-op). If the wrapper were synthesizing a fresh id per call (the bug Henry flagged before threading landed), this test would fail at step 2 with the second evaluation being denied.
|
@henrypark133 thanks for the careful review — addressing your HIGH finding plus the joint blockers from Firat's pass. B1 (HIGH) — Replay-dedup gap through
|
henrypark133 should-fix S8 + S9 on PR #3635. S8 — threat-model expansion: - Add D5a as the correctness-under-attack variant of D5: an attacker flooding high-cardinality keys can LRU-evict legitimate tenants' counters and reset their rate-limit state. Distinct from the memory-only framing of D5; tied back to per-extension caps (D3/D4) and the durable-backend successor (doc 03). - Document the cross-process replay limit on the in-memory backend inside the PredicateStateBackend trait docs, not just in D5 — the process-local dedup is a property callers need at the trait surface, with a pointer to the durable backend as the cross-host story. S9 — three new tests on the in-memory backend public API: - lru_eviction_via_public_api_holds_max_history_keys_cap: drives MAX_HISTORY_KEYS + 1 distinct keys through record_invocation and asserts the map size cap holds + evictions_observed() advances. The previous coverage manually crafted buckets and called the LRU helper directly; this exercises the production path. - in_memory_invocation_retains_entry_at_exact_window_cutoff: pins the `< cutoff` trim semantics so a refactor to `<=` would fail loud. - event_id_dedup_is_isolated_across_invocation_and_value_maps: same event_id used in both record_invocation and record_value must not cross-suppress — the two maps key on disjoint types. The fourth S9 item (concurrent N-thread atomicity) and the caller- boundary replay test on the wrapper hook already landed in earlier commits (f632d22, predicate_state.rs line 840). S2 (evict_older_than stub), S3 (sync-trait docs), and S7 (consistency vs batched-writes) were also already in HEAD; this commit ships the remaining items. Quality gate: cargo fmt clean, cargo clippy -p ironclaw_hooks --all-features --tests -D warnings clean, full hooks test suite green (15 predicate_state unit tests + lib + integration). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t; pin synth format henrypark133 nits N1, N2, N5 on PR #3635. N1 — Move `synth_event_id` from `evaluator.rs` to `predicate_state.rs` as `PredicateEventId::synth(...)`. The id format (64-char lowercase hex, no NUL, never empty) is part of the backend's durable contract, so co-locating with `PredicateStateBackend` keeps the format change- surface adjacent to the consumer. To avoid inverting the module dependency (`predicate_state` is a leaf below `points`), the synth helper takes raw bytes / &str rather than a `&BeforeCapabilityHookContext`. The evaluator's `resolve_event_id` fallback unpacks the context and delegates. N2 — `// safety:` comment on a non-`unsafe` block (the `write!(s, "{byte:02x}")` infallibility note) renamed to `// RATIONALE:`. By convention `// SAFETY:` pairs with `unsafe` blocks; using `// safety:` elsewhere conflates the two. N5 — Add `synth_event_id_is_64_char_lowercase_hex` to pin the synth output shape. A refactor that silently changes length or case would break the durable backend's `uuid`-shaped UNIQUE constraint without a test failure today; the new test fails loud. Quality gate: cargo fmt clean, cargo clippy --all --benches --tests --examples --all-features -D warnings clean, full hooks lib test suite green (172 passing including the new pin). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Should-fix + nits follow-up — henrypark133 review (5-15)Status on the remaining items from your should-fix + nits list. All addressed; SHAs below. Should-fix
Nits
Quality gateEach commit: Branch: |
…fu-persistent-counter
The "No panics in production code" CI check (scripts/check_no_panics.py) only recognizes `// safety:` suppression markers, not `RATIONALE:`. Since std::fmt::Write for String is infallible, just discard the Result with `let _ =` instead of `.expect()` — no panic call, no marker needed. Also merges in latest origin/hooks-foundation-01 (now includes the reborn-integration merge and PR #3636).
Successor PR from #3573. Draft — scope doc only, no implementation yet.
Scope
Add a `PredicateStateBackend` trait + Postgres / libSQL impls so the sliding-window counter survives process restarts and is consistent across processes.
Design doc
`crates/ironclaw_hooks/docs/successors/03-persistent-counter.md`
Threat-model
D5 (eviction at MAX_HISTORY_KEYS = 8192) still applies in-memory; the durable backend is the source of truth.
Status
Draft for design review. Performance question (read/write batching) is the load-bearing design discussion.