feat(hooks): event-triggered hooks Phase 5 (successor to #3573) #3640

Open
zmanian wants to merge 11 commits into hooks-foundation-01 from hooks-fu-event-triggered

Conversation

@zmanian
Collaborator

@zmanian zmanian commented May 14, 2026

Successor PR from #3573. Draft — scope doc only, no implementation yet.

Scope

New `EventTriggered` hook point that subscribes to the runtime event bus and reacts to durable `RuntimeEvent`s asynchronously, outside the inline dispatch tick. Observer-only by construction (no Allow/Deny/Patch decisions; sink mirrors `ObserverSink`).
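
As a minimal sketch of what "observer-only by construction" can mean at the type level: the sink trait below exposes only a `note` method, so a hook written against it has no way to express an allow/deny/patch decision. The trait shape, `RecordingSink`, and `observe_hook_failed` are illustrative assumptions, not the crate's actual definitions.

```rust
/// Hypothetical observer-only sink: it can record notes but has no
/// allow/deny/patch methods, so a decision cannot be expressed through it.
pub trait ObserverSink {
    fn note(&mut self, category: &str, summary: &str);
}

/// Hypothetical in-memory sink used only for illustration.
#[derive(Default)]
pub struct RecordingSink {
    pub notes: Vec<(String, String)>,
}

impl ObserverSink for RecordingSink {
    fn note(&mut self, category: &str, summary: &str) {
        self.notes.push((category.to_string(), summary.to_string()));
    }
}

/// A hook body that reacts to a failure event. The only side effect
/// available to it through the sink is recording a note.
pub fn observe_hook_failed<S: ObserverSink>(sink: &mut S, hook_id: &str) {
    sink.note("hook_failed", &format!("hook {} failed", hook_id));
}
```

Because the restriction lives in the trait surface rather than in a runtime check, misuse fails at compile time instead of at dispatch time.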

Motivation

Inline hook points are synchronous against the loop. Some legitimate hook use cases don't fit:

  • Cross-run policy enforcement ("ext-A made >10 trades in 24h across all runs")
  • Asynchronous notifications (Slack ping on `HookFailed` shouldn't block the loop)
  • Post-hoc fan-out (embedding index, search log, downstream policy engine)

Design doc

`crates/ironclaw_hooks/docs/successors/04-event-triggered-hooks.md`

Open design questions

  • Cursor / replay semantics: at-least-once via durable cursors; exactly-once is a future slice
  • Per-hook event-rate caps to mitigate subscription DoS
  • Cross-crate seam: narrowed `HookObservableEvent` projection vs full `RuntimeEvent` re-export

Coordination

Status

Draft for design review. Promote when the cursor/replay + DoS-budget design lands.

Successor PR from #3573. Adds a new EventTriggered hook point that
subscribes to RuntimeEvents asynchronously, outside the loop's
inline tick. Observer-only by construction (no Allow/Deny/Patch);
typed against a narrowed HookObservableEvent projection to keep
the cross-crate boundary clean.

Scope doc only; design questions about cursor/replay semantics
and per-extension event-rate caps need design review before
implementation.
@github-actions github-actions Bot added size: XS < 10 changed lines (excluding docs) risk: low Changes to docs, tests, or low-risk modules scope: docs Documentation contributor: core 20+ merged PRs labels May 14, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a design document for event-triggered hooks, which are asynchronous, observer-only hooks designed to react to runtime events without blocking the main loop. The review feedback highlights several internal inconsistencies and contradictions within the document that need clarification. Specifically, the reviewer noted discrepancies regarding the use of narrowed event projections versus the full RuntimeEvent type, the available methods on the ObserverSink trait, the policy for replaying missed events during downtime, and the current state of crate dependencies.

```rust
// in ironclaw_hooks::points::event_triggered
pub struct EventHookContext<'a> {
    pub event: &'a RuntimeEvent,
```

medium

The code snippet uses the full RuntimeEvent type, which contradicts the recommendation in the Cross-cutting constraints section (line 105) to pick a narrowed projection (e.g., HookObservableEvent). Using a narrowed projection is preferred to maintain clean crate boundaries. Documentation for complex logic must precisely match the intended implementation.

References
  1. Documentation for complex logic, such as security policies, must precisely match the code implementation.

pub tenant_id: TenantId,
}

// Sink mirrors ObserverSink — `note_fact`, `emit_audit`. No `allow`,

medium

The documentation mentions note_fact and emit_audit as methods on the sink, but the existing ObserverSink trait only defines a note method. Align this documentation with the actual trait definition or clearly distinguish these as future or planned states. Documentation must precisely match the code implementation and distinguish between current and future states.

Suggested change
// Sink mirrors ObserverSink — `note_fact`, `emit_audit`. No `allow`,
// Sink mirrors ObserverSink — note. No allow,
References
  1. Documentation for complex logic, such as security policies, must precisely match the code implementation.
  2. In documentation, clearly distinguish between the current implementation state and future or planned states.

(tick-driven or stream-driven, TBD).
- **Cursor / replay**: subscriptions are cursor-keyed so a restarted
host can resume from the last-seen `EventCursor`. Lost events
during downtime is acceptable for observer-only semantics;

medium

There is a contradiction between the statement that Lost events during downtime is acceptable and the later requirement for Cursor resume (line 133) which specifies replaying missed events. The design should clarify which semantic is intended to ensure the documentation precisely matches the implementation logic.

References
  1. Documentation for complex logic, such as security policies, must precisely match the code implementation.


## Cross-cutting constraints

- **Cross-crate boundary**: `ironclaw_hooks` already forbids `events`

medium

The statement that ironclaw_hooks already forbids a dependency on events is contradicted by the Risk section (line 159), which notes that ironclaw_events is already a dependency. Clarify the constraint to ensure the documentation precisely matches the system architecture.

References
  1. Documentation for complex logic, such as security policies, must precisely match the code implementation.

Cites crates/ironclaw_hooks/docs/successors/04-event-triggered-hooks.md as the scope contract.

Adds the EventTriggered observer hook point, durable RuntimeEvent dispatch path, and Reborn pull-driven subscription wiring with caller-level coverage for matching, replay, scope filtering, observer-only authority, and backpressure.
@zmanian zmanian marked this pull request as ready for review May 14, 2026 17:06
@github-actions github-actions Bot added scope: dependencies Dependency updates size: XL 500+ changed lines and removed size: XS < 10 changed lines (excluding docs) labels May 14, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ce610736dc

Comment thread crates/ironclaw_hooks/src/dispatch.rs Outdated
Comment on lines +887 to +890
if !binding
.scope
.permits(binding.owning_extension.as_ref(), event.provider.as_ref())
{

P1: Preserve OwnCapabilities matching for providerless hook events

Scope filtering for event-triggered hooks is based on event.provider, but RuntimeEvent::hook_failed and RuntimeEvent::hook_decision_emitted populate provider as None (see crates/ironclaw_events/src/runtime_event.rs constructors). As a result, Installed hooks using the default OwnCapabilities scope are always filtered out for those event kinds, so a hook-failure subscription silently never fires unless it is widened to Global/SameTenant. This breaks the expected default behavior for extension-scoped failure observers.

Collaborator Author

Fixed in two commits on this branch:

  • d7f43a3ff — adds a hook_id-based fallback in scope_provider_for_runtime_event so that when event.provider is None on HookDispatched/HookDecisionEmitted/HookFailed events, the dispatcher resolves the owning extension by looking up event.hook_id against the registry's hex index. OwnCapabilities matching is preserved for the default Installed-hook case.
  • 4bb06c93a — the durable fix: plumbs owning_extension: Option<ExtensionId> end-to-end through LoopHostMilestoneKind::{HookDispatched, HookDecisionEmitted, HookFailed} and the RuntimeEvent::hook_* constructors, so event.provider is populated at emit time and the primary path resolves without any fallback. Checkpoint payloads round-trip unchanged (#[serde(default, skip_serializing_if = "Option::is_none")]).

Regression tests in crates/ironclaw_reborn/tests/hooks_integration.rs:

  • event_triggered_own_capabilities_matches_hook_failed_with_carried_provider (primary path)
  • event_triggered_own_capabilities_scope_resolves_hook_failed_owner_from_hook_id (legacy/None-provider fallback)
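
The two resolution paths described above can be sketched roughly as follows. The types are simplified stand-ins and `scope_provider` / `own_capabilities_permits` are hypothetical names; the real `scope_provider_for_runtime_event` and registry index may look different.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for the real identifier types.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ExtensionId(pub String);

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct HookId(pub String);

pub struct Event {
    pub provider: Option<ExtensionId>,
    pub hook_id: Option<HookId>,
}

/// Primary path: use the provider stamped into the event at emit time.
/// Fallback path: resolve the owner from the hook id via a registry index.
pub fn scope_provider<'a>(
    event: &'a Event,
    owners_by_hook: &'a HashMap<HookId, ExtensionId>,
) -> Option<&'a ExtensionId> {
    event
        .provider
        .as_ref()
        .or_else(|| event.hook_id.as_ref().and_then(|id| owners_by_hook.get(id)))
}

/// OwnCapabilities-style check: permit only when the resolved provider
/// is the subscriber's own extension.
pub fn own_capabilities_permits(owner: &ExtensionId, provider: Option<&ExtensionId>) -> bool {
    provider == Some(owner)
}
```

With this shape, a legacy event carrying `provider: None` still matches its owner as long as the hook is registered, while an event stamped with a different extension is filtered out.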

@serrrfirat
Collaborator

Summary

Reviewed PR #3640 only. Base 5793e4d90e1316adb93ec9c7edf6511d85f8873e → head ce610736dc4d61f509e53aa952c7b97548377b37.

PR adds event-triggered hooks over durable event replay. Merge stance: blocking security/correctness findings at subscription boundary and replay/reentrancy behavior.

Findings

| # | Sev | Category | File:Line | Issue | Fix suggestion |
|---|-----|----------|-----------|-------|----------------|
| 1 | High | Security / Trust boundary | crates/ironclaw_reborn/src/loop_driver_host.rs:1141, crates/ironclaw_reborn/src/loop_driver_host.rs:1185, crates/ironclaw_reborn/src/loop_driver_host.rs:1542, crates/ironclaw_hooks/src/dispatch.rs:874 | EventTriggeredHookSubscription::new accepts caller-supplied EventStreamKey + ReadScope, the poller reads that stream/scope, and factory spawn labels dispatched hook context with run_context.scope.tenant_id. There is no validation that subscription stream/read scope matches the run scope. A caller wiring tenant A host to tenant B stream can cause hooks to observe B events while context claims tenant A. | Bind subscription stream/read scope from run_context.scope inside the factory, or validate stream.matches(&run_context.scope) plus tightened ReadScope before spawn. Derive hook tenant from the event stream/scope, not an independent caller-supplied argument. Add negative integration test: mismatched tenant/user/agent stream or over-broad read scope must fail host build/spawn. |
| 2 | Medium | Reliability / Persistence | crates/ironclaw_reborn/src/loop_driver_host.rs:1210 | A stale cursor / log compaction ReplayGap only logs a warning and then breaks the subscription loop. Event-triggered hooks silently stop for the rest of the run, so future security/audit hooks never dispatch and callers get no structured failure. | Surface replay-gap as a host/milestone error and fail closed, or enter an explicit recovery/resync path with durable operator-visible state. Add caller-boundary test for stale cursor / ReplayGap. |
| 3 | Medium | Recursion / Reentrancy | crates/ironclaw_hooks/src/dispatch.rs:859, crates/ironclaw_hooks/src/sink.rs:349, crates/ironclaw_reborn/src/loop_driver_host.rs:1199 | Event-triggered hooks are described as observer-only, but the subscription loop has no source tag, self-authored event suppression, or dedupe before redispatching matching events. The sink type only prevents gate/patch sink calls; it does not prevent trusted/builtin hook code with captured dependencies from appending another matching RuntimeEvent, causing self-triggering/event storms. | Stamp event source metadata and suppress event-triggered re-entry from the same hook/run, or reject runtime-event emit capability in event-hook execution contexts. Add regression with a hook that emits a matching event and prove it does not infinitely redispatch. |

Security/data-flow notes

  • Source: externally wired EventTriggeredHookSubscription { log, stream, read_scope, start_cursor }.
  • Sink: durable event replay → dispatch_event_triggered_at(...) → hook code.
  • Broken trust invariant: event stream identity and hook context tenant are supplied through separate paths and never bound together.

Correctness/invariant notes

  • loop_driver_host.rs:1185-1189: subscription reads caller-supplied stream/read scope.
  • loop_driver_host.rs:1199-1206: every replayed record dispatches to hooks.
  • loop_driver_host.rs:1210-1219: replay gap stops the consumer permanently.
  • loop_driver_host.rs:1542-1546: spawn passes run_context.scope.tenant_id as hook context tenant, independent of subscription stream.

Missing tests

  • Reject mismatched subscription stream/read scope vs run scope.
  • Replay gap/stale cursor must not silently kill hook delivery.
  • Self-emitting event hook must not recurse or storm.

Collaborator

@henrypark133 henrypark133 left a comment

What looks good:

  • Event-triggered hooks are observer-only at the sink/type boundary.
  • Subscription runs from the durable log on a background task, so event emitters are not blocked by hook execution.
  • Cursor replay, kind filtering, scope filtering, and task cleanup all have useful coverage.

Findings:

  1. High - crates/ironclaw_hooks/src/dispatch.rs:887: event-triggered scope filtering uses event.provider, but RuntimeEvent::hook_decision_emitted and RuntimeEvent::hook_failed set provider: None in crates/ironclaw_events/src/runtime_event.rs:550 and :576. Why it matters: installed hooks default to OwnCapabilities, so the common/default event-triggered hook silently never fires for the hook-failure/decision events this PR is explicitly meant to observe. Expected fix direction: carry the originating provider into hook milestone runtime events, or resolve provider during replay/dispatch before applying OwnCapabilities, with a regression test for OwnCapabilities + HookFailed.

Summary:

  • Recommended verdict: Request changes
  • Prior feedback status: current Codex P1 is confirmed.
  • Residual risk: failure is silent non-observation, not loop corruption, but it undermines the Phase 5 default use case.

zmanian added 5 commits May 14, 2026 14:30
henrypark133 HIGH + codex P1 on PR #3640: `OwnCapabilities`-scoped
event-triggered subscriptions silently never fired for
`HookFailed`/`HookDecisionEmitted`/`HookDispatched` events because
those `RuntimeEvent` constructors hardcoded `provider: None`. Since
Installed hooks default to `OwnCapabilities`, the very events that
Phase 5 was designed to observe (hook-failure / decision alerting)
never reached their default-configured subscriber.

A prior fix added a hook_id-based fallback in
`scope_provider_for_runtime_event` that resolves the owning extension
through the registry's hex index when `event.provider` is `None`. That
covers the case where the failing hook is still registered at replay
time, but the durable fix is to stamp the originating provider into
the event at emit time so the primary `event.provider` path resolves
without any fallback.

Plumbed `owning_extension: Option<ExtensionId>` end-to-end:
- `LoopHostMilestoneKind::{HookDispatched, HookDecisionEmitted,
  HookFailed}` gain the field (with
  `#[serde(default, skip_serializing_if = "Option::is_none")]` so
  pre-existing checkpoint payloads and the L3 schema-snapshot tests
  round-trip unchanged when no owner is set).
- `RuntimeEvent::hook_{dispatched, decision_emitted, failed}`
  constructors accept the owner and stamp it into `provider`.
- `milestone_events.rs` threads the field through the projection.
- `HookDispatcher::emit_dispatched/emit_decision` pass
  `binding.owning_extension.clone()` directly.
- `HookDispatcher::emit_failure` (no binding handy on the failure
  path) looks the owner up via the registry's existing
  `owning_extension_for_hook_hex` index.

Tests:
- `event_triggered_own_capabilities_matches_hook_failed_with_carried_provider`:
  primary-path regression — two `HookFailed` events with
  `provider: Some(ext_a|ext_b)` against an `OwnCapabilities`
  subscription scoped to ext_a; only the own-provider event fires
  and `event.provider == Some(ext_a)`.
- Existing `event_triggered_own_capabilities_scope_resolves_hook_failed_owner_from_hook_id`
  remains green: passes `None` for the new arg so the fallback path
  is still exercised for legacy payloads.

All other call sites updated to pass `None` (no owner available) or
the resolved owner where applicable.
…rfirat HIGH #1 on PR #3640)

`EventTriggeredHookSubscription` accepted a caller-supplied
`EventStreamKey` + `ReadScope` and used `run_context.scope.tenant_id`
as the hook context's tenant — with no validation that the two
agreed. A caller wiring tenant A's host with tenant B's stream would
cause hooks to observe B's events while the hook context claimed
tenant A. Cross-tenant trust-boundary break.

Add `EventTriggeredHookSubscription::validate_against_run_scope` and
call it from `build_text_only_host_with_capabilities` before
spawning. Validation:
- Stream `(tenant_id, user_id, agent_id)` must equal
  `(run_context.scope.tenant_id, thread_scope.owner_user_id,
  run_context.scope.agent_id)`.
- Thread without `owner_user_id` cannot bind any subscription — the
  user dimension is required to verify stream identity.
- Every `Some(want)` in `ReadScope` must equal the corresponding
  run/thread scope value (project/mission/thread). `None` is
  permissive (run scope owns the dimension authoritatively).

Failures surface as `RebornLoopDriverHostError::ScopeMismatch` with
a specific reason naming the offending dimension.
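
A rough sketch of the validation rules listed above, using plain `String` fields in place of the real id types. All struct layouts here are assumptions made for illustration; only the rules (exact stream identity, required `owner_user_id`, `None` permissive in `ReadScope`) come from this commit message.

```rust
// Hypothetical simplified shapes; the real types carry typed ids.
pub struct StreamKey {
    pub tenant_id: String,
    pub user_id: String,
    pub agent_id: String,
}

#[derive(Default)]
pub struct ReadScope {
    pub project: Option<String>,
    pub mission: Option<String>,
    pub thread: Option<String>,
}

pub struct RunScope {
    pub tenant_id: String,
    pub owner_user_id: Option<String>,
    pub agent_id: String,
    pub project: String,
    pub mission: String,
    pub thread: String,
}

/// Names the offending dimension, mirroring the "specific reason" idea.
#[derive(Debug, PartialEq, Eq)]
pub struct ScopeMismatch(pub &'static str);

pub fn validate_against_run_scope(
    stream: &StreamKey,
    read: &ReadScope,
    run: &RunScope,
) -> Result<(), ScopeMismatch> {
    // A thread without an owner cannot bind any subscription.
    let user = run
        .owner_user_id
        .as_deref()
        .ok_or(ScopeMismatch("owner_user_id missing"))?;
    if stream.tenant_id != run.tenant_id {
        return Err(ScopeMismatch("tenant_id"));
    }
    if stream.user_id != user {
        return Err(ScopeMismatch("user_id"));
    }
    if stream.agent_id != run.agent_id {
        return Err(ScopeMismatch("agent_id"));
    }
    // `None` is permissive; every `Some(want)` must equal the run's value.
    let dims = [
        ("project", &read.project, &run.project),
        ("mission", &read.mission, &run.mission),
        ("thread", &read.thread, &run.thread),
    ];
    for (name, want, have) in dims {
        if let Some(want) = want {
            if want != have {
                return Err(ScopeMismatch(name));
            }
        }
    }
    Ok(())
}
```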

Tests:
- `event_triggered_subscription_with_foreign_tenant_stream_fails_host_build`
- `event_triggered_subscription_with_foreign_user_stream_fails_host_build`

The integration fixture's `ThreadScope` now sets
`owner_user_id: Some(...)` so it passes validation; previously it was
`None`, which the new check (correctly) refuses. Existing tests
continue to pass.
…rrfirat MED on PR #3640)

When the durable event log returned `EventError::ReplayGap`, the
event-triggered subscription's background task previously logged a
`tracing::warn!` and broke out of the poll loop — silently killing all
future hook event delivery for the run with no operator-visible signal.
A scoped audit hook that mattered to compliance would just stop, and
nobody downstream would know.

Surface the termination through the host's milestone sink:
- New `LoopDriverNoteKind::EventSubscriptionTerminated` variant.
- The subscription's `spawn`/`run` now takes the host's
  `Arc<dyn LoopHostMilestoneSink>` and the active `LoopRunContext`.
  On `ReplayGap`, it constructs a `DriverNote` milestone with that
  kind plus a `LoopSafeSummary` describing the gap, publishes it
  through the same sink that carries every other host milestone, and
  *then* breaks (fail-closed: the at-least-once contract is already
  broken; resuming from `earliest` would silently lose the gap).
- Log level bumped from `warn` to `error` to match the severity.
- A best-effort send: failures to publish the milestone are logged
  but do not stall the subscription teardown.
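
The fail-closed behavior can be sketched synchronously as below. This is an illustration under assumptions: the real task is async and publishes through `LoopHostMilestoneSink`, whereas `Log`, `poll`, and the `Milestone` enum here are hypothetical stand-ins.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ReadError {
    ReplayGap { requested: u64, earliest: u64 },
}

#[derive(Debug, PartialEq, Eq)]
pub enum Milestone {
    EventSubscriptionTerminated { reason: String },
}

/// Hypothetical log: `earliest` is the lowest cursor still retained.
pub struct Log {
    pub earliest: u64,
    pub events: Vec<(u64, String)>,
}

impl Log {
    /// Returns events with a cursor greater than `after`, or a gap error
    /// when events between `after` and `earliest` were compacted away.
    pub fn read_after(&self, after: u64) -> Result<Vec<(u64, String)>, ReadError> {
        if after + 1 < self.earliest {
            return Err(ReadError::ReplayGap { requested: after, earliest: self.earliest });
        }
        Ok(self.events.iter().filter(|entry| entry.0 > after).cloned().collect())
    }
}

/// One poll iteration. Returns `false` when the subscription must stop.
pub fn poll(
    log: &Log,
    cursor: &mut u64,
    seen: &mut Vec<String>,
    milestones: &mut Vec<Milestone>,
) -> bool {
    match log.read_after(*cursor) {
        Ok(batch) => {
            for (c, payload) in batch {
                seen.push(payload);
                *cursor = c;
            }
            true
        }
        Err(ReadError::ReplayGap { requested, earliest }) => {
            // Publish an operator-visible signal first, then stop (fail closed).
            milestones.push(Milestone::EventSubscriptionTerminated {
                reason: format!(
                    "replay gap: cursor {} predates earliest retained {}",
                    requested, earliest
                ),
            });
            false
        }
    }
}
```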

Tests:
- `event_triggered_replay_gap_emits_subscription_terminated_milestone`:
  appends 3 events, `truncate_before_or_at` to cursor 2 to force a
  replay gap, starts the subscription from cursor origin (now stale),
  and asserts a `DriverNote { kind: EventSubscriptionTerminated, .. }`
  shows up on the host's milestone sink within a 2s deadline.

Self-emit reentrancy (serrrfirat MED #3 on the same PR) is intentionally
not addressed here — that fix needs a design call (task-local re-entry
flag vs. removing RuntimeEvent emit capability from event-hook execution
contexts) and is a follow-up.
 on PR #3640)

A hook that subscribes to one of the hook-lifecycle event kinds
(`HookDispatched`/`HookDecisionEmitted`/`HookFailed`) with a scope
that matches its own provider would otherwise be dispatched for
events describing its OWN executions. The dispatcher emits those
events itself when running the hook, so a hook subscribing to
`HookFailed` with `OwnCapabilities` against its own extension would
fail → emit HookFailed → re-dispatch → fail → emit → … storm.

`dispatch_event_triggered_at` now skips events whose `event.hook_id`
equals the binding's own hook id when the event kind is a hook-
lifecycle kind (`is_hook_lifecycle_kind`). The check is intentionally
narrow:

- It only fires for hook-lifecycle events. Subscriptions to other
  event kinds are unaffected.
- It only suppresses literal self-observation; events about other
  hooks (even hooks from the same extension) still dispatch.

This does NOT cover the broader case of a hook that captures an
`Arc<DurableEventLog>` and mints arbitrary `RuntimeEvent`s from
inside its `observe()`. That requires architectural restriction on
what hook impls can capture — tracked separately as a follow-up.
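
A minimal sketch of the narrow guard, assuming simplified event and kind types. `is_hook_lifecycle_kind` is named in this commit; `is_self_observation` and the `Other` variant are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EventKind {
    HookDispatched,
    HookDecisionEmitted,
    HookFailed,
    /// Stand-in for every non-lifecycle event kind.
    Other,
}

pub fn is_hook_lifecycle_kind(kind: EventKind) -> bool {
    matches!(
        kind,
        EventKind::HookDispatched | EventKind::HookDecisionEmitted | EventKind::HookFailed
    )
}

pub struct Event<'a> {
    pub kind: EventKind,
    pub hook_id: Option<&'a str>,
}

/// True only for a lifecycle event that describes the binding's own hook.
/// Events about other hooks, and non-lifecycle events, still dispatch.
pub fn is_self_observation(binding_hook_id: &str, event: &Event) -> bool {
    is_hook_lifecycle_kind(event.kind) && event.hook_id == Some(binding_hook_id)
}
```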

Tests:
- `event_triggered_self_lifecycle_event_does_not_redispatch`: appends
  two `HookFailed` events with the same provider — one targeting the
  subscriber's own hook id, one targeting a different hook. Asserts
  only the OTHER hook's failure fires (proves the filter is narrow,
  not blanket).
@henrypark133
Collaborator

Code Review — PR #3640 (event-triggered hooks, Phase 5)

Verdict: COMMENT (draft) — design and implementation are sound and well-structured. Several items should be resolved before promotion from draft.

Overview

New EventTriggered hook point backed by a pull-driven cursor consumer against the durable RuntimeEvent log. Observer-only by type construction — EventTriggeredObserverSink has no allow/deny/patch methods. EventTriggeredHookSubscription background task with RAII SubscriptionHandle, kind-filter and OwnCapabilities/Global scope enforcement, self-trigger guard, cursor-based at-least-once replay. +1968/-35 lines.

Issues

Should fix before promotion from draft:

  1. event_kind_filter = None on an EventTriggered binding silently matches nothing — add an invariant check at HookRegistry::insert_binding: point == EventTriggered implies event_kind_filter.is_some(), and vice versa. Without this, a binding registered with HookPointSpec::EventTriggered and no filter is a silent no-op.

  2. EventTriggeredHookSubscription derives Clone — cloning and spawning creates two consumers reading from the same start_cursor, dispatching each hook twice. Remove Clone or document the dual-consumer footgun prominently on the type. The factory code (subscription.clone().spawn(...)) is correct today but Clone on a spawn-semantics type is a footgun.

  3. Background task panic is silently swallowed — if run() panics, the task terminates with no milestone emitted. Wrap run() in catch_unwind inside spawn and emit EventSubscriptionTerminated on panic, as already done for the ReplayGap path.

  4. DoS-budget enforcement should be a prerequisite for Installed-tier event subscriptions, not a follow-up — mutual recursion between two event-triggered hooks (each subscribed to the other's HookFailed events) is unbounded. The existing per-hook timeout bounds invocation time but not the cycle depth. A finite dispatch budget per poll cycle is the required mitigation. Recommend making this a gate for Installed-tier, allowing only Builtin/Trusted until the budget design lands.

  5. Full RuntimeEvent surface exposed to Installed hooks — the design doc recommends a narrowed HookObservableEvent projection. Without it, if sensitive fields survive log sanitization, installed hooks see them. Make this a named tracked issue rather than an open design question.

  6. Replay semantics absent from the public API doc — "at-least-once; restarting from the same start_cursor replays all events" is documented in the design doc but not in EventTriggeredHookSubscription::new(). Move it there.

Non-blocking nits:

  1. // serrrfirat HIGH #1 on PR #3640 — author-internal tags should be replaced with // FIXME: / // NOTE(#3640): before promotion.
  2. wait_for_seen_events spin-polls with 10ms sleep — use tokio::sync::Notify for deterministic test signaling.
  3. EventTriggeredHookContext derives Clone but appears not to be cloned anywhere — remove the derive to shrink the API surface.
  4. #[allow(clippy::too_many_arguments)] on install_event_triggered — a small EventTriggeredHookConfig struct would eliminate the suppression and ease future field additions (e.g., per-hook rate cap).

Missing test coverage

  • ReadScope narrowed scope rejection (all tests use ReadScope::any())
  • agent_id mismatch rejection (only tenant_id and user_id tested)
  • Phase ordering within a single dispatch_event_triggered_at call
  • Poisoning propagation (poison-path after malformed implementation)
  • Hook timeout end-to-end
  • Multi-hook batch cursor ordering

Security

  • Observer-only invariant correctly type-enforced; compile-time test documents it ✅
  • Cross-tenant/cross-user validation runs at build time, not dispatch time ✅
  • Self-trigger guard is hook-id-exact: it prevents direct self-trigger, but not mutual recursion between two hooks (see issue #4 above)
  • SEC4 note: mechanically preventing a hook from emitting events via a captured DurableEventLog relies on caller discipline, not the type system — should be a named tracked issue
  • Milestone emission inside dispatch_event_triggered_at creates new log entries; a third hook subscribed to hook-A's HookDispatched events is not prevented by the self-trigger guard. Same DoS-budget mitigation applies.

Four items from the 5-15 review (#4 DoS budget and #5 narrowed
projection deferred — see below):

**#1 (should-fix) Invariant: EventTriggered ↔ event_kind_filter**
`HookRegistry::insert` now enforces the biconditional at install time:
an `EventTriggered` binding must declare an `event_kind_filter`
(otherwise the dispatcher's kind match would silently never fire — a
no-op binding), and conversely only `EventTriggered` bindings may
declare a filter (other points are kind-agnostic and would ignore the
field). Misconfigured bindings fail loud at install.
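
The biconditional can be sketched as a two-way check at install time. `Binding`, `HookPoint::Inline`, and the error names are hypothetical simplifications of the registry's real types.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HookPoint {
    EventTriggered,
    /// Stand-in for every other (kind-agnostic) hook point.
    Inline,
}

pub struct Binding {
    pub point: HookPoint,
    pub event_kind_filter: Option<Vec<String>>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum InstallError {
    MissingEventKindFilter,
    UnexpectedEventKindFilter,
}

/// Enforces: point == EventTriggered <=> event_kind_filter.is_some().
pub fn check_binding(binding: &Binding) -> Result<(), InstallError> {
    let is_event_triggered = binding.point == HookPoint::EventTriggered;
    match (is_event_triggered, binding.event_kind_filter.is_some()) {
        // Would be a silent no-op at dispatch time.
        (true, false) => Err(InstallError::MissingEventKindFilter),
        // Other points would ignore the filter.
        (false, true) => Err(InstallError::UnexpectedEventKindFilter),
        _ => Ok(()),
    }
}
```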

**#2 (should-fix) Remove `Clone` derive on EventTriggeredHookSubscription**
`Clone` on a spawn-semantics type was a footgun: external callers
cloning + spawning twice would create two consumers reading from the
same `start_cursor`, each dispatching every hook. Replace with an
explicit `clone_for_independent_spawn(&self)` method named verbosely
so the property is visible at the seam. Internal use updated in the
factory's host-build path; external callers can no longer accidentally
construct a dual-consumer pattern.

**#3 (should-fix) catch_unwind around the background `run()` task**
The subscription's tokio task body now runs inside
`AssertUnwindSafe(...).catch_unwind()`; a panic in `run()` emits the
same `EventSubscriptionTerminated` `DriverNote` milestone the
`ReplayGap` path already emits, instead of silently terminating with
no operator-visible signal.
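
For illustration, the same idea in synchronous form using `std::panic::catch_unwind`. The PR wraps an async task body instead, and `run_guarded` plus this `Milestone` enum are hypothetical names for the sketch.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

#[derive(Debug, PartialEq, Eq)]
pub enum Milestone {
    EventSubscriptionTerminated { reason: String },
}

/// Runs `run` and converts a panic into a termination milestone
/// instead of letting the task die without any signal.
pub fn run_guarded<F: FnOnce()>(run: F, milestones: &mut Vec<Milestone>) {
    if let Err(payload) = catch_unwind(AssertUnwindSafe(run)) {
        // Panic payloads are usually `&'static str` or `String`.
        let detail = payload
            .downcast_ref::<&str>()
            .map(|s| s.to_string())
            .or_else(|| payload.downcast_ref::<String>().cloned())
            .unwrap_or_else(|| "non-string panic payload".to_string());
        milestones.push(Milestone::EventSubscriptionTerminated {
            reason: format!("subscription task panicked: {}", detail),
        });
    }
}
```

`AssertUnwindSafe` is a promise by the caller that state observed after the panic is still usable; here only the milestone list is touched afterwards.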

**#6 (should-fix) Replay semantics in rustdoc on public API**
Added a "Replay semantics" section to `EventTriggeredHookSubscription`
rustdoc: at-least-once, caller-owned cursor persistence, the
restart-from-start_cursor replay pattern. Previously only in the
design doc; now load-bearing API contract is visible at the type.
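
A small sketch of why restarting from the same `start_cursor` replays events. The log is modeled as a slice and the cursor as a count of consumed events, which is an assumption made for brevity; `replay_from` is a hypothetical helper.

```rust
/// Dispatches every event after `start_cursor` in batches of at most
/// `batch_limit` and returns the events seen plus the final cursor.
/// The caller owns persisting the returned cursor.
pub fn replay_from(log: &[&str], start_cursor: usize, batch_limit: usize) -> (Vec<String>, usize) {
    let mut cursor = start_cursor.min(log.len());
    let mut seen = Vec::new();
    while cursor < log.len() {
        let end = (cursor + batch_limit.max(1)).min(log.len());
        for event in &log[cursor..end] {
            seen.push((*event).to_string());
        }
        // Advance only after the batch was dispatched (at-least-once).
        cursor = end;
    }
    (seen, cursor)
}
```

If the caller restarts with the original cursor rather than the persisted one, every event is delivered again, so hook bodies should tolerate duplicates.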

**#4 (deferred) Per-hook DoS budget for Installed tier**
Henry's recommendation was to gate `Installed`-tier event-triggered
hooks entirely until the budget design lands, allowing only
Builtin/Trusted. That breaks 11 existing tests + the primary use
case. Instead: documented the existing first-line throttle
(`batch_limit` × `poll_interval`) as the current bound on indirect-
recursion fanout, and tracked the full per-hook rate cap with
poisoning + milestone-on-overrun as a follow-up. The self-trigger
guard (committed earlier in this PR) catches the most common direct
pattern; the throttle here bounds the indirect pattern until the
proper budget lands.
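
As a worked example of that throttle: if each poll dispatches at most `batch_limit` events and polls are spaced `poll_interval` apart, the ceiling over a window is roughly `batch_limit × (window / poll_interval)`. The helper below and the numbers used with it are illustrative assumptions, not values from the PR.

```rust
use std::time::Duration;

/// Approximate ceiling on events one subscription can dispatch in `window`,
/// counting whole poll intervals only.
pub fn max_dispatches(batch_limit: u64, poll_interval: Duration, window: Duration) -> u64 {
    // Guard against a zero interval to avoid dividing by zero.
    let interval_ms = poll_interval.as_millis().max(1);
    let polls = window.as_millis() / interval_ms;
    (polls as u64).saturating_mul(batch_limit)
}
```

For a hypothetical `batch_limit` of 64 and a 250 ms `poll_interval`, this gives 4 polls per second and at most 256 dispatches per second per subscription.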

**#5 (deferred) Narrowed `HookObservableEvent` projection**
Would prevent full `RuntimeEvent` surface from reaching Installed-
tier hooks. Project-wide impact (events crate types, projection
glue). Tracked as a follow-up; the existing sanitized-event
projection bounds the surface to closed-vocab labels.

All 156 hooks lib + 30 reborn integration tests pass.
@zmanian
Collaborator Author

zmanian commented May 15, 2026

@henrypark133 thanks for the review. Addressing the High finding (dispatch.rs:887: OwnCapabilities silently never fires for HookFailed/HookDecisionEmitted because RuntimeEvent::hook_failed/hook_decision_emitted set provider: None).

Fixed in two commits, in the order you suggested:

  1. d7f43a3ff — resolve provider during dispatch. scope_provider_for_runtime_event falls back to a hook_id-based registry lookup when event.provider is None and the event kind is one of the hook-lifecycle kinds. OwnCapabilities matching is preserved for the default Installed-hook case.
  2. 4bb06c93a — carry originating provider into the event at emit time. LoopHostMilestoneKind::{HookDispatched, HookDecisionEmitted, HookFailed} gain an owning_extension: Option<ExtensionId> field; the RuntimeEvent::hook_* constructors stamp it into provider so the primary path resolves without fallback. Wire/checkpoint compatibility preserved via #[serde(default, skip_serializing_if = "Option::is_none")]; L3 schema-snapshot tests round-trip unchanged when the field is absent.

Regression tests in crates/ironclaw_reborn/tests/hooks_integration.rs:

  • event_triggered_own_capabilities_matches_hook_failed_with_carried_provider — primary path, two HookFailed events with provider: Some(ext_a|ext_b) against an OwnCapabilities subscription scoped to ext_a; only the own-provider event fires.
  • event_triggered_own_capabilities_scope_resolves_hook_failed_owner_from_hook_id — fallback path for legacy provider: None payloads.

Full cargo test -p ironclaw_hooks -p ironclaw_reborn green (156 hooks + 30 reborn integration tests).

Codex P1 on dispatch.rs:890 is the same finding; replied separately on that thread (#3640 (comment)).

zmanian added 2 commits May 15, 2026 07:49
Bundle three nit-tier review items into a single commit:

**#9 Replace author-internal tags with NOTE(#3640)**
The Phase-5 PR (#3640) had several `serrrfirat HIGH/MED #N on PR #3640`
comment tags in this PR's diff. These are review-internal scaffolding,
not load-bearing for future readers. Replaced with `NOTE(#3640)` in:

- crates/ironclaw_hooks/src/dispatch.rs (self-observation guard)
- crates/ironclaw_reborn/src/loop_driver_host.rs (scope validation,
  replay-gap milestone, subscription binding)
- crates/ironclaw_reborn/tests/hooks_integration.rs (three regression
  tests covering scope validation, self-observation suppression, and
  replay-gap surfacing)
- crates/ironclaw_turns/src/run_profile/host.rs
  (`EventSubscriptionTerminated` doc)

**#10 Replace 10ms spin-poll with tokio::sync::Notify**
`wait_for_seen_events` polled the shared `Mutex<Vec<SeenRuntimeEvent>>`
every 10 ms until the expected count was reached. Replaced with a
`SeenLog` newtype that pairs the events vec with a `Notify`; the
hook's `observe()` calls `seen.push(...)` which signals
`Notify::notify_one`, and `wait_for_seen_events` parks on
`notified().await` under a `tokio::time::timeout`. `notify_one` is a
permit-store, so an event landing between snapshot and wait still
wakes the waiter immediately. Test latency drops from ~10 ms median to
sub-ms and is no longer rate-limited by the polling cadence. All 30
hooks_integration tests still pass.
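For readers without the diff open, the shape of the change is roughly the following. This sketch deliberately swaps `tokio::sync::Notify` for `std::sync::Condvar` so it stays dependency-free and synchronous; it is an analogue of the pattern, not the test helper itself, and the method names are assumptions.

```rust
use std::sync::{Condvar, Mutex};
use std::time::{Duration, Instant};

/// Hypothetical std-only analogue of the `SeenLog` newtype: the events vec
/// paired with a wake-up primitive (`Condvar` here, `Notify` in the PR).
pub struct SeenLog<T> {
    events: Mutex<Vec<T>>,
    changed: Condvar,
}

impl<T: Clone> SeenLog<T> {
    pub fn new() -> Self {
        Self { events: Mutex::new(Vec::new()), changed: Condvar::new() }
    }

    /// Record an event and wake any parked waiter.
    pub fn push(&self, event: T) {
        self.events.lock().unwrap().push(event);
        self.changed.notify_all();
    }

    /// Park until at least `expected` events are recorded or `timeout`
    /// elapses. Returns a snapshot on success and `None` on timeout.
    pub fn wait_for(&self, expected: usize, timeout: Duration) -> Option<Vec<T>> {
        let deadline = Instant::now() + timeout;
        let mut guard = self.events.lock().unwrap();
        while guard.len() < expected {
            // `None` here means the deadline has passed: give up loudly.
            let remaining = deadline.checked_duration_since(Instant::now())?;
            // The count is re-checked under the lock after every wake-up,
            // so a push that lands before we park is never missed.
            let (next, _) = self.changed.wait_timeout(guard, remaining).unwrap();
            guard = next;
        }
        Some(guard.clone())
    }
}
```

The waiter wakes when a push happens instead of on a fixed polling cadence, and the timeout keeps a missing event a hard test failure.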

**#11 Remove unused Clone derive on EventTriggeredHookContext**
No call site clones the context — it's passed by reference. Dropped
the derive to make the borrow contract clearer.
…reality

Address gemini-code-assist review on `04-event-triggered-hooks.md`:

- L50 (Likely surface): annotated the sketch's full `RuntimeEvent` use
  with a pointer to the narrowed-projection follow-up so the snippet
  no longer reads as a recommendation contradicting L119–121.
- L55 (sink methods): replaced `note_fact` / `emit_audit` (which never
  shipped on `ObserverSink`) with the actual `note(category, summary)`
  primitive and cross-referenced Reborn's
  `EventTriggeredObserverSink`.
- L95 (cursor / replay): "lost events during downtime acceptable"
  contradicted the at-least-once replay semantics described in the
  Phase 5 implementation notes. Rewrote the bullet to say replay is
  at-least-once from the persisted cursor and to spell out the
  operator obligation around cursor persistence before shutdown.
- L100/115 (forbids events dep): the original doc claimed
  `ironclaw_hooks` forbids an `ironclaw_events` dep, but the Risk
  section noted the dep is already established via PR #3573. Updated
  both passages to reflect that the dep direction is set; Phase 5
  adds the *consumer* side. The narrowed `HookObservableEvent`
  projection is now framed as a follow-up tracked in #3690.
@zmanian
Collaborator Author

zmanian commented May 15, 2026

@henrypark133 @serrrfirat addressing the 5/15 review batch. Status of each finding below.

serrrfirat (MED)

MED #2 (loop_driver_host.rs:1210, ReplayGap silently breaks): resolved earlier in this PR — b87809f5f. ReplayGap now emits an EventSubscriptionTerminated DriverNote milestone before breaking; regression test in event_triggered_subscription_replay_gap_emits_milestone_and_terminates.

MED #3 (no source tag / self-author suppression beyond hook-id exact match): resolved earlier in this PR — 357fb472f. Event-triggered hooks now suppress self-observation through extension/source matching (not just exact hook-id), preventing mutual-recursion storms within an extension. Regression: event_triggered_hook_skips_self_authored_lifecycle_events.
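A rough sketch of the widened suppression check, assuming the event carries both the authoring hook id and its owning extension. `EventSource`, `Subscriber`, and `is_self_authored` are hypothetical names for illustration only.

```rust
/// Hypothetical view of who authored a lifecycle event.
pub struct EventSource<'a> {
    pub hook_id: &'a str,
    pub extension: Option<&'a str>,
}

/// Hypothetical identity of the subscribing event-triggered hook.
pub struct Subscriber<'a> {
    pub hook_id: &'a str,
    pub extension: &'a str,
}

/// True when the event must NOT be delivered: either the subscriber itself
/// authored it (exact hook-id match) or a sibling hook in the same
/// extension did (extension match), which is the mutual-recursion case.
pub fn is_self_authored(subscriber: &Subscriber<'_>, source: &EventSource<'_>) -> bool {
    subscriber.hook_id == source.hook_id || source.extension == Some(subscriber.extension)
}
```

With only the exact hook-id comparison, two hooks in one extension could trigger each other indefinitely; the extension comparison closes that loop.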

henrypark133

should-fix #1 (EventTriggered ⇔ event_kind_filter invariant): resolved in a8292ea4d. HookRegistry::insert_binding now enforces the biconditional at install time; misconfigured bindings fail loud.
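The install-time check amounts to something like the sketch below. It assumes the biconditional reads "the point is EventTriggered if and only if event_kind_filter is Some"; the `HookPoint` subset, the filter's element type, and `BindingError` are hypothetical simplifications of the real types.

```rust
/// Hypothetical subset of hook points.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HookPoint {
    BeforeCapability,
    EventTriggered,
}

/// Hypothetical, trimmed-down binding.
pub struct HookBinding {
    pub point: HookPoint,
    pub event_kind_filter: Option<Vec<String>>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum BindingError {
    /// EventTriggered binding installed without a filter.
    MissingEventKindFilter,
    /// Inline binding installed with a filter it can never use.
    UnexpectedEventKindFilter,
}

/// Enforce `point == EventTriggered <=> event_kind_filter.is_some()`.
pub fn validate_binding(binding: &HookBinding) -> Result<(), BindingError> {
    let is_event_triggered = binding.point == HookPoint::EventTriggered;
    match (is_event_triggered, binding.event_kind_filter.is_some()) {
        (true, false) => Err(BindingError::MissingEventKindFilter),
        (false, true) => Err(BindingError::UnexpectedEventKindFilter),
        _ => Ok(()),
    }
}
```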

should-fix #2 (Clone on EventTriggeredHookSubscription): resolved in a8292ea4d. Clone derive removed in favor of an explicit clone_for_independent_spawn(&self) method, named verbosely so the dual-consumer property is visible at the seam.

should-fix #3 (panic in run() silently swallowed): resolved in a8292ea4d. Background task body now runs inside AssertUnwindSafe(...).catch_unwind(); on panic, emits the same EventSubscriptionTerminated milestone as the ReplayGap path.
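The guard pattern, reduced to a synchronous std-only sketch: the PR wraps an async task body, whereas this uses `std::panic::catch_unwind` on a closure. `run_guarded`, `Termination`, and the `emit_terminated` callback are hypothetical stand-ins for the task body and the milestone sink.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Hypothetical outcome of a guarded subscription body.
#[derive(Debug, PartialEq, Eq)]
pub enum Termination {
    Completed,
    Panicked(String),
}

/// Run `body`; if it panics, report the reason through `emit_terminated`
/// (standing in for the EventSubscriptionTerminated milestone) instead of
/// letting the panic disappear with the background task.
pub fn run_guarded<F: FnOnce()>(body: F, mut emit_terminated: impl FnMut(&str)) -> Termination {
    match catch_unwind(AssertUnwindSafe(body)) {
        Ok(()) => Termination::Completed,
        Err(payload) => {
            // Panic payloads are usually `&'static str` or `String`.
            let reason = payload
                .downcast_ref::<&str>()
                .map(|s| s.to_string())
                .or_else(|| payload.downcast_ref::<String>().cloned())
                .unwrap_or_else(|| "non-string panic payload".to_string());
            emit_terminated(&reason);
            Termination::Panicked(reason)
        }
    }
}
```

`AssertUnwindSafe` is the caller's assertion that state touched by the body is still usable after an unwind; here that is reasonable because the subscription is torn down after the panic rather than resumed.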

should-fix #4 (per-hook DoS dispatch budget for Installed-tier event hooks): deferred to follow-up issue #3689. Henry's stricter recommendation was to gate Installed-tier event hooks entirely until the budget lands, which breaks the primary use case + 11 tests. Interim mitigations remain: the subscription throttle (batch_limit × poll_interval) bounds steady-state rate, and the self-author suppression (357fb472f) catches the most acute mutual-recursion pattern. Tracking issue for the proper budget: #3689.
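To make the interim bound concrete: assuming the subscription drains at most batch_limit events per poll and polls once per poll_interval (an assumption about the throttle, not something verified against loop_driver_host.rs), the steady-state ceiling is batch_limit divided by poll_interval. The helper name below is hypothetical.

```rust
use std::time::Duration;

/// Upper bound on events dispatched per second by a polled subscription,
/// assuming at most `batch_limit` events per poll and one poll per
/// `poll_interval`. Returns `None` for a zero interval (unbounded).
pub fn max_events_per_second(batch_limit: u32, poll_interval: Duration) -> Option<f64> {
    let secs = poll_interval.as_secs_f64();
    if secs > 0.0 {
        Some(f64::from(batch_limit) / secs)
    } else {
        None
    }
}
```

For example, a batch limit of 64 with a 250 ms poll interval caps a single hook at 256 events per second. That bounds the rate per subscription but is not a per-hook budget, which is what #3689 tracks.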

should-fix #5 (narrowed HookObservableEvent projection): deferred to follow-up issue #3690. Existing event sanitization (sanitize_error_kind) bounds the surface to closed-vocab labels; the residual risk is event-correlation, not data leakage. Tracking issue: #3690.

should-fix #6 (replay semantics on EventTriggeredHookSubscription::new()): resolved in a8292ea4d. Added a "Replay semantics" section to the rustdoc on the public type stating at-least-once delivery, caller-owned cursor persistence, and the resume-from-start_cursor replay pattern.
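What at-least-once means for a consumer can be sketched as below. The log shape, the "strictly after start_cursor" convention, and all names here are assumptions for illustration; the rustdoc on the public type is the authoritative statement.

```rust
use std::collections::HashSet;

/// Hypothetical durable log entry: a monotonically increasing cursor
/// plus an event kind.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LoggedEvent {
    pub cursor: u64,
    pub kind: String,
}

/// Replay every event strictly after `start_cursor` and return the cursor
/// the caller should persist next. Persistence itself is caller-owned.
pub fn replay_from<F: FnMut(&LoggedEvent)>(
    log: &[LoggedEvent],
    start_cursor: u64,
    mut observe: F,
) -> u64 {
    let mut last = start_cursor;
    for event in log.iter().filter(|e| e.cursor > start_cursor) {
        observe(event);
        last = event.cursor;
    }
    last
}

/// Consumer that tolerates redelivery by remembering handled cursors.
pub struct DedupingObserver {
    seen: HashSet<u64>,
    pub handled: Vec<u64>,
}

impl DedupingObserver {
    pub fn new() -> Self {
        Self { seen: HashSet::new(), handled: Vec::new() }
    }

    /// Handle the event only the first time its cursor is seen.
    pub fn observe(&mut self, event: &LoggedEvent) {
        if self.seen.insert(event.cursor) {
            self.handled.push(event.cursor);
        }
    }
}
```

If the process stops after handling events but before the newer cursor is persisted, the next run resumes from the older cursor and redelivers them, so hooks need to be idempotent or deduplicate as above.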

nit #7 (author-internal tags): resolved in 75e67c832. serrrfirat HIGH/MED #N on PR #3640 tags introduced in this PR's diff replaced with NOTE(#3640) across dispatch.rs, loop_driver_host.rs, hooks_integration.rs, and run_profile/host.rs.

nit #8 (wait_for_seen_events 10ms spin-poll): resolved in 75e67c832. Introduced a SeenLog newtype wrapping the events vec + a tokio::sync::Notify. The recorder hook calls Notify::notify_one on push; wait_for_seen_events parks on notified().await under a tokio::time::timeout deadline. Sub-ms wake latency, still fails loud on timeout.

nit #9 (unused Clone on EventTriggeredHookContext): resolved in 75e67c832. Derive removed.

nit #10 (#[allow(clippy::too_many_arguments)] on install_event_triggered): intentionally not changed in this PR for consistency — install_observer has the same 7-arg shape with the same #[allow]. Refactoring just install_event_triggered would split the installer surface stylistically; the right scope is a single follow-up that introduces config structs for both installers together. Happy to file a separate issue if you'd like.

gemini-code-assist (doc fixes)

All four resolved in d23a71d2b on crates/ironclaw_hooks/docs/successors/04-event-triggered-hooks.md:

  • L50 (RuntimeEvent in snippet vs narrowed-projection recommendation): annotated the sketch with a pointer to the narrowed-projection follow-up.
  • L55 (nonexistent note_fact/emit_audit): replaced with the actual note(category, summary) primitive; cross-referenced Reborn's EventTriggeredObserverSink.
  • L95 ("lost events during downtime acceptable" vs cursor-resume): rewrote to state at-least-once replay from the persisted cursor + operator obligation to persist before shutdown.
  • L100/L115 ("forbids events dep" vs already-a-dep): updated both passages to reflect that the dep direction is established via PR #3573; Phase 5 adds the consumer side. Narrowed HookObservableEvent is framed as a tracked follow-up (#3690).

@gemini-code-assist
Contributor

Thanks for the update, @zmanian. The resolutions for the identified issues and the follow-up tracking for the DoS budget and narrowed event projection look correct and address the concerns raised in the review. The use of clone_for_independent_spawn and the improved AssertUnwindSafe handling are good improvements for robustness. I've reviewed the implementation details in the latest commits and the integration tests; the fix for the OwnCapabilities provider resolution is sound. This PR is ready for final review.

Resolves conflicts from the latest base advance:
- crates/ironclaw_hooks/src/dispatch.rs: union HEAD's owning_extension
  plumbing with base's audit_reason field in emit_decision_with_audit
- crates/ironclaw_hooks/src/registry.rs: keep both new install-time
  validations — HEAD's event-kind-filter biconditional (henrypark133
  should-fix #1) and base's scope-vs-point coherence check. Extend
  point_has_capability_context to treat EventTriggered as
  capability-context-bearing (provider resolved per-event at dispatch
  time)
- crates/ironclaw_turns/src/run_profile/milestones.rs: union
  HookDecisionEmitted to carry both owning_extension AND audit_reason
- crates/ironclaw_reborn/src/milestone_events.rs: project owning_extension
  into RuntimeEvent::hook_decision_emitted, ignore audit_reason (it stays
  on the in-memory milestone sink, never crosses the durable boundary)
- crates/ironclaw_reborn/src/lib.rs: re-export
  EventTriggeredHookSubscription* alongside the base's new
  LoopCapabilityPortFactory
- crates/ironclaw_reborn/src/loop_driver_host.rs: keep both the new
  ProfiledCapabilityHostRuntime/LoopCapabilityPortFactory and the
  event-subscription handle field; drop unused tokio::sync::Notify
- crates/ironclaw_hooks/src/dispatch.rs: reject EventTriggered in the
  observer install path explicitly (it is installed via the event
  subscription path, not the generic observer entry point)
- propagate event_kind_filter: None to remaining HookBinding test
  literals exposed by the merge
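The milestone_events.rs resolution above (owning_extension crosses the durable boundary, audit_reason does not) reduces to a projection like this sketch. Both struct shapes and the `project` function are hypothetical simplifications of the real milestone and RuntimeEvent types.

```rust
/// Hypothetical in-memory milestone after the merge: carries both fields.
pub struct HookDecisionMilestone {
    pub hook_id: String,
    pub owning_extension: Option<String>,
    pub audit_reason: Option<String>,
}

/// Hypothetical durable event: there is deliberately no audit field.
#[derive(Debug, PartialEq, Eq)]
pub struct DurableHookDecisionEvent {
    pub hook_id: String,
    pub provider: Option<String>,
}

/// Project a milestone into its durable form. `owning_extension` becomes
/// `provider`; `audit_reason` is intentionally not read.
pub fn project(milestone: &HookDecisionMilestone) -> DurableHookDecisionEvent {
    DurableHookDecisionEvent {
        hook_id: milestone.hook_id.clone(),
        provider: milestone.owning_extension.clone(),
    }
}
```

Leaving the field off the durable type, rather than filtering it at write time, means a later change cannot leak it without a visible schema change.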

henrypark133 #1 (HIGH): the originally-cited
`event.provider` always being `None` for hook milestone events is
already addressed on this branch — commit 4bb06c9 / d7f43a3 carry
owning_extension end-to-end through both `RuntimeEvent::hook_failed`
and `RuntimeEvent::hook_decision_emitted`. The merge preserves and
extends that plumbing rather than reverting it.

cargo fmt + clippy (-D warnings) green; unit and hooks_integration
tests green.
@zmanian
Collaborator Author

zmanian commented May 17, 2026

@henrypark133 Rebased onto latest hooks-foundation-01 (now includes reborn-integration + #3636 DenyReasonCode). Pushed as 00d64d5a3.

Re: HIGH finding #1 (event.provider is None for HookDecisionEmitted/HookFailed, breaking OwnCapabilities event-triggered hooks):

This was already addressed on the branch in commits 4bb06c93a ("carry owning extension into hook milestone runtime events") and d7f43a3ff ("Fix hook event OwnCapabilities owner lookup"). Both RuntimeEvent::hook_decision_emitted and RuntimeEvent::hook_failed now take owning_extension: Option<ExtensionId> and set it as provider. The HookDecisionEmitted milestone variant carries owning_extension from the HookBinding; the failure path resolves it via the registry's hex index in HookDispatcher::emit_failure (since the failure record doesn't carry the binding directly).

The merge preserved that plumbing — I had to union it with the new audit_reason field on HookDecisionEmitted that landed in #3636 — and extended point_has_capability_context in registry.rs to treat EventTriggered as capability-context-bearing (provider is resolved per-event at dispatch time), so the existing scope-coherence validator doesn't reject EventTriggered + OwnCapabilities bindings. Hooks unit tests + hooks_integration integration tests stay green.
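The validator interaction described above looks roughly like this. Only `point_has_capability_context`, `EventTriggered`, and `OwnCapabilities` come from the PR; the other hook points, the `Global` scope, and `scope_is_coherent` are hypothetical placeholders.

```rust
/// Hypothetical subset of hook points.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HookPoint {
    BeforeCapability,
    AfterCapability,
    BeforeModelCall,
    EventTriggered,
}

/// Hypothetical subset of binding scopes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HookScope {
    Global,
    OwnCapabilities,
}

/// Points at which a provider/capability owner can be determined.
/// EventTriggered qualifies because the provider is resolved per event
/// at dispatch time rather than from the install-time context.
pub fn point_has_capability_context(point: HookPoint) -> bool {
    matches!(
        point,
        HookPoint::BeforeCapability | HookPoint::AfterCapability | HookPoint::EventTriggered
    )
}

/// Scope-vs-point coherence: OwnCapabilities only makes sense where an
/// owner can be resolved.
pub fn scope_is_coherent(scope: HookScope, point: HookPoint) -> bool {
    match scope {
        HookScope::Global => true,
        HookScope::OwnCapabilities => point_has_capability_context(point),
    }
}
```

Without the EventTriggered arm, the base branch's coherence check would reject every EventTriggered + OwnCapabilities binding at install time.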

Could you take another look when you have a moment?


Labels

  • contributor: core (20+ merged PRs)
  • risk: low (Changes to docs, tests, or low-risk modules)
  • scope: dependencies (Dependency updates)
  • scope: docs (Documentation)
  • size: XL (500+ changed lines)
