What ships
Four stacked safety rails enforced server-side before any proactive mentor DM leaves the application:
Per-user min-interval. 24 hours between any two proactive DMs to the same user, tightened to 18 hours for DAILY-cadence users so the daily cadence can fire on consecutive weekdays without falling foul of the floor. The interval is measured against the last proactive DM, not the last mentor turn: on-demand mentor turns (slash command, App Home CTA) do not count.
Per-workspace daily cap. A default of 50 proactive DMs per workspace per UTC day, counted across all users in the workspace; further DMs that day are denied with the WORKSPACE_DAILY_CAP reason. Workspace admins can override the cap, and every override is written to the integration audit log (feat(server): append-only integration audit log #1218).
Global kill switch. Spring property hephaestus.mentor.slack.dm.enabled. Off → every proactive DM is rejected with a structured INFO log carrying the rejection reason. On-demand mentor (slash command, App Home CTA) honors the same flag, so the entire Slack DM mentor surface is one switch.
Dry-run mode. Per-workspace boolean. When on, every proactive DM is routed to the workspace admin's DM with a "would have been sent to " envelope instead of reaching the user. Dry-run does not consume the daily cap and does not count toward the per-user min-interval; it is for pilot tuning.
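The arithmetic behind the min-interval and daily-cap rails can be sketched in plain `java.time`. This is a minimal sketch, not the guard's code: `Cadence` is a hypothetical enum (only `DAILY` comes from this issue, `OTHER` stands in for every other cadence), the list of send times stands in for the audit-table reads, and two boundary choices are assumptions: a gap exactly equal to the floor is allowed, and the cap window is the UTC calendar day.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.List;

/** Hypothetical stand-in for the cadence setting: OTHER covers every non-DAILY cadence. */
enum Cadence { DAILY, OTHER }

/** Sketch of the min-interval and daily-cap windows; not the real guard. */
final class RailWindows {
    static final Duration DEFAULT_MIN_INTERVAL = Duration.ofHours(24);
    static final Duration DAILY_MIN_INTERVAL = Duration.ofHours(18);

    private RailWindows() {}

    /** 24 h floor, tightened to 18 h for DAILY-cadence users. */
    static Duration minInterval(Cadence cadence) {
        return cadence == Cadence.DAILY ? DAILY_MIN_INTERVAL : DEFAULT_MIN_INTERVAL;
    }

    /**
     * True if the gap since the last proactive DM meets the floor.
     * lastProactive is null when the user has never had one; on-demand
     * mentor turns are never passed in, so they do not reset the clock.
     * Assumption: a gap exactly equal to the floor is allowed.
     */
    static boolean minIntervalElapsed(Instant lastProactive, Instant now, Cadence cadence) {
        if (lastProactive == null) {
            return true;
        }
        return Duration.between(lastProactive, now).compareTo(minInterval(cadence)) >= 0;
    }

    /** True if the workspace is still under its cap for the UTC day that contains now. */
    static boolean underDailyCap(List<Instant> allowedSendsInWorkspace, Instant now, int cap) {
        LocalDate today = LocalDate.ofInstant(now, ZoneOffset.UTC);
        long sentToday = allowedSendsInWorkspace.stream()
                .filter(t -> LocalDate.ofInstant(t, ZoneOffset.UTC).equals(today))
                .count();
        return sentToday < cap;
    }
}
```

With a cap of 50, the 51st send in one UTC day is refused and the count starts over at UTC midnight, which matches the 51-DM integration test in the acceptance criteria.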
A DmSafetyRailsGuard Spring bean is the single checkpoint. Every proactive DM dispatch flows through guard.check(workspaceId, userId, surface) -> Decision, where Decision is ALLOW, DENY(reason), or DRY_RUN_DELIVER(adminUserId). The shared budget in #1260 calls the guard; the scheduler in #1258 does not call it directly.
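The checkpoint and its three-way result can be pictured as follows, assuming Java 17: `Decision` is modelled as a sealed interface over three records, and `RailState` is a hypothetical port standing in for the Spring property and the `proactive_dm_audit` reads. The rail ordering (kill switch, then dry-run, then min-interval, then daily cap) is an assumption of this sketch; the issue only fixes that every proactive DM goes through `check`. `java.util.logging` stands in for the application's structured logger.

```java
import java.util.Optional;
import java.util.logging.Logger;

/** Denial reasons named in the acceptance criteria. */
enum DenyReason { KILL_SWITCH, MIN_INTERVAL, WORKSPACE_DAILY_CAP }

/** ALLOW / DENY(reason) / DRY_RUN_DELIVER(adminUserId) as a closed set of records. */
sealed interface Decision {
    record Allow() implements Decision {}
    record Deny(DenyReason reason) implements Decision {}
    record DryRunDeliver(String adminUserId) implements Decision {}
}

/** Hypothetical read-side port; a real guard would read config and proactive_dm_audit. */
interface RailState {
    boolean dmEnabled();                               // hephaestus.mentor.slack.dm.enabled
    Optional<String> dryRunAdmin(String workspaceId);  // present only when dry-run is on
    boolean minIntervalElapsed(String workspaceId, String userId);
    boolean underDailyCap(String workspaceId);
}

/** Sketch of the single checkpoint; not the DmSafetyRailsGuard implementation itself. */
final class DmSafetyRailsGuardSketch {
    private static final Logger LOG = Logger.getLogger(DmSafetyRailsGuardSketch.class.getName());
    private final RailState state;

    DmSafetyRailsGuardSketch(RailState state) {
        this.state = state;
    }

    Decision check(String workspaceId, String userId, String surface) {
        // Kill switch first: when it is off, nothing on the DM surface goes out.
        if (!state.dmEnabled()) {
            return deny(DenyReason.KILL_SWITCH, workspaceId, userId, surface);
        }
        // Assumed ordering: dry-run short-circuits before the interval and cap,
        // since a dry-run delivery consumes neither.
        Optional<String> admin = state.dryRunAdmin(workspaceId);
        if (admin.isPresent()) {
            return new Decision.DryRunDeliver(admin.get());
        }
        if (!state.minIntervalElapsed(workspaceId, userId)) {
            return deny(DenyReason.MIN_INTERVAL, workspaceId, userId, surface);
        }
        if (!state.underDailyCap(workspaceId)) {
            return deny(DenyReason.WORKSPACE_DAILY_CAP, workspaceId, userId, surface);
        }
        return new Decision.Allow();
    }

    private static Decision deny(DenyReason reason, String workspaceId, String userId, String surface) {
        // Key=value INFO line carrying the rejection reason.
        LOG.info(() -> "proactive_dm_rejected reason=" + reason + " workspace=" + workspaceId
                + " user=" + userId + " surface=" + surface);
        return new Decision.Deny(reason);
    }
}
```

In this sketch `check` only decides; the caller (the shared budget in #1260) would act on the result, for example by routing a `DryRunDeliver` to the admin.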
Why
Slack-channel reputation is a one-strike resource. A runaway loop, a misconfigured cadence, or a bug in the practice-finding delivery path could flood a workspace and turn Hephaestus into the bot admins mute. Stacking four independent rails means no single bug or misconfiguration can produce a flood. Each rail has its own job: the kill switch is the operational lever, dry-run is the pilot lever, the cap is the steady-state ceiling, and the interval is the per-user dignity floor.
Part of #1204.

Acceptance criteria
The DmSafetyRailsGuard bean exists, and the only call path for proactive DM dispatch (mentor, reflection, and practice-finding delivery in feat(server): formative-feedback DM flow with provenance line #1269) goes through guard.check(...); an ArchUnit test asserts that no proactive-DM code path bypasses the guard (the mentor, reflection, and practice-finding delivery packages are checked).
Per-user min-interval is enforced; an integration test fires two DMs 23 h apart and asserts the second is denied with the MIN_INTERVAL reason, and that a DAILY-cadence user with a 19 h gap is allowed.
Per-workspace daily cap is enforced; an integration test fires 51 DMs in one UTC day to the same workspace and asserts the 51st is denied with the WORKSPACE_DAILY_CAP reason.
The kill switch (hephaestus.mentor.slack.dm.enabled) in the off state rejects all proactive DMs with the KILL_SWITCH reason, and on-demand mentor (slash command, App Home CTA) returns "Mentor temporarily unavailable"; an integration test toggles the property and asserts both paths.
Dry-run mode routes DMs to the workspace admin's DM with the "would have been sent to " envelope; an integration test asserts that the toggle consumes neither the daily cap nor the per-user interval, and that the admin gets exactly one DM per would-be send.
Workspace-admin overrides to the daily cap write a row to the integration audit log (feat(server): append-only integration audit log #1218); the row carries the previous and new values, the admin user id, and the timestamp.
Guard decisions are exposed as metrics on /actuator/integrations (feat(server): per-integration health endpoint and structured-log MDC #1217): allow, deny_min_interval, deny_daily_cap, deny_kill_switch, dry_run_deliver.

Tests to write
DmSafetyRailsGuardTest (unit): each rail in isolation, plus interaction (min-interval and daily cap stacking).
DmSafetyRailsGuardIT: end-to-end from scheduler enqueue → guard → adapter; asserts the daily cap is per-workspace, not per-user.
A kill-switch toggle test that flips the Spring property at runtime.
A dry-run-mode test asserting that the admin receives the DM and the user receives none.

Implementation notes
Min-interval state lives in a proactive_dm_audit(user_id, workspace_id, surface, sent_at, decision, reason) table; the guard's read path queries the most recent allowed row for (user_id, workspace_id). Denied attempts are written too: they are the evidence that the rails are working, and they feed the metrics.
The kill switch is a @Value Spring property on a RefreshScope-equipped bean, so an operator can flip it without a restart (the Spring Boot Actuator refresh endpoint is gated). A property change is logged at WARN level with the previous and new values.
Dry-run DMs go to the workspace admin resolved from the installing user (workspace_integration.installed_by from feat(server): append-only integration audit log #1218). If that user is no longer in the workspace, dry-run degrades to "ALLOW with DRY_RUN flag in audit but no admin DM" and an admin-action row is created (feat(server): per-integration health endpoint and structured-log MDC #1217); better degraded than off.
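The audit-table read path can be sketched with an in-memory list in place of the proactive_dm_audit table; `AuditRow` and `InMemoryProactiveDmAudit` are hypothetical names, and decisions and reasons are plain strings for brevity. The sketch shows two properties from the notes: the min-interval read looks only at ALLOW rows, so denied attempts and dry-run deliveries never move the clock, and every row, denied or not, is tallied under the metric names from the acceptance criteria.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** One proactive_dm_audit row. decision: ALLOW, DENY or DRY_RUN_DELIVER; reason is set only for DENY. */
record AuditRow(String userId, String workspaceId, String surface,
                Instant sentAt, String decision, String reason) {}

/** In-memory stand-in for the append-only proactive_dm_audit table. */
final class InMemoryProactiveDmAudit {
    private final List<AuditRow> rows = new ArrayList<>();

    /** Every attempt is written, denied ones included. */
    void append(AuditRow row) {
        rows.add(row);
    }

    /** Min-interval read path: sent_at of the most recent ALLOW row for (user_id, workspace_id). */
    Optional<Instant> lastAllowed(String userId, String workspaceId) {
        return rows.stream()
                .filter(r -> r.userId().equals(userId) && r.workspaceId().equals(workspaceId))
                .filter(r -> r.decision().equals("ALLOW"))
                .map(AuditRow::sentAt)
                .max(Comparator.naturalOrder());
    }

    /** Tallies all rows, sorted by metric name. */
    Map<String, Long> metricCounts() {
        return rows.stream().collect(Collectors.groupingBy(
                InMemoryProactiveDmAudit::metricName, TreeMap::new, Collectors.counting()));
    }

    /** Maps a row to one of the metric names listed in the acceptance criteria. */
    static String metricName(AuditRow r) {
        return switch (r.decision()) {
            case "ALLOW" -> "allow";
            case "DRY_RUN_DELIVER" -> "dry_run_deliver";
            case "DENY" -> switch (r.reason()) {
                case "MIN_INTERVAL" -> "deny_min_interval";
                case "WORKSPACE_DAILY_CAP" -> "deny_daily_cap";
                case "KILL_SWITCH" -> "deny_kill_switch";
                default -> throw new IllegalArgumentException("unknown reason: " + r.reason());
            };
            default -> throw new IllegalArgumentException("unknown decision: " + r.decision());
        };
    }
}
```

Against the real table, `lastAllowed` would presumably become a single indexed query on (user_id, workspace_id) restricted to allowed rows.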
Dependencies
Depends on #1258, #1217, and #1218. Blocks #1260 and #1269.