feat(server): shared notification budget across mentor and finding delivery #1260

@FelixTJDietrich

Part of #1204.

What ships

A single NotificationBudget service that allocates the per-workspace daily DM cap from #1259 across three proactive surfaces. Per-day allocation logic:

| Priority | Surface | Budget share |
| --- | --- | --- |
| 1 (highest) | Practice-finding delivery DMs (live, not backfill) | up to 60% of remaining budget |
| 2 | Reflection scheduler DMs | up to 30% of remaining budget |
| 3 | Practice-finding delivery DMs (backfill-sourced from #1266) | up to 10% of remaining budget, queued |
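
The share arithmetic in the allocation table can be sketched as below. This is an illustration only: it assumes "remaining budget" means the daily cap minus slots already consumed, and that shares round down to whole slots; the issue specifies neither. The `Surface` constants and `ShareMath` are hypothetical names.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical surface names; the issue describes the surfaces only in prose. */
enum Surface {
    FINDING_LIVE(1, 60),
    REFLECTION(2, 30),
    FINDING_BACKFILL(3, 10);

    final int priority;
    final int defaultSharePercent;

    Surface(int priority, int defaultSharePercent) {
        this.priority = priority;
        this.defaultSharePercent = defaultSharePercent;
    }
}

final class ShareMath {
    private ShareMath() {}

    /** Assumption: integer division, so a surface never receives a partial slot. */
    static int shareOf(int remainingBudget, int sharePercent) {
        if (remainingBudget <= 0) {
            return 0;
        }
        return remainingBudget * sharePercent / 100;
    }

    /** Computes each surface's share of what is left of the daily cap. */
    static Map<Surface, Integer> allocate(int dailyCap, int alreadyConsumed) {
        int remaining = Math.max(0, dailyCap - alreadyConsumed);
        Map<Surface, Integer> shares = new EnumMap<>(Surface.class);
        for (Surface s : Surface.values()) {
            shares.put(s, shareOf(remaining, s.defaultSharePercent));
        }
        return shares;
    }
}
```

With a cap of 20 and nothing consumed, this yields 12 / 6 / 2 slots. With a cap of 5 the backfill surface rounds down to 0, which may or may not be the intended behaviour for small caps.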

The budget is consulted by every proactive dispatch path via budget.requestSlot(workspaceId, userId, surface, priority) -> Slot | DeniedReason. The slot is consumed only on successful DM send (not on enqueue) so retried sends from #1209's outbox don't double-count. A notification_budget_ledger(workspace_id, day_utc, surface, count, last_updated) table is the source of truth; rows accumulate per UTC day and the table self-prunes after 30 days.
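
A minimal in-memory sketch of the `requestSlot(...) -> Slot | DeniedReason` contract and of post-send consumption follows. It is not the proposed bean: the ledger is a map instead of the notification_budget_ledger table, the UTC day dimension is omitted, and `SlotResult`, `Denied`, `consumeOnSend`, and the reason strings are hypothetical names introduced for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Result of a slot request: either a granted Slot or a Denied reason. */
interface SlotResult {}

record Slot(String workspaceId, String userId, String surface, int priority) implements SlotResult {}

record Denied(String reason) implements SlotResult {}

/** In-memory stand-in for the ledger-backed service (hypothetical). */
final class InMemoryNotificationBudget {
    private final Map<String, Integer> shareBySurface; // max sends per surface per day
    private final Map<String, Integer> ledger = new ConcurrentHashMap<>();

    InMemoryNotificationBudget(Map<String, Integer> shareBySurface) {
        this.shareBySurface = Map.copyOf(shareBySurface);
    }

    /** Grants a slot if the surface's share is not yet used up; counts nothing. */
    SlotResult requestSlot(String workspaceId, String userId, String surface, int priority) {
        Integer share = shareBySurface.get(surface);
        if (share == null) {
            return new Denied("UNKNOWN_SURFACE");
        }
        if (used(workspaceId, surface) >= share) {
            return new Denied("SHARE_EXHAUSTED");
        }
        return new Slot(workspaceId, userId, surface, priority);
    }

    /** Called only after the DM was actually sent; this is where the ledger moves. */
    boolean consumeOnSend(Slot slot) {
        int share = shareBySurface.get(slot.surface());
        boolean[] consumed = {false};
        ledger.compute(key(slot.workspaceId(), slot.surface()), (ignored, used) -> {
            int current = used == null ? 0 : used;
            if (current >= share) {
                return used; // share was filled in the meantime; leave the count alone
            }
            consumed[0] = true;
            return current + 1;
        });
        return consumed[0];
    }

    int used(String workspaceId, String surface) {
        return ledger.getOrDefault(key(workspaceId, surface), 0);
    }

    private static String key(String workspaceId, String surface) {
        return workspaceId + "|" + surface;
    }
}
```

A failed send simply never calls `consumeOnSend`, so a later retry that succeeds counts exactly once.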

The budget runs inside the DM safety rails guard envelope from #1259 — the rails check first (interval / kill switch / dry-run), then the budget allocates within whatever the rails permit. Budget denials are distinct from rails denials in the audit log + metrics.
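
The rails-then-budget ordering could look roughly like this. `Decision`, `GuardEnvelope`, and the `"rails"` / `"budget"` labels are hypothetical; the sketch assumes each check reports a denial as a reason string and only shows that the budget is not consulted once the rails deny, with the two denial sources kept distinguishable.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Outcome of the guard envelope; deniedBy separates rails denials from budget denials. */
record Decision(boolean allowed, String deniedBy, String reason) {
    static Decision allow() {
        return new Decision(true, null, null);
    }

    static Decision deny(String deniedBy, String reason) {
        return new Decision(false, deniedBy, reason);
    }
}

final class GuardEnvelope {
    private GuardEnvelope() {}

    /**
     * Each check returns an empty Optional to permit the send, or a denial reason.
     * The budget check runs only if the rails permit the send.
     */
    static Decision evaluate(Supplier<Optional<String>> railsCheck,
                             Supplier<Optional<String>> budgetCheck) {
        Optional<String> rails = railsCheck.get();
        if (rails.isPresent()) {
            return Decision.deny("rails", rails.get());
        }
        Optional<String> budget = budgetCheck.get();
        if (budget.isPresent()) {
            return Decision.deny("budget", budget.get());
        }
        return Decision.allow();
    }
}
```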

Why

Three surfaces dispatching independently against the same per-workspace cap is the deadlock-or-flood failure mode the rails alone don't address. Without a shared budget, the first surface to fire each day eats the cap and the others go silent — or all three fire and exceed the cap because each was unaware of the others. The budget is the single accounting surface; priorities encode the pedagogical ordering ("a finding the learner triggered just now beats a reflection nudge").

Acceptance criteria

  • NotificationBudget bean exists with the requestSlot(...) surface; an integration test asserts the three surfaces share one ledger per workspace per UTC day (one row per surface, matching the (workspace_id, day_utc, surface) conflict key)
  • Priority allocation is enforced; an integration test queues 30 reflection sends + 30 finding sends on the same workspace at the same instant and asserts findings consume their 60% share before reflections consume theirs
  • Backfill-sourced finding DMs from #1266 (initial Slack channel backfill via conversations.history) are tagged with priority=3 and never exceed 10% of remaining budget; an integration test floods backfill events and asserts the cap is not breached even when no live events compete
  • On-demand mentor DMs from #1257 (Slack mentor entry points: App Home, slash command, CTA) are not counted against the budget; an integration test fires 100 slash-command-initiated DMs in one day and asserts the budget ledger is unchanged
  • Slot consumption is post-send, not pre-send; an integration test simulates a chat.postMessage failure and asserts the slot is not consumed (the send is retried via the outbox from #1209, Modulith Event Publication Registry as outbound outbox, and consumes the slot on success)
  • Budget denials produce a structured INFO log with the priority + surface + denial reason and increment a metric notification_budget.deny{surface, reason} exposed via #1217 (per-integration health endpoint and structured-log MDC)
  • The ledger table self-prunes rows older than 30 days via a @Scheduled housekeeping task
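
The 30-day pruning rule can be sketched in plain Java as below. The real task would presumably be a Spring `@Scheduled` method issuing a DELETE against the ledger table; here the rows are an in-memory list so the cutoff logic is easy to check. `LedgerRow` and `LedgerPruner` are hypothetical names, and the sketch assumes a row dated exactly 30 days ago is still kept.

```java
import java.time.Clock;
import java.time.LocalDate;
import java.util.List;
import java.util.stream.Collectors;

/** Mirrors the ledger columns named in the issue, minus last_updated. */
record LedgerRow(String workspaceId, LocalDate dayUtc, String surface, int count) {}

final class LedgerPruner {
    static final int RETENTION_DAYS = 30;

    private LedgerPruner() {}

    /** Oldest day that is still retained; pass a UTC clock so "today" is the UTC day. */
    static LocalDate cutoff(Clock utcClock) {
        return LocalDate.now(utcClock).minusDays(RETENTION_DAYS);
    }

    /** Returns the rows that survive pruning, i.e. those dated on or after the cutoff. */
    static List<LedgerRow> retain(List<LedgerRow> rows, Clock utcClock) {
        LocalDate cutoff = cutoff(utcClock);
        return rows.stream()
                .filter(row -> !row.dayUtc().isBefore(cutoff))
                .collect(Collectors.toList());
    }
}
```

Injecting the `Clock` keeps the housekeeping logic testable without waiting for real time to pass.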

Tests to write

  • NotificationBudgetTest (unit) — priority allocation math, slot consumption post-send.
  • NotificationBudgetIT — three-surface fixture firing concurrently, ledger assertions.
  • NotificationBudgetOutboxIntegrationTest — failed send + retry consumes one slot, not two.
  • A test asserting on-demand mentor bypasses the budget.

Implementation notes

  • The budget does not own retry; the outbox in #1209 (Modulith Event Publication Registry as outbound outbox) does. The budget interrogates the outbox to know whether a slot has been consumed yet. A send transitions enqueued -> sent only on outbox success; budget slot consumption fires on that transition.
  • Concurrency: the ledger row is updated via INSERT ... ON CONFLICT (workspace_id, day_utc, surface) DO UPDATE SET count = count + 1 WHERE count < <share> so two concurrent sends do not both succeed past the share. The WHERE clause is the race-safe gate.
  • Priority shares are workspace-admin tunable via a workspace_integration.config_jsonb.notification_budget payload; defaults are 60 / 30 / 10. The integration UI surface lives in a follow-up; this sub-issue ships the defaults and the JSONB read path.
  • The budget does not differentiate practice-finding delivery DMs by practice; an aggressive practice that fires many findings consumes its surface's share and that's it. Per-practice rate limits are out of scope; the multi-actor practice epic's idempotency keys are the upstream rate limiter.
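
The race-safe gate from the concurrency note can be illustrated as follows. `UPSERT_SQL` is one possible PostgreSQL-flavoured spelling of the statement, with the column references qualified by a table alias; it assumes PostgreSQL and is not taken from the issue. `tryConsume` is an in-memory analogue of the guarded update, so the property "concurrent sends never push the count past the share" can be exercised without a database. `RaceSafeGate` and `hammer` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class RaceSafeGate {
    /** Parameters in order: workspace_id, day_utc, surface, share. Assumes PostgreSQL. */
    static final String UPSERT_SQL = """
            INSERT INTO notification_budget_ledger AS l
                (workspace_id, day_utc, surface, count, last_updated)
            VALUES (?, ?, ?, 1, now())
            ON CONFLICT (workspace_id, day_utc, surface)
            DO UPDATE SET count = l.count + 1, last_updated = now()
            WHERE l.count < ?
            """;

    private final AtomicInteger count = new AtomicInteger();
    private final int share;

    RaceSafeGate(int share) {
        this.share = share;
    }

    /** Increments only while the count is below the share, like the guarded UPDATE. */
    boolean tryConsume() {
        while (true) {
            int current = count.get();
            if (current >= share) {
                return false;
            }
            if (count.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    int count() {
        return count.get();
    }

    /** Fires `attempts` concurrent consumes and returns how many were granted. */
    static int hammer(RaceSafeGate gate, int attempts) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        AtomicInteger granted = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < attempts; i++) {
            pool.submit(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (gate.tryConsume()) {
                    granted.incrementAndGet();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return granted.get();
    }
}
```

One detail worth checking against the real schema: in an upsert of this shape the WHERE clause gates only the DO UPDATE branch, so the first insert of the day creates the row with count 1 regardless of the share. A share of 0 would need a separate guard.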

Dependencies

Depends on #1259. Depends on #1209. Blocks #1269.

Metadata

    Labels

    application-server (Spring Boot server: APIs, business logic, database), feature (New feature or enhancement), priority:high (Address this sprint - Significant impact)
