| 1 | +# Design: CanisterWorm Incident Response (Umbrella) |
| 2 | + |
| 3 | +| Field | Value | |
| 4 | +|-------|-------| |
| 5 | +| **Slug** | `canisterworm-incident-response` | |
| 6 | +| **Date** | 2026-04-23 | |
| 7 | +| **Status** | CRYSTALLIZED (WRS 100/100) | |
| 8 | +| **WRS** | 100/100 | |
| 9 | +| **Council** | [COUNCIL.md](../sec-scan-progress/COUNCIL.md) (10-perspective deliberation, 2026-04-23) | |
| 10 | + |
| 11 | +## Problem |
| 12 | + |
| 13 | +`@automagik/genie` itself is on the CanisterWorm/TeamPCP IOC list, so the scanner that detects the compromise ships through the same npm distribution channel that was weaponized. Most developers running genie do NOT have EDR/MDM coverage that would act on a JSON report. Operators therefore need the full incident-response kit (observable scanning, bounded walks, structured telemetry, signed releases, auditable remediation, and a tested runbook) the first time they reach for it. |
| 14 | + |
| 15 | +A single monolithic wish absorbing all of that was drafted and reviewed. Two reviews (a self-review and one by a dispatched reviewer) both returned BLOCKED, citing (H1) a circular dependency between remediation and signing, (H2) three wish-sized scopes bundled into one long-lived branch, and (H3) the unmerged base branch `codex/sec-scan-command` amplifying rebase drift, plus missed gaps: G1, no bulk rollback; G2, no offline credential-rotation guidance; G3, no quarantine disk-space limits; G4, `0600` file mode as theater on shared filesystems; M2, the `--unsafe-unverified` ack string is undefined. Fixing the structure inside the monolith would preserve the single-PR narrative at the cost of a 6–8 week sequential branch. |
| 16 | + |
| 17 | +This umbrella replaces the monolith with **four sibling wishes** that deliver the same full scope, independently shippable, with a clean dependency graph. Nothing from the council recommendations is dropped; the delivery topology changes. |
| 18 | + |
| 19 | +## Scope |
| 20 | + |
| 21 | +### IN (shared across sibling wishes) |
| 22 | + |
| 23 | +- Full CanisterWorm incident-response posture as captured in COUNCIL.md: observability, bounded walks, load-bearing code only, versioned telemetry, detect → exec and detect → remediate pathways, signed distribution, and an incident runbook. |
| 24 | +- Four sibling wishes with explicit cross-wish dependencies. |
| 25 | +- Shared architectural invariants (detect-only scanner; remediation as sibling CJS payload; quarantine-by-move; signed-channel for mutation; audit-log append-only). |
| 26 | + |
| 27 | +### OUT |
| 28 | + |
| 29 | +- Replacing the scanner with a daemon, worker pool, or TUI dashboard. |
| 30 | +- Expanding CanisterWorm/TeamPCP IOC coverage (separate wish if a new family lands). |
| 31 | +- Scanning every mounted filesystem by default. |
| 32 | +- Automated credential rotation against live cloud APIs (v1 emits commands only). |
| 33 | +- Network-delivered IOC list updates. |
| 34 | +- Destructive delete in quarantine. |
| 35 | + |
| 36 | +## Preconditions |
| 37 | + |
| 38 | +- ✅ **`codex/sec-scan-command` merged via PR #1348** (squash into `dev`, 2026-04-23T17:53Z). Scanner files live on `origin/main` and `origin/dev` at commit `3d7e6609`. Sibling wishes branch from `dev`. This removes the long-lived-branch risk the reviewer flagged as H3. |
| 39 | + |
| 40 | +## Sibling Wishes |
| 41 | + |
| 42 | +| Slug | Scope | Appetite | Depends on | |
| 43 | +|------|-------|----------|-----------| |
| 44 | +| [`sec-scan-progress`](../../wishes/sec-scan-progress/WISH.md) | Runtime context, versioned envelope, CLI, bounded walks, `dev:ino`, phase measurement, fs fingerprint, deletion pass, matcher collapse, phase registry, events file + redaction, persistence + audit log, `print-cleanup-commands`. | medium (~4 weeks) | codex/sec-scan-command merged | |
| 45 | +| [`sec-remediate`](../../wishes/sec-remediate/WISH.md) | `genie sec remediate` (dry-run, plan manifest, typed consent, quarantine-by-move, resume), `genie sec restore` (per-action), `genie sec rollback` (bulk via audit log), offline credential-rotation guidance, quarantine disk-space limits and GC. | medium (~2 weeks) | sec-scan-progress (for versioned envelope + events schema) | |
| 46 | +| [`genie-supply-chain-signing`](../../wishes/genie-supply-chain-signing/WISH.md) | Cosign-signed release tarballs + SLSA Level 3 provenance, public-key pinning in three channels, `genie sec verify-install` subcommand, `--unsafe-unverified <INCIDENT_ID>` exact contract. | medium (~2 weeks) | none (independent release-engineering) — runs parallel with sec-remediate | |
| 47 | +| [`sec-incident-runbook`](../../wishes/sec-incident-runbook/WISH.md) | `SECURITY.md` invariants section, `docs/incident-response/canisterworm.md` LIKELY COMPROMISED / LIKELY AFFECTED / OBSERVED ONLY decision tree with exact commands, help-text examples, automated cold-runbook test. | small (~1 week) | sec-remediate + genie-supply-chain-signing (consumes both command surfaces) | |
| 48 | + |
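| 49 | +**Total wall-time with parallelism:** ~7 weeks on the critical path (sec-scan-progress ~4 → sec-remediate ~2 → sec-incident-runbook ~1), with genie-supply-chain-signing (~2) running in parallel with sec-remediate; the four appetites summed back to back come to ~9 weeks. Reaching ~6 weeks would require sec-remediate to start as soon as the versioned envelope and events schema land, before sec-scan-progress fully closes. The monolith was estimated at 6–8 weeks sequential. |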
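The wall-time figure is the longest path through the "Depends on" column of the table above, not the sum of the appetites. A minimal TypeScript sketch, assuming strict finish-to-start dependencies and taking the appetites in weeks from the table (`Wish`, `topoOrder`, and `criticalPathWeeks` are hypothetical names, not part of any wish), orders the siblings with Kahn's algorithm, which also rejects a cyclic graph, and then computes earliest finish times:

```typescript
// Hypothetical shape: appetite in weeks and dependencies by slug, copied from the Sibling Wishes table.
interface Wish {
  slug: string;
  weeks: number;
  dependsOn: string[];
}

const wishes: Wish[] = [
  { slug: "sec-scan-progress", weeks: 4, dependsOn: [] },
  { slug: "sec-remediate", weeks: 2, dependsOn: ["sec-scan-progress"] },
  { slug: "genie-supply-chain-signing", weeks: 2, dependsOn: [] },
  {
    slug: "sec-incident-runbook",
    weeks: 1,
    dependsOn: ["sec-remediate", "genie-supply-chain-signing"],
  },
];

// Kahn's algorithm: returns a topological order, or throws if some wish can never become
// ready (a cycle, or a dependency on a slug that is not in the list).
function topoOrder(ws: Wish[]): string[] {
  const remaining = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const w of ws) {
    remaining.set(w.slug, w.dependsOn.length);
    for (const dep of w.dependsOn) {
      dependents.set(dep, [...(dependents.get(dep) ?? []), w.slug]);
    }
  }
  const ready = ws.filter((w) => w.dependsOn.length === 0).map((w) => w.slug);
  const order: string[] = [];
  while (ready.length > 0) {
    const slug = ready.shift() as string;
    order.push(slug);
    for (const next of dependents.get(slug) ?? []) {
      const left = (remaining.get(next) ?? 0) - 1;
      remaining.set(next, left);
      if (left === 0) ready.push(next);
    }
  }
  if (order.length !== ws.length) {
    throw new Error("dependency cycle (or unknown dependency) among sibling wishes");
  }
  return order;
}

// Earliest finish of each wish = its own appetite + the latest finish among its dependencies.
function criticalPathWeeks(ws: Wish[]): number {
  const bySlug = new Map<string, Wish>();
  for (const w of ws) bySlug.set(w.slug, w);
  const finish = new Map<string, number>();
  for (const slug of topoOrder(ws)) {
    const w = bySlug.get(slug) as Wish;
    const start = Math.max(0, ...w.dependsOn.map((dep) => finish.get(dep) ?? 0));
    finish.set(slug, start + w.weeks);
  }
  return Math.max(0, ...Array.from(finish.values()));
}

console.log(topoOrder(wishes).join(" -> "));
// sec-scan-progress -> genie-supply-chain-signing -> sec-remediate -> sec-incident-runbook
console.log(criticalPathWeeks(wishes)); // 7
```

Under these assumptions the critical path is 4 + 2 + 1 = 7 weeks and signing's 2 weeks never lands on it; shortening the total means overlapping sec-remediate with sec-scan-progress or trimming one of the three wishes on that path.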
| 50 | + |
| 51 | +## Approach |
| 52 | + |
| 53 | +### Shared invariants (every wish must honor) |
| 54 | + |
| 55 | +1. **Detect-only scanner.** `scripts/sec-scan.cjs` never mutates state. Any mutating verb is a separate subcommand on a separately-signed channel. |
| 56 | +2. **Quarantine is always move + sidecar manifest, never delete.** Every mutating action is reversible via `genie sec restore` (per-action) or `genie sec rollback` (bulk). |
| 57 | +3. **Append-only audit log.** `$GENIE_HOME/sec-scan/audit/<scan_id>.jsonl`, mode `0600`, `fsync`-per-event, shared between scanner telemetry and remediate actions. |
| 58 | +4. **Dry-run is the default.** `--apply` requires a frozen plan manifest produced by a prior `--dry-run` (closes TOCTOU). |
| 59 | +5. **Typed confirmation strings are exact.** Keystroke prompts are prohibited. Scanner uses `CONFIRM-QUARANTINE-<6-hex-of-action-id>`. Signing override uses `--unsafe-unverified <INCIDENT_ID>` where `INCIDENT_ID` matches a documented schema. |
| 60 | +6. **Signature-verified binary for `--apply`.** `sec remediate --apply` refuses on unverified binary unless `--unsafe-unverified <INCIDENT_ID>` is passed and logged. |
| 61 | +7. **Coverage gaps stop remediation.** Capped/skipped roots banner at TOP of scan report; remediate refuses unless `--remediate-partial` + typed ack. |
| 62 | +8. **Versioned envelope.** `reportVersion: 1`, `scan_id` (ULID), `hostId`, `scannerVersion`, timestamps, `invocation`, `platform`. Every sibling wish keys off `scan_id`. |
| 63 | + |
| 64 | +### Wish-split rationale |
| 65 | + |
| 66 | +| Rationale | Monolith risk | Split mitigation | |
| 67 | +|-----------|---------------|------------------| |
| 68 | +| Single-PR narrative preserves atomicity | 6–8 week branch; reviewers can't form an opinion on 2k+ LOC diff | Each wish is a reviewable unit; scan-progress ships first and unblocks remediate | |
| 69 | +| Signing and remediate coupled by `--apply` refusal | Workers on Group 6 hit CI failure before Group 7a's signing pipeline exists | Signing runs parallel to remediate; both land; runbook closes loop | |
| 70 | +| Runbook content depends on command surfaces | Runbook drifts during 6-week build | Runbook wish starts last, consumes frozen surfaces | |
| 71 | +| Council-validated content preserved | — | Every IN bullet from the monolith maps to exactly one sibling wish | |
| 72 | + |
| 73 | +### Reviewer additions absorbed by the split |
| 74 | + |
| 75 | +- **G1** (bulk rollback) → sec-remediate Group 2 |
| 76 | +- **G2** (offline credential rotation) → sec-remediate Group 1 |
| 77 | +- **G3** (quarantine disk-space) → sec-remediate Group 2 |
| 78 | +- **G4** (`0600` mode theater on shared FS) → sec-scan-progress Group 4 + sec-remediate Group 2 |
| 79 | +- **M2** (`--unsafe-unverified <INCIDENT_ID>` contract) → genie-supply-chain-signing Group 2 |
| 80 | +- **M3** (cold-runbook automation) → sec-incident-runbook Group 2 as automated test (`scripts/test-runbook.sh` replays commands in a sandboxed fixture) |
| 81 | +- **M4** (biome on markdown) → sec-incident-runbook Group 2 uses `markdownlint-cli2` instead |
| 82 | +- **H1** (ordering inversion) → resolved by split — signing runs parallel to remediate |
| 83 | +- **H2** (bundling) → resolved by split |
| 84 | +- **H3** (base branch drift) → resolved by Preconditions above |
| 85 | + |
| 86 | +### Alternatives considered and rejected |
| 87 | + |
| 88 | +- **Single monolithic wish with H1/M2/G1–G4 fixed in place.** Preserves single-PR narrative but accepts 6–8 week branch + rebase risk + reviewer fatigue on 2k+ LOC diffs. Lower schedule predictability than the split. |
| 89 | +- **Ship minimal scan-progress now; queue remediate/signing/runbook as future work.** Rejected by user: "we need all of it." Council and reviewer both flagged that queueing remediation for later means operators caught in the next compromise wave have detection without a fix path. |
| 90 | +- **Keep remediation out permanently; delegate to platform (EDR/MDM/IAM).** Questioner's position. Rejected because genie's userbase is predominantly developers on laptops without EDR/MDM coverage, and the org that knows what to remediate is the genie team itself. |
| 91 | + |
| 92 | +## Decisions |
| 93 | + |
| 94 | +| Decision | Rationale | |
| 95 | +|----------|-----------| |
| 96 | +| Split the monolith into 4 sibling wishes under this umbrella | Independent review, independent QA, parallelism between signing and remediate, no long-lived branch | |
| 97 | +| Hard-block all siblings on `codex/sec-scan-command` merging to `main` | Removes H3 entirely; cheap win; single upstream pivot | |
| 98 | +| Scanner and remediate ship as sibling CJS payloads (`sec-scan.cjs` + `sec-remediate.cjs`) | Different blast radii, different review bars, different signing posture; preserves zero-config install for scanner | |
| 99 | +| Signing runs parallel to remediate, not ahead of it | `sec remediate --apply` ships with `--unsafe-unverified` as the interim mode; when signing lands, verification becomes default; no gating on signing timeline | |
| 100 | +| COUNCIL.md remains under `brainstorms/sec-scan-progress/` | Historical artifact — captures the deliberation that produced the split; every sibling wish references it | |
| 101 | +| `sec-scan-progress` slug stays on the scanner wish | Back-compat with existing task board + council output; sibling wishes take new slugs | |
| 102 | + |
| 103 | +## Risks & Mitigations |
| 104 | + |
| 105 | +| Risk | Severity | Mitigation | |
| 106 | +|------|----------|------------| |
| 107 | +| Sibling wishes drift apart as they ship separately | Medium | Umbrella DESIGN.md is the shared source of truth for invariants; each wish references it in its Preconditions | |
| 108 | +| Signing wish delayed; remediate ships without verification | Medium | `--unsafe-unverified` is the interim default; audit log records the flag so a post-hoc signing pass can retroactively mark runs as verified or not | |
| 109 | +| Runbook written before command surfaces stabilize | Medium | sec-incident-runbook explicitly depends on sec-remediate + genie-supply-chain-signing; linter in runbook wish checks that every command referenced in `canisterworm.md` is a real subcommand | |
| 110 | +| `codex/sec-scan-command` never merges | Low | Each wish has a fallback: branch from `codex/sec-scan-command` with daily main-sync; escalation to Felipe for merge decision | |
| 111 | +| User regrets the split mid-execution and asks to re-merge | Low | Sibling wishes can be re-umbrella'd into a release train PR at merge time if desired; wish structure is independent of git branch structure | |
| 112 | + |
| 113 | +## Success Criteria (umbrella-level) |
| 114 | + |
| 115 | +- [ ] All 4 sibling wishes exist with structurally-clean WISHes (`genie wish lint` passes on each). |
| 116 | +- [ ] Every IN bullet from the monolith appears in exactly one sibling wish (no gaps, no duplicates). |
| 117 | +- [ ] Reviewer's G1–G4 + M2 + M3 + M4 gaps are addressed in the sibling wish that owns them. |
| 118 | +- [ ] Dependency graph across siblings is acyclic: scan-progress → remediate (+ signing in parallel) → runbook. |
| 119 | +- [ ] COUNCIL.md is referenced from each sibling wish's Preconditions. |
| 120 | +- [ ] User approves the split before any sibling wish dispatches to `/work`. |
| 121 | + |
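The second criterion (no gaps, no duplicates) is mechanical enough to script. A minimal TypeScript sketch, where `checkCoverage` is a hypothetical helper and the item labels are whatever strings the monolith's IN list and each sibling's scope use, reports required items nobody claims, items claimed more than once, and claimed items missing from the required list:

```typescript
interface CoverageReport {
  missing: string[]; // required items no sibling claims
  duplicated: Map<string, string[]>; // item -> every sibling that claims it
  unknown: string[]; // claimed items not in the required list (typos, scope creep)
}

// required: the monolith's IN bullets; owners: sibling slug -> items it claims.
function checkCoverage(required: string[], owners: Record<string, string[]>): CoverageReport {
  const claims = new Map<string, string[]>();
  for (const [wish, items] of Object.entries(owners)) {
    for (const item of items) {
      claims.set(item, [...(claims.get(item) ?? []), wish]);
    }
  }
  const requiredSet = new Set(required);
  const duplicated = new Map<string, string[]>();
  claims.forEach((wishes, item) => {
    if (wishes.length > 1) duplicated.set(item, wishes);
  });
  return {
    missing: required.filter((item) => !claims.has(item)),
    duplicated,
    unknown: Array.from(claims.keys()).filter((item) => !requiredSet.has(item)),
  };
}
```

The exactly-one rule fits the IN bullets but not the reviewer gaps: G4 is deliberately split across sec-scan-progress Group 4 and sec-remediate Group 2, so feeding the gap IDs through this check would flag it as duplicated by design.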
| 122 | +## WRS |
| 123 | + |
| 124 | +██████████ 100/100 |
| 125 | + |
| 126 | +Problem ✅ | Scope ✅ | Decisions ✅ | Risks ✅ | Criteria ✅ | Preconditions ✅ | Council-validated ✅ |