Epic: drain the bug queue by repairing domain boundaries #992

@robertDouglass

Purpose

Drain the Spec Kitty bug queue by repairing the domain boundaries that keep producing symptom fixes.

This epic is based on a holistic survey of:

The observed pattern is not random breakage. The queue clusters around a small number of domain seams where several command surfaces independently infer or mutate the same truth. That creates whack-a-mole behavior: one command is patched, then the invariant fails at the next boundary.

The goal is to make each domain invariant executable once, then route every CLI surface, JSON surface, dry-run, real execution path, dashboard reader, review gate, and sync side effect through it.

Core Architectural Diagnosis

Spec Kitty currently has too many places acting as if they own workflow truth:

  • spec-kitty next
  • spec-kitty agent action implement/review
  • spec-kitty agent tasks move-task
  • spec-kitty agent tasks status
  • spec-kitty review
  • spec-kitty merge --dry-run
  • real spec-kitty merge
  • dashboard/status materializers
  • SaaS sync/final-sync fan-out
  • release and mission-review gates

The alien-intelligence reading: these are not separate problems. They are projections of a few aggregates whose invariants are not yet centralized.

North Star Invariants

  • A Work Package has exactly one lifecycle authority.
  • A Review Cycle has exactly one artifact/verdict/pointer/override authority.
  • Merge dry-run and real merge evaluate the same readiness object.
  • Local state transitions and SaaS publication are separate outcomes with explicit failure classification.
  • TeamSpace ingress accepts only canonical envelopes, and repair/import tooling proves that before live projection.
  • Release review cannot silently skip a required gate for a newly created mission.
  • Compatibility cleanup cannot use "green enough" test output to hide contract drift.

Open PR Intake: TeamSpace Cutover Set

These four PRs are open, non-draft, merge-clean, and green as of 2026-05-05. They should be treated as the current implementation front of the TeamSpace migration boundary, not duplicated by this epic.

  • #980: Add TeamSpace mission-state repair and dry-run. Head 01db3e0b. Adds doctor mission-state --fix, --teamspace-dry-run, canonical TeamSpace envelope synthesis, CLI sync expectation alignment, and 3.2.0rc2.
  • Priivacy-ai/spec-kitty-saas#150: Enforce TeamSpace ingress event contracts. Head 342e77a5. Enforces canonical ingress, rejects historical/raw/legacy keys recursively, fixes 5.0.0 lane handoff, and hardens websocket test cleanup.
  • spec-kitty-runtime#19: Classify runtime logs for TeamSpace migration. Head ba167541. Establishes runtime logs as local side logs, not TeamSpace status authority or direct import payloads.
  • Priivacy-ai/spec-kitty-tracker#14: Guard tracker TeamSpace mission payloads. Head 7fdde89. Ensures tracker egress stays on canonical mission_id and does not become rollout/import authority.

Open PR Follow-Up Queue

Integration Rule

Land these PRs as one coordinated cutover set, then run a post-merge import-readiness pass against the same repositories. The queue-draining work below should consume their new boundaries instead of creating parallel validators.

Workstream 0: TeamSpace Canonical Import Boundary

Domain Problem

TeamSpace launch introduces a hard boundary between historical local mission state and canonical SaaS projection. The open PR set repairs and guards that boundary, but the queue drain must treat the boundary as a product invariant after merge.

Issues And PRs Covered

Target Shape

The launch boundary has one explicit contract:

  • CLI can repair and produce canonical envelopes.
  • SaaS accepts only canonical envelopes.
  • runtime logs are classified as local side logs, never status authority.
  • tracker emits canonical mission payloads and does not own rollout/import semantics.
  • spec-kitty-events==5.0.0 is the published shared contract.
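
A minimal sketch of the "accepts only canonical envelopes" rule, assuming a recursive scan for forbidden keys. The key names in FORBIDDEN_KEYS and the function name are placeholders for illustration; the real forbidden set belongs to the spec-kitty-events contract and is not reproduced here.

```python
# Placeholder key names, NOT the real contract: the actual forbidden set
# is defined by the shared events contract.
FORBIDDEN_KEYS = frozenset({"legacy_example_key", "raw_example_row"})


def find_forbidden_keys(payload, path="$"):
    """Return the path of every forbidden key found anywhere in the payload."""
    hits = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            here = f"{path}.{key}"
            if key in FORBIDDEN_KEYS:
                hits.append(here)
            # Keep descending so nested offenders are reported too.
            hits.extend(find_forbidden_keys(value, here))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            hits.extend(find_forbidden_keys(item, f"{path}[{index}]"))
    return hits


def is_canonical(envelope):
    """An envelope is accepted only when no forbidden key appears at any depth."""
    return not find_forbidden_keys(envelope)
```

Returning paths rather than a bare boolean lets a dry-run or repair manifest point at the exact offending field.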

Acceptance

  • All four open PRs land against current main without semantic reroll.
  • spec-kitty-events==5.0.0 is published and downstream Git source overrides are removed.
  • doctor mission-state --audit --json, --fix, and --teamspace-dry-run --json are run on the selected active repositories.
  • Generated repair manifests are reviewed before import/projection.
  • SaaS ingress rejects raw historical rows, recursive forbidden legacy keys, missing build identity, and non-canonical envelopes.
  • Runtime/tracker tests prove they are not TeamSpace status/import authorities.

Workstream 1: WorkPackageLifecycle Authority

Domain Problem

next, explicit agent action, move-task, dashboard/status, and merge currently infer claimability, readiness, and terminal state from overlapping but non-identical rules.

Open Issues Covered

Latent Regression Debt From Recently Closed Bugs

Target Shape

Introduce or harden a single WorkPackageLifecycle domain service that answers:

  • What is the current state?
  • What transitions are allowed?
  • What actor may claim/review/approve?
  • What is claimable next?
  • What evidence is required?
  • Which command surfaces may present or serialize this decision?
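
One possible shape for such a service, as a sketch only. The lane names, the transition table, and the dependency-based claimability rule are all assumptions for illustration, not the current Spec Kitty model.

```python
from dataclasses import dataclass
from enum import Enum


class Lane(str, Enum):
    # Hypothetical lane names, chosen for illustration only.
    PLANNED = "planned"
    IN_PROGRESS = "in_progress"
    FOR_REVIEW = "for_review"
    APPROVED = "approved"
    DONE = "done"


# Hypothetical transition table: the single place where allowed moves live.
ALLOWED = {
    Lane.PLANNED: {Lane.IN_PROGRESS},
    Lane.IN_PROGRESS: {Lane.FOR_REVIEW},
    Lane.FOR_REVIEW: {Lane.APPROVED, Lane.IN_PROGRESS},
    Lane.APPROVED: {Lane.DONE},
    Lane.DONE: set(),
}


@dataclass(frozen=True)
class WorkPackage:
    wp_id: str
    lane: Lane
    depends_on: tuple = ()


class WorkPackageLifecycle:
    """Single authority for state, allowed transitions, and claimability."""

    def __init__(self, work_packages):
        self._wps = {wp.wp_id: wp for wp in work_packages}

    def allowed_transitions(self, wp_id):
        return ALLOWED[self._wps[wp_id].lane]

    def claimable(self):
        """Planned WPs whose dependencies are all done, sorted by id."""
        done = {i for i, wp in self._wps.items() if wp.lane is Lane.DONE}
        return sorted(
            wp.wp_id
            for wp in self._wps.values()
            if wp.lane is Lane.PLANNED and set(wp.depends_on) <= done
        )

    def transition(self, wp_id, target):
        """The only write path: reject any move the table does not allow."""
        wp = self._wps[wp_id]
        if target not in ALLOWED[wp.lane]:
            raise ValueError(f"{wp_id}: {wp.lane.value} -> {target.value} not allowed")
        self._wps[wp_id] = WorkPackage(wp.wp_id, target, wp.depends_on)
```

Under this sketch, next, agent action, and move-task would all call claimable() and transition() instead of re-deriving the rules, so a later ready WP in another lane cannot be visible to one surface and invisible to another.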

Acceptance

  • next --json, agent action implement, agent action review, move-task, status board, and dashboard agree on claimable and blocked WPs for the same fixture.
  • A multi-lane fixture with a later ready WP proves that #988 (spec-kitty next --json can miss claimable WPs available to explicit agent action) can no longer occur.
  • Planning-artifact and lane-backed WPs use the same lifecycle decision API.
  • No command writes WP lifecycle state except through the canonical transition pipeline.

Workstream 2: ReviewCycle As A Real Aggregate

Domain Problem

Recent fixes created a narrow review-cycle boundary, but current open issues show the boundary is still too file-shaped. Review artifact creation, verdicts, pointers, overrides, status transitions, dry-run gates, real merge gates, and fix-mode prompt loading must all share one domain model.

Open Issues Covered

Latent Regression Debt From Recently Closed Bugs

Target Shape

Promote the review boundary from "rejected artifact helper" to ReviewCycle / ReviewDecision aggregate:

  • artifact write is generated from structured data only;
  • artifact parser validates identity, cycle number, verdict, body, and provenance;
  • latest verdict and status lane consistency are evaluated once;
  • overrides are explicit domain events, not textual patch-ups;
  • all references are canonical review-cycle://;
  • approved/rejected/override states are all first-class, not special cases.
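
A sketch of how the aggregate could look, assuming a frozen value object per cycle and one consistency function. The class names, the lane strings, and the review-cycle:// path layout are hypothetical; only the URI scheme and the REJECTED_REVIEW_ARTIFACT_CONFLICT code come from this epic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(str, Enum):
    APPROVED = "approved"
    REJECTED = "rejected"
    OVERRIDDEN = "overridden"


@dataclass(frozen=True)
class ReviewCycle:
    wp_id: str
    cycle: int
    verdict: Verdict
    body: str

    def __post_init__(self):
        if self.cycle < 1:
            raise ValueError("cycle numbers start at 1")
        # A body that opens with frontmatter is treated as a wrapped prior artifact.
        if self.body.lstrip().startswith("---"):
            raise ValueError("body must not embed prior frontmatter")

    def reference(self) -> str:
        # Hypothetical path layout; only the scheme is taken from the epic.
        return f"review-cycle://{self.wp_id}/{self.cycle}"


def check_consistency(cycles, lane: str) -> Optional[str]:
    """Return a diagnostic code, or None when lane and latest verdict agree."""
    if not cycles:
        return None
    latest = max(cycles, key=lambda c: c.cycle)
    # Hypothetical lane names: a rejected latest verdict conflicts with these.
    if latest.verdict is Verdict.REJECTED and lane in {"approved", "done"}:
        return "REJECTED_REVIEW_ARTIFACT_CONFLICT"
    return None
```

Because the verdict is an enum and the body is validated on construction, approved, rejected, and overridden cycles all travel through the same path rather than through rejected-only helpers.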

Acceptance

  • A new review cycle cannot embed or wrap a prior cycle body/frontmatter.
  • move-task, fix-mode prompt loading, review, merge --dry-run, and real merge use the same ReviewCycleConsistency evaluator.
  • merge --dry-run --json reports REJECTED_REVIEW_ARTIFACT_CONFLICT whenever real merge would.
  • Legacy feedback:// remains readable only as a migration compatibility path and never persists on new transitions.

Workstream 3: MergeReadiness Parity

Domain Problem

Dry-run and real merge have repeatedly diverged. Fixing one missing preflight at a time is a symptom pattern.

Open Issues Covered

Latent Regression Debt From Recently Closed Bugs

Target Shape

Create a single MergeReadiness.evaluate() decision object used by both dry-run and real merge.

The decision should include:

  • missing mission branch;
  • stale target branch;
  • sparse checkout guard;
  • review artifact conflicts;
  • lane/worktree state;
  • status.events consistency;
  • post-merge done-transition plan;
  • lock/abort recovery status;
  • JSON-stable diagnostic codes and remediation.
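
A sketch of the shared evaluator, covering only three of the checks above. The state dictionary keys, the Blocker type, run_merge, and the MISSING_MISSION_BRANCH and STALE_TARGET_BRANCH codes are assumptions for illustration; REJECTED_REVIEW_ARTIFACT_CONFLICT is the code named in Workstream 2.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Blocker:
    code: str
    remediation: str


@dataclass(frozen=True)
class MergeReadiness:
    blockers: tuple = ()

    @property
    def ready(self):
        return not self.blockers

    @classmethod
    def evaluate(cls, state):
        """Build the one decision object from a snapshot of mission state."""
        found = []
        if not state.get("mission_branch_exists", False):
            found.append(Blocker("MISSING_MISSION_BRANCH", "create or fetch the mission branch"))
        if state.get("target_is_stale", False):
            found.append(Blocker("STALE_TARGET_BRANCH", "update the target branch"))
        if state.get("review_conflict", False):
            found.append(Blocker("REJECTED_REVIEW_ARTIFACT_CONFLICT", "resolve the rejected review"))
        return cls(tuple(found))

    def to_json(self):
        payload = {
            "ready": self.ready,
            "blockers": [{"code": b.code, "remediation": b.remediation} for b in self.blockers],
        }
        return json.dumps(payload, sort_keys=True)


def run_merge(state, dry_run, mutate):
    """Dry-run and real merge share evaluate(); only a ready real run mutates."""
    readiness = MergeReadiness.evaluate(state)
    if readiness.ready and not dry_run:
        mutate()
    return readiness
```

With this shape, dry-run JSON is just to_json() on the same object the real merge checks, so parity holds by construction rather than by matching preflights one at a time.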

Acceptance

  • Dry-run and real merge share one evaluator and one test fixture matrix.
  • Every blocker real merge can raise is asserted in dry-run JSON.
  • Real merge performs no irreversible git mutation before readiness passes.
  • Merge abort/lock recovery is covered as part of the same readiness/recovery model.

Workstream 4: SyncPublication And Auth/Teamspace Classification

Domain Problem

Local command success and SaaS sync publication are separate outcomes, but bugs show they are sometimes conflated, hidden, or misclassified.

Open Issues Covered

Latent Regression Debt From Recently Closed Bugs

Target Shape

Create a SyncPublication outcome model:

  • local mutation result;
  • publication attempted/not attempted;
  • retryable vs non-retryable classification;
  • auth/teamspace failure reason;
  • queue mutation policy;
  • user remediation;
  • JSON/stderr separation.
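
A sketch of the outcome model and its failure classification. The type names, the reason strings other than server_error, and the choice to treat other 4xx responses as non-retryable are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PublicationOutcome:
    attempted: bool
    ok: bool
    reason: Optional[str] = None
    retryable: bool = False


@dataclass(frozen=True)
class SyncPublication:
    # Local mutation and remote publication are reported separately.
    local_ok: bool
    publication: PublicationOutcome


def classify_http(status: int) -> PublicationOutcome:
    """Map an HTTP status to a publication outcome."""
    if 200 <= status < 300:
        return PublicationOutcome(attempted=True, ok=True)
    if status in (401, 403):
        # Auth/teamspace failures are never server_error and never retried blindly.
        return PublicationOutcome(True, False, "auth_or_teamspace", retryable=False)
    if status >= 500:
        return PublicationOutcome(True, False, "server_error", retryable=True)
    # Assumption: any other non-2xx response is a non-retryable client error.
    return PublicationOutcome(True, False, "client_error", retryable=False)


def apply_batch_result(retry_counts: dict, event_ids, outcome: PublicationOutcome) -> dict:
    """Only retryable failures bump per-event retry counts."""
    if not outcome.ok and outcome.retryable:
        for event_id in event_ids:
            retry_counts[event_id] = retry_counts.get(event_id, 0) + 1
    return retry_counts
```

A caller could then print the local result as JSON on stdout and route a failed PublicationOutcome to stderr or a log channel, keeping the two outcomes from being conflated.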

Acceptance

  • 401/403 teamspace/auth failures are never server_error.
  • Batch-level auth/teamspace failures do not increment per-event retry counts.
  • sync doctor detects missing Private Teamspace before sync now drains.
  • Successful local state transitions keep stdout/JSON clean while exposing non-fatal sync publication failures on stderr/log channels.
  • SaaS-enabled smoke uses SPEC_KITTY_ENABLE_SAAS_SYNC=1 on this machine.

Workstream 5: ReleaseEvidence And Review Gates

Domain Problem

Release and mission-review gates sometimes report pass/skip rather than enforcing a domain invariant. That hides missing evidence.

Open Issues Covered

Latent Regression Debt From Recently Closed Bugs

Target Shape

Create a ReleaseEvidence contract:

  • new missions must have baseline_merge_commit or fail with repair guidance;
  • legacy compatibility skips must be explicitly scoped and machine-readable;
  • type-check gate is either green or explicitly narrowed with a documented rationale;
  • CI-quality and release-readiness have non-overlapping responsibilities;
  • E2E environment exceptions are preflight-classified before product assertions.
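
The baseline_merge_commit rule could be sketched as a gate that returns a typed result instead of a bare pass. The diagnostic codes, the GateResult type, and the is_legacy flag are hypothetical; how a mission is classified as legacy is not specified here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GateResult:
    status: str  # "pass", "fail", or "legacy_skip"
    code: str
    guidance: Optional[str] = None


def check_baseline(meta: dict, is_legacy: bool) -> GateResult:
    """Evaluate the baseline gate for one mission's metadata."""
    if meta.get("baseline_merge_commit"):
        return GateResult("pass", "BASELINE_PRESENT")
    if is_legacy:
        # Scoped, machine-readable skip: distinct from a generic pass-with-warning.
        return GateResult("legacy_skip", "LEGACY_MISSION_NO_BASELINE")
    return GateResult(
        "fail",
        "MISSING_BASELINE_MERGE_COMMIT",
        "record baseline_merge_commit for this mission, then re-run review",
    )
```

The point of the third status is that a consumer can count legacy skips explicitly, so a new mission can never reach "pass" through the legacy branch.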

Acceptance

  • spec-kitty review fails or repairs when a new mission lacks baseline_merge_commit.
  • Legacy missions get a distinct diagnostic code, not a generic pass-with-warning.
  • Strict mypy release gate is either green or intentionally scoped with an issue-backed contract.
  • CI release-readiness consumes CI-quality outputs instead of duplicating broad test execution.
  • Cross-repo E2E harness preflights runner compatibility before product assertions.

Workstream 6: Input/Upgrade/Encoding Boundary Hygiene

Domain Problem

Some older queue items are not part of the current release firefight but point to the same pattern: implicit file/content assumptions made at many call sites.

Open Issues Covered

Latent Regression Debt From Recently Closed Bugs

Target Shape

Make input and persisted artifact boundaries explicit:

  • all external text ingestion passes through encoding/provenance policy;
  • all path expansion passes through containment policy;
  • all generated command/skill files pass through host-specific schema validators;
  • all worktree-vs-main repo resolution uses git topology, not path naming;
  • all upgrade destructive actions require preserved customization policy.
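
As one example of an explicit boundary, a containment policy for path expansion might look like the sketch below. The function name is hypothetical, and it assumes Python 3.9 or later for Path.is_relative_to.

```python
from pathlib import Path


def resolve_contained(root: Path, candidate: str) -> Path:
    """Resolve candidate under root, refusing anything that escapes root."""
    resolved_root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    target = (resolved_root / candidate).resolve()
    if not target.is_relative_to(resolved_root):
        raise ValueError(f"path escapes containment root: {candidate}")
    return target
```

Intake, upgrade, and generated-artifact writers would call this one helper rather than each joining paths on their own, which is what makes the policy testable with a single fixture.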

Acceptance

  • Encoding provenance is enforced at one lifecycle chokepoint and covered by a regression fixture.
  • Intake, upgrade, and generated-artifact paths use shared containment/atomic-write helpers.
  • Worktree/main resolution has a contract fixture covering managed and external worktrees.

Queue Drain Order

Phase -1: Land The Green TeamSpace PR Set

Phase 0: Safety Rail

  • Add a cross-surface fixture harness that can run the same mission state through next, agent action, move-task, status, dashboard scanner, review, merge --dry-run, and real merge preflight.
  • Establish diagnostic-code inventory for lifecycle, review, merge, sync, and release evidence blockers.
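
The core of such a harness could be an agreement check over surface adapters, sketched below. The adapters, the mission-state dictionary shape, and claimable_decision are stand-ins; real adapters would invoke each command against the same fixture.

```python
def claimable_decision(mission_state):
    """Stand-in for the single domain decision every surface should call."""
    return sorted(wp for wp, lane in mission_state.items() if lane == "planned")


# Hypothetical adapters keyed by surface name. In a real harness each one
# would run the actual command and parse its output.
SURFACES = {
    "next --json": claimable_decision,
    "agent action implement": claimable_decision,
    "status": claimable_decision,
}


def assert_surfaces_agree(surfaces, mission_state):
    """Run every surface on the same state and fail loudly on any divergence."""
    answers = {name: tuple(fn(mission_state)) for name, fn in surfaces.items()}
    if len(set(answers.values())) > 1:
        raise AssertionError(f"surfaces disagree: {answers}")
    return next(iter(answers.values()))
```

Reporting the full per-surface answers in the failure message shows which boundary drifted, not only that one did.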

Phase 1: Stop Active Release Bleeding

Phase 2: Collapse Symptom Families

  • Promote review-cycle aggregate beyond rejected-only helper.
  • Promote merge readiness to a shared dry-run/real evaluator.
  • Promote work-package lifecycle decisioning to a single domain service consumed by all command surfaces.
  • Promote sync publication outcomes to explicit local-result plus remote-publication-result objects.

Phase 3: Drain Older Boundary Debt

Definition Of Done

  • All current open bug issues listed above are closed or explicitly moved to a successor epic with a current repro and deferral rationale.
  • Each recently fixed bug family has at least one cross-boundary regression test that proves the invariant at the next boundary, not only at the original symptom boundary.
  • The TeamSpace cutover PR set is either landed and post-merge verified, or explicitly split with launch-blocking rationale.
  • No release-blocking command reports success while silently skipping a required new-mission gate.
  • Dry-run output is a faithful serialization of the same domain decision real execution uses.
  • SaaS sync publication failures cannot be mistaken for local mutation failures, and local mutation success cannot hide a non-retryable publication blocker.
  • The final smoke includes local mode and SaaS-enabled mode with SPEC_KITTY_ENABLE_SAAS_SYNC=1.

Non-Goals

  • This epic should not become a product expansion vehicle.
  • Do not rewrite the CLI surface wholesale.
  • Do not close old issues merely because this epic references them.
  • Do not add broad suppressions, skips, or compatibility fallbacks unless they are typed, diagnosed, and issue-backed.

Suggested Labels

bug, workflow, release, epic
