Summary
After landing wish pg-test-perf, PG Tests (serial, 208 files, 3626 tests) consistently shows 1-3 flaky failures per run, with the failing test different every time. Across 7 consecutive CI runs on PR #1317:
| Run | Wall-clock | Flakes | Which tests |
|---|---|---|---|
| 1 | 145s (pre-timeout-bump) | 3 | events-stream drain, qa-runner × 2 |
| 2 | 64.9s | 2 | Team Manager createTeam, turn-close happy path |
| 3 | 67.5s | 1 | (unnamed) beforeEach timeout |
| 4 | 87.9s | 2 | serve lifecycle, events-stream cursor |
| 5 | 64.9s | 1 | Group 1 observability migrations |
| 6 | 76.1s | 1 | pg > register and get |
| 7 | 68.7s | 1 + 2 errors | pg > syncWishes upserts |
Never the same test twice. That's the classic signature of environmental noise — Blacksmith runner jitter, pgserve cold-start variance, NOTIFY/LISTEN timing, fixture-order dependence. Not a deterministic code bug.
Impact
- Merge velocity: every PR needs a `gh run rerun --failed` lottery to hit green.
- Signal erosion: engineers learn to ignore "1 fail" as noise, so real regressions hide in it.
Proposed approaches (ranked by ROI)
- Retry-failed-tests-once inside `bun test` (custom reporter or wrapper) — masks intermittents, surfaces consistent fails. The fastest ROI; ugly but pragmatic.
- Audit the 7 known flaky tests and fix them one-by-one. Each is likely a fixture-order or connection-reuse issue. Largest cleanup, but durable.
- Bump `bun test --timeout` to 30s (up from our 15s). Absorbs more Blacksmith jitter. Lowest effort.
- Quarantine the known flaky test files behind `process.env.GENIE_TEST_STRICT=1`. Default CI: skip them. Nightly / on-demand: run them. Gives fast merges without losing long-term signal.
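The retry-once option can be sketched as a small wrapper around a test body. This is a minimal sketch assuming a hypothetical `retryOnce` helper (not an existing bun test API): a first failure is retried exactly once, so a one-off intermittent is absorbed while a second failure still propagates.

```typescript
// Hypothetical retryOnce helper: not an existing bun test API.
// A first failure is retried once; a second failure propagates,
// so deterministic bugs still fail the run.
function retryOnce<T>(fn: () => T): T {
  try {
    return fn();
  } catch {
    // Exactly one retry: absorbs a single intermittent, nothing more.
    return fn();
  }
}

// Simulated flaky test body: fails on the first attempt, passes on the second.
let attempts = 0;
const result = retryOnce(() => {
  attempts++;
  if (attempts === 1) throw new Error("simulated runner jitter");
  return "pass";
});
console.log(result, attempts); // "pass" 2
```

A real wrapper would hook the reporter output and rerun only the failed files; this only shows the retry semantics.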
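The quarantine option can be sketched as a pure gate function. The file names and the `shouldRun` helper below are illustrative assumptions; only the `GENIE_TEST_STRICT=1` convention comes from the proposal itself.

```typescript
// Illustrative quarantine list: these file names are assumptions drawn from
// the flake table, not a confirmed inventory.
const QUARANTINED = new Set([
  "events-stream.test.ts",
  "qa-runner.test.ts",
]);

// Default CI skips quarantined files; GENIE_TEST_STRICT=1 (nightly /
// on-demand) runs everything, so long-term signal is preserved.
function shouldRun(file: string, env: Record<string, string | undefined>): boolean {
  if (env.GENIE_TEST_STRICT === "1") return true;
  return !QUARANTINED.has(file);
}

console.log(shouldRun("events-stream.test.ts", {}));                         // false (default CI)
console.log(shouldRun("events-stream.test.ts", { GENIE_TEST_STRICT: "1" })); // true  (strict)
console.log(shouldRun("pg-sync.test.ts", {}));                               // true  (not quarantined)
```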
Not this wish
pg-test-perf shipped the harness improvements that were the actual deliverable:
- Shared pgserve daemon (ram/template cache/lockfile reuse)
- Admin-connection reuse
- Lazy pgserve boot for non-PG tests
- macOS RAM-disk opt-in
- CI job split (unit-tests + pg-tests + umbrella)
The flake inventory is a repo-wide baseline that predates the wish — confirmed by observing dev's own CI runs (e.g., run 24794680717) failing on the same kind of intermittents before this branch existed.
Assign to whoever owns the flaky test files. Tag: flaky-test, ci-health.