
Commit 434c8ce

Author: Sandermage
v7.65: Cliff 8 hardening — partial_apply_warnings counter + PN19 H100-only flag
Cliff 8 hardening (apply_all.py):
- New PatchStats.partial_apply_warnings property surfaces skipped patches whose reason indicates real anchor drift / ambiguous-anchor / required-anchor-missing, as distinct from benign skips (opt-in OFF, upstream-merged, platform mismatch, deferred, redundant).
- Boot summary line now appends "N ⚠️ partial-apply warning(s)" when the count is non-zero, plus per-warning WARNING-level lines that name each patch + reason. Silent anchor-drift skip class noonghunna flagged in club-3090 discussion #19 is now impossible to miss in the boot output.
- Validated on live 35B DFlash 160K boot: 0 false positives after BENIGN list refinement (catches "opt-in:", "redundant:", "deferred", "upstream may have absorbed", "config: neutral" etc).

CLIFFS.md PN19 H100-only flag (Cliff 1 mech A section):
- noonghunna 2026-05-01 confirmed PN19 costs ~120 MiB KV pool on a 24 GB single-3090 (vs the documented 200-500 MiB win on H100). At 218K + 0.985 mem-util, engine init fails with KV cache available 3.4 GiB / required 3.52 GiB.
- Documented as: disable PN19 on 24 GB consumer cards (3090, 4090, A5000) running long context. Same lesson as P104 L2 persistence (regressed -16.2% on 32+ layer KV >> L2 setups). Generic allocator hints don't survive GPU class boundaries.

No regressions on live 35B DFlash 160K bench:
- 44 applied / 42 skipped / 0 failed / 0 partial-apply warnings
- prose 256t mean TPS 125.07, CV 3.07%
- tool-call 7/7 then 5/7 then 6/7 (variance noise band, no real regression)
1 parent 2b239a8 commit 434c8ce

2 files changed

Lines changed: 85 additions & 1 deletion

docs/CLIFFS.md

Lines changed: 6 additions & 0 deletions
@@ -22,10 +22,16 @@ You hit OOM earlier than you should on long-context workloads. On a 24 GB card r
 
 **PN17 — FA2 lse runtime clamp.** Genesis-original, 2026-04-30, in response to noonghunna Issue #11. Patches FA2 to use the actual `seq_lens.max()` at runtime instead of `max_model_len` during capture.
 
+**PN19 — scoped max-split cudagraph init (datacenter Ampere / Hopper / Blackwell only).** Genesis-original, 2026-04-30. Frees 200-500 MiB during model load on H100/B100. **Does NOT transfer cleanly to Ampere consumer:** noonghunna 2026-05-01 confirmed PN19 costs ~120 MiB KV pool on a 24 GB single-3090 (vs the documented 200-500 MiB win). At 218K context + 0.985 mem-util, engine init fails with `KV cache memory available 3.4 GiB, estimated maximum model length is 206400`. Different allocator behavior under PyTorch 2.10+ load-time fragmentation on consumer SKUs.
+
+> **Recommendation:** disable PN19 on 24 GB consumer cards (3090, 4090, A5000) running long context. Same lesson as P104 L2 persistence (regressed -16.2% on 32+ layer KV >> L2 setups). Generic allocator hints don't survive GPU class boundaries.
+
 **Refs**
 
 - `vllm/_genesis/wiring/perf_hotfix/patch_n17_fa2_softmax_lse_clamp.py`
+- `vllm/_genesis/wiring/perf_hotfix/patch_N19_scoped_max_split.py`
 - noonghunna Issue #11 (cross-engine derivative)
+- club-3090 Discussion #19 (PN19 ≠ H100 ergonomics report, 2026-05-01)
 
 ---
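
A quick arithmetic check of the figures quoted in the commit message (3.4 GiB available vs 3.52 GiB required) suggests the shortfall is of the same order as the ~120 MiB that PN19 is reported to cost on the 3090. The sketch below only restates those two rounded numbers; the variable names are illustrative and nothing here calls into vLLM or Genesis.

```python
# Worked arithmetic only: both inputs are the rounded figures quoted in the
# commit message, so the result is approximate.
MIB_PER_GIB = 1024

available_gib = 3.4   # "KV cache available" at 218K context, 0.985 mem-util
required_gib = 3.52   # "required" for the requested context length

shortfall_mib = (required_gib - available_gib) * MIB_PER_GIB
print(f"shortfall: {shortfall_mib:.0f} MiB")  # prints "shortfall: 123 MiB"
```

Because the inputs are rounded to two or three significant digits, this should be read as a plausibility check on the report and not as proof that disabling PN19 alone restores a successful init at 218K.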

vllm/_genesis/patches/apply_all.py

Lines changed: 79 additions & 1 deletion
@@ -87,23 +87,85 @@ def skipped_count(self) -> int:
     def failed_count(self) -> int:
         return len(self.failed)
 
+    @property
+    def partial_apply_warnings(self) -> list[PatchResult]:
+        """Skipped patches whose reason signals a real problem (drift,
+        ambiguous anchor, anchor-missing — NOT opt-in-OFF, upstream-merged,
+        or platform-mismatch which are all expected).
+
+        Surfaced separately from `skipped_count` so noonghunna's "silent
+        skip class" diagnosis (club-3090 discussion #19) is impossible to
+        miss in the boot summary. Cliff 8 hardening, v7.65.
+        """
+        # Reasons that indicate a benign/expected skip
+        BENIGN = (
+            "opt-in",  # matches "opt-in only", "opt-in:", "opt-in env"
+            "default off",
+            "upstream_merged",
+            "upstream_already",
+            "upstream_already_contains",
+            "upstream may have absorbed",
+            "upstream pr",  # "redundant: upstream PR ..."
+            "platform mismatch",
+            "platform_skip",
+            "config: opt-in",
+            "config: opt-out",
+            "config: skipped",
+            "config: neutral",
+            "already applied",
+            "marker present",
+            "soft_skip",
+            "no-op",
+            "dry-run",
+            "vllm install root not discoverable",
+            "target file not resolvable",
+            "is_pn",
+            "unsupported",
+            "not applicable",
+            "auto-disabled",
+            "auto-skip",
+            "deprecated",
+            "obsolete",
+            "redundant",
+            "deferred",
+            "incompatible with",  # P7 deferred reason
+        )
+        warnings = []
+        for r in self.skipped:
+            reason_lower = (r.reason or "").lower()
+            if not any(b.lower() in reason_lower for b in BENIGN):
+                warnings.append(r)
+        return warnings
+
+    @property
+    def partial_apply_warnings_count(self) -> int:
+        return len(self.partial_apply_warnings)
+
     def summary(self) -> dict[str, Any]:
         return {
             "applied": self.applied_count,
             "skipped": self.skipped_count,
             "failed": self.failed_count,
+            "partial_apply_warnings": self.partial_apply_warnings_count,
             "details": {
                 "applied": [(r.name, r.reason) for r in self.applied],
                 "skipped": [(r.name, r.reason) for r in self.skipped],
                 "failed": [(r.name, r.reason) for r in self.failed],
+                "partial_apply_warnings": [
+                    (r.name, r.reason) for r in self.partial_apply_warnings
+                ],
             },
         }
 
     def __str__(self) -> str:
-        return (
+        base = (
             f"Results: {self.applied_count} applied, "
             f"{self.skipped_count} skipped, {self.failed_count} failed"
         )
+        warns = self.partial_apply_warnings_count
+        if warns:
+            base += f", {warns} ⚠️ partial-apply warning(s)"
+        return base
 
 
 # ═══════════════════════════════════════════════════════════════════════════
@@ -3552,6 +3614,22 @@ def run(verbose: bool = True, apply: bool = False) -> PatchStats:
 
     log.info("Genesis %s", stats)
 
+    # [Genesis v7.65 / Cliff 8 hardening] Surface partial-apply warnings.
+    # Silent anchor-drift / ambiguous-anchor / anchor-missing skips were
+    # the class noonghunna flagged in club-3090 discussion #19. Drift
+    # detection works correctly, but the user-visible summary previously
+    # buried the signal in the same `skipped` count as opt-in OFF. Now
+    # warnings are pulled out and logged individually at WARNING level.
+    if stats.partial_apply_warnings:
+        log.warning(
+            "[Genesis] %d partial-apply warning(s) — patch(es) failed to "
+            "match expected source pattern. Review below to confirm anchor "
+            "drift vs upstream change vs config issue:",
+            stats.partial_apply_warnings_count,
+        )
+        for r in stats.partial_apply_warnings:
+            log.warning("[Genesis] ⚠️ %s — %s", r.name, r.reason)
+
     # [Genesis v7.13] Emit Dispatcher v2 apply matrix as a single readable
     # block. Only matters for patches that route through dispatcher.should_apply
     # (P56-P62 currently); other patches get only the per-line INFO above.
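
The classification introduced in this diff can be tried in isolation. The sketch below is a minimal, self-contained approximation and not the project's actual module: `PatchResult` and `PatchStats` are hypothetical stand-ins that carry only the fields the diff reads, `BENIGN` is an abbreviated subset of the tuple above, and the patch names and reason strings are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Abbreviated subset of the BENIGN markers shown in the diff (assumption:
# the real tuple is longer; these are enough to illustrate the matching).
BENIGN = ("opt-in", "upstream_merged", "platform mismatch",
          "redundant", "deferred", "unsupported")


@dataclass
class PatchResult:
    """Hypothetical stand-in exposing only `name` and `reason`."""
    name: str
    reason: Optional[str] = None


@dataclass
class PatchStats:
    """Hypothetical stand-in for the container patched in the diff."""
    applied: list = field(default_factory=list)
    skipped: list = field(default_factory=list)
    failed: list = field(default_factory=list)

    @property
    def partial_apply_warnings(self) -> list:
        # A skip counts as a warning unless its lowercased reason contains
        # at least one benign marker (plain substring match, as in the diff).
        return [
            r for r in self.skipped
            if not any(b in (r.reason or "").lower() for b in BENIGN)
        ]

    def __str__(self) -> str:
        base = (f"Results: {len(self.applied)} applied, "
                f"{len(self.skipped)} skipped, {len(self.failed)} failed")
        warns = len(self.partial_apply_warnings)
        if warns:
            base += f", {warns} partial-apply warning(s)"
        return base


# Invented example data; none of these names or reasons come from the repo.
stats = PatchStats(
    applied=[PatchResult("patch_a", "ok")],
    skipped=[
        PatchResult("patch_b", "opt-in only (env flag not set)"),
        PatchResult("patch_c", "anchor drift: expected pattern not found"),
        PatchResult("patch_d", "Platform mismatch: needs Hopper"),
        PatchResult("patch_e", None),
        PatchResult("patch_f", "required anchor missing; unsupported layout"),
    ],
)

flagged = [r.name for r in stats.partial_apply_warnings]
print(flagged)  # ['patch_c', 'patch_e']
print(stats)    # Results: 1 applied, 5 skipped, 0 failed, 2 partial-apply warning(s)
```

Two behaviours of the substring approach show up here. A skip with no reason at all (`patch_e`) is flagged, which errs on the side of visibility. Conversely, a reason that mentions a benign marker anywhere (`patch_f`, via "unsupported") is treated as benign even though it also reports a missing anchor, so reason strings for real drift probably need to avoid the benign vocabulary for the counter to catch them.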
