docs(hardware): add 5090 + Gemma 4 + MTP cross-rig anchor rows (apnar disc #86)

noonghunna · noonghunna · commit bef57012a725 · 2026-05-07T12:37:55.000Z
Three operating points from @apnar's full 21-cap 10W sweep: - 400W (efficiency winner): 571 narr / 701 code, 1.429 TPS/W - 510W (narr peak): 619 narr / 724 code, 1.215 TPS/W - 600W (stock baseline): 601 narr / 757 code, 1.103 TPS/W Cross-workload pattern emerges combining apnar's two 5090 sweeps: both Qwen3.6-27B AutoRound and Gemma 4 31B + MTP land at the same ~400W efficiency sweet spot despite ~5× different absolute TPS scales. Updated 5090 compute-saturation note to reflect this is workload- independent on consumer-air-cooled 5090. Hardware-physical ceiling for Gemma 4 + MTP at concurrency=4: ~547W actual draw, no thermal throttle (66°C peak). Above 530W cap = wasted budget. Validates the calibration fix shipped at 29e7de5: at 600W cap with new logic (N=4 plateau-detected), TPS jumps from 499/616 (old N=6) to 600/757 — pure calibration win, +20-25% same-cap TPS.
diff --git a/docs/HARDWARE.md b/docs/HARDWARE.md
@@ -112,12 +112,15 @@ Three flags matter for anchor data:
 | 4090 | air | llama.cpp default | Qwen3.6 27B Q3_K_XL | 450W (stock) | 52.28 | 52.22 | 0.116 | [@laurimyllari #62](https://github.com/noonghunna/club-3090/discussions/62#discussioncomment-16832066) |
 | 5090 | air | vLLM default | Qwen3.6 27B AutoRound | **400W** ⭐ | 119.98 | 159.23 | 0.300 | [@apnar #62](https://github.com/noonghunna/club-3090/discussions/62#discussioncomment-16832685) |
 | 5090 | air | vLLM default | Qwen3.6 27B AutoRound | 575W (near-stock) | 119.38 | 159.94 | 0.277 | [@apnar #62](https://github.com/noonghunna/club-3090/discussions/62#discussioncomment-16832685) |
+| 5090 | air | vLLM `gemma-mtp` (TP=1) | Gemma 4 31B + MTP | **400W** ⭐ | 571.45 | 700.92 | **1.429** | [@apnar #86](https://github.com/noonghunna/club-3090/discussions/86#discussioncomment-16840610) |
+| 5090 | air | vLLM `gemma-mtp` (TP=1) | Gemma 4 31B + MTP | 510W (peak narr) | 619.45 | 723.82 | 1.215 | same |
+| 5090 | air | vLLM `gemma-mtp` (TP=1) | Gemma 4 31B + MTP | 600W (stock) | 600.65 | 756.67 | 1.103 | same |
 
 ⭐ = peak TPS/W efficiency on that rig.
 
 **Cross-rig pattern**: efficiency knee falls at **~60-85% of stock TDP** across consumer Ampere/Ada — start there for a new card class and zoom in. Ada (4090) is proportionally more aggressive than Ampere (3090) — 4090 cuts 33% of stock TDP for ~7% TPS loss; 3090 cuts 15% of stock for ~5% loss.
 
-**5090 compute-saturation note**: @apnar's data shows the 5090 caps at ~430W actual draw on Qwen3.6-27B even when allowed up to 575W — the workload is compute-saturated, not power-saturated. So 400W cap delivers ~equal TPS to 575W. The knee position will likely shift higher when running larger models that actually use the 5090's compute (e.g. Gemma 4 31B + MTP, larger Qwen variants).
+**5090 compute-saturation note**: @apnar's data shows the 5090 caps at ~430W actual draw on Qwen3.6-27B even when allowed up to 575W — the workload is compute-saturated, not power-saturated. So 400W cap delivers ~equal TPS to 575W. **Confirmed cross-workload on Gemma 4 31B + MTP**: 21-cap sweep at 10W resolution shows actual draw plateaus at ~547W beyond 530W cap (no thermal throttle, GPU temp peaked 66°C — compute / memory bandwidth limit, not thermal). **Same 400W sweet spot** despite ~5× different absolute TPS class. Pattern: the 5090 + consumer-air-cooled platform appears to have a workload-independent ~400W efficiency knee on this rig class.
 
 **Discussion**: cross-rig power-cap data lives at [disc #86](https://github.com/noonghunna/club-3090/discussions/86). Drop your sweep there.