
Commit bef5701

docs(hardware): add 5090 + Gemma 4 + MTP cross-rig anchor rows (apnar disc #86)
Three operating points from @apnar's full 21-cap 10W sweep:

- 400W (efficiency winner): 571 narr / 701 code, 1.429 TPS/W
- 510W (narr peak): 619 narr / 724 code, 1.215 TPS/W
- 600W (stock baseline): 601 narr / 757 code, 1.103 TPS/W

Cross-workload pattern emerges combining apnar's two 5090 sweeps: both Qwen3.6-27B AutoRound and Gemma 4 31B + MTP land at the same ~400W efficiency sweet spot despite ~5× different absolute TPS scales. Updated 5090 compute-saturation note to reflect this is workload-independent on consumer-air-cooled 5090.

Hardware-physical ceiling for Gemma 4 + MTP at concurrency=4: ~547W actual draw, no thermal throttle (66°C peak). Above 530W cap = wasted budget.

Validates the calibration fix shipped at 29e7de5: at 600W cap with new logic (N=4 plateau-detected), TPS jumps from 499/616 (old N=6) to 600/757, a pure calibration win, +20-25% same-cap TPS.
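The figures in the message can be sanity-checked with a little arithmetic. The sketch below uses only the rounded numbers quoted above; the variable and function names are illustrative and not part of the repository. It computes the same-cap calibration gain, then back-solves the wattage implied by each TPS/W figure. That the implied wattage at the 600W cap lands near the reported actual draw instead of the cap is an inference from the numbers, not something the commit states.

```python
# Rounded figures copied from the commit message; all names are hypothetical.
OLD_N6 = {"narr": 499, "code": 616}  # 600W cap, old calibration (N=6)
NEW_N4 = {"narr": 600, "code": 757}  # 600W cap, new calibration (N=4)


def pct_gain(old, new):
    """Percentage increase going from old to new."""
    return (new - old) / old * 100.0


gains = {k: round(pct_gain(OLD_N6[k], NEW_N4[k]), 1) for k in OLD_N6}

# cap (W) -> (narrative TPS, reported TPS/W)
POINTS = {400: (571, 1.429), 510: (619, 1.215), 600: (601, 1.103)}

# Wattage implied by each reported efficiency: TPS / (TPS/W).
implied_watts = {cap: tps / eff for cap, (tps, eff) in POINTS.items()}

if __name__ == "__main__":
    print(gains)
    for cap, watts in implied_watts.items():
        print(f"{cap}W cap -> implied {watts:.0f}W")
```

At the 400W and 510W caps the implied wattage matches the cap to within about a watt, while at the 600W cap it comes out in the mid-540s, close to the ~547W plateau the message reports.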
1 parent dfceccb commit bef5701

1 file changed

Lines changed: 4 additions & 1 deletion


docs/HARDWARE.md

@@ -112,12 +112,15 @@ Three flags matter for anchor data:
| 4090 | air | llama.cpp default | Qwen3.6 27B Q3_K_XL | 450W (stock) | 52.28 | 52.22 | 0.116 | [@laurimyllari #62](https://github.com/noonghunna/club-3090/discussions/62#discussioncomment-16832066) |
| 5090 | air | vLLM default | Qwen3.6 27B AutoRound | **400W**| 119.98 | 159.23 | 0.300 | [@apnar #62](https://github.com/noonghunna/club-3090/discussions/62#discussioncomment-16832685) |
| 5090 | air | vLLM default | Qwen3.6 27B AutoRound | 575W (near-stock) | 119.38 | 159.94 | 0.277 | [@apnar #62](https://github.com/noonghunna/club-3090/discussions/62#discussioncomment-16832685) |
+| 5090 | air | vLLM `gemma-mtp` (TP=1) | Gemma 4 31B + MTP | **400W**| 571.45 | 700.92 | **1.429** | [@apnar #86](https://github.com/noonghunna/club-3090/discussions/86#discussioncomment-16840610) |
+| 5090 | air | vLLM `gemma-mtp` (TP=1) | Gemma 4 31B + MTP | 510W (peak narr) | 619.45 | 723.82 | 1.215 | same |
+| 5090 | air | vLLM `gemma-mtp` (TP=1) | Gemma 4 31B + MTP | 600W (stock) | 600.65 | 756.67 | 1.103 | same |

⭐ = peak TPS/W efficiency on that rig.

**Cross-rig pattern**: efficiency knee falls at **~60-85% of stock TDP** across consumer Ampere/Ada — start there for a new card class and zoom in. Ada (4090) is proportionally more aggressive than Ampere (3090) — 4090 cuts 33% of stock TDP for ~7% TPS loss; 3090 cuts 15% of stock for ~5% loss.

-**5090 compute-saturation note**: @apnar's data shows the 5090 caps at ~430W actual draw on Qwen3.6-27B even when allowed up to 575W; the workload is compute-saturated, not power-saturated. So 400W cap delivers ~equal TPS to 575W. The knee position will likely shift higher when running larger models that actually use the 5090's compute (e.g. Gemma 4 31B + MTP, larger Qwen variants).
+**5090 compute-saturation note**: @apnar's data shows the 5090 caps at ~430W actual draw on Qwen3.6-27B even when allowed up to 575W; the workload is compute-saturated, not power-saturated. So 400W cap delivers ~equal TPS to 575W. **Confirmed cross-workload on Gemma 4 31B + MTP**: 21-cap sweep at 10W resolution shows actual draw plateaus at ~547W beyond 530W cap (no thermal throttle, GPU temp peaked 66°C; compute / memory bandwidth limit, not thermal). **Same 400W sweet spot** despite ~5× different absolute TPS class. Pattern: the 5090 + consumer-air-cooled platform appears to have a workload-independent ~400W efficiency knee on this rig class.

**Discussion**: cross-rig power-cap data lives at [disc #86](https://github.com/noonghunna/club-3090/discussions/86). Drop your sweep there.
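
For anyone preparing a sweep to post, the "start at 60-85% of stock TDP and zoom in" advice can be turned into a list of caps to try. This is a minimal sketch under stated assumptions: `candidate_caps` and `ANCHORS_5090_GEMMA` are hypothetical names, the 10W grid mirrors the resolution of @apnar's sweep, and applying the Ampere/Ada band to a 5090 is an extrapolation that the document does not make itself. The anchor tuples are copied from the three Gemma 4 31B + MTP rows in the table.

```python
def candidate_caps(stock_w, lo=0.60, hi=0.85, step_w=10):
    """Power caps (W) between lo and hi of stock TDP, snapped to a step_w grid."""
    lo_w = int(round(stock_w * lo))
    hi_w = int(round(stock_w * hi))
    first = -(-lo_w // step_w) * step_w  # round up to the grid
    last = hi_w // step_w * step_w       # round down to the grid
    return list(range(first, last + 1, step_w))


# (cap W, narrative TPS, TPS/W) from the three Gemma 4 31B + MTP anchor rows.
ANCHORS_5090_GEMMA = [
    (400, 571.45, 1.429),
    (510, 619.45, 1.215),
    (600, 600.65, 1.103),
]
STOCK_W = 600  # the table labels the 600W row as stock

# Anchor row with the highest TPS/W, and its cap as a fraction of stock.
best = max(ANCHORS_5090_GEMMA, key=lambda row: row[2])
best_fraction = best[0] / STOCK_W

if __name__ == "__main__":
    print(candidate_caps(STOCK_W))
    print(best, round(best_fraction, 2))
```

With a 600W stock limit the candidate list runs from 360W to 510W, which brackets both the 400W efficiency point and the 510W narrative-TPS peak among these anchors. The caps themselves would still be applied with whatever power-limit tool the rig uses.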
