Commit eebe7b2

Record: SP8192 + Pre-Quant TTT + QK-Gain 5.0 — val_bpb 1.0791 (3-seed mean)
SP8192 + Pre-Quant AdamW TTT + QK-Gain 5.0 on PR openai#1394 base. 3-seed mean: 1.0791 BPB. Track A, no eval-time adaptation.
1 parent 9d070df commit eebe7b2

6 files changed

Lines changed: 2752 additions & 0 deletions

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
# Record: SP8192 + Pre-Quant TTT + QK-Gain 5.0 — val_bpb 1.0791 (3-seed mean)

**val_bpb = 1.0791** (3-seed mean, std 0.0012) | **~15.12 MB** | 8xH100 SXM

## 3-Seed Results (8xH100 80GB SXM, PyTorch 2.9.1+cu128)

| Seed | **Sliding BPB** | Artifact (bytes) |
|------|-----------------|------------------|
| 42 | **1.0802** | 15,123,918 |
| 314 | **1.0778** | 15,118,254 |
| 999 | **1.0794** | 15,127,567 |
| **Mean** | **1.0791** | |

Merged SOTA (PR #1019): **1.1147 BPB**. Delta: **-0.0356 BPB**.

## Key Change: QK-Gain 5.0 on the SP8192 + Pre-Quant TTT stack

Builds on PR #1394 (@clarkkev) plus the pre-quant TTT from PR #1364, and raises QK-Gain from 4.0 to 5.0. The base stack: SP8192, MLP 4x, depth recurrence (loop 4,5), MuonEq-R, SDClip, GPTQ embeddings, sigmoid-gated U-Net skips, brotli.

## Compliance (Track A — Fixed Predictor)

- No eval-time adaptation: the model is frozen after training + pre-quant TTT + GPTQ
- No SLOT, no n-gram cache
- Pre-quant TTT adapts the EMA weights BEFORE GPTQ quantization (baked into the artifact)
- Standard sliding-window eval (stride=64)
- All four conditions from Issue #1017 satisfied
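
The stride-64 sliding-window evaluation listed above can be sketched as follows. This is an illustration, not the code in `train_gpt.py`: the helper names `sliding_windows` and `bits_per_byte`, the context length in the example, and the choice to score the whole first window are all assumptions. Each window re-reads earlier tokens as context, but only its newest positions count toward the loss, so every token is scored exactly once.

```python
import math


def sliding_windows(num_tokens, seq_len, stride):
    """Yield (start, end, score_from) spans covering tokens [0, num_tokens).

    The model would be fed tokens[start:end]; only positions
    score_from..end-1 contribute to the loss. Hypothetical helper:
    the real evaluation loop may differ in its details.
    """
    if not 0 < stride <= seq_len:
        raise ValueError("stride must be in 1..seq_len")
    scored_upto = 0
    while scored_upto < num_tokens:
        if scored_upto == 0:
            # First window: nothing has been scored yet, so score all of it.
            end = min(seq_len, num_tokens)
        else:
            # Later windows: advance by `stride` newly scored tokens.
            end = min(scored_upto + stride, num_tokens)
        start = max(0, end - seq_len)
        yield start, end, scored_upto
        scored_upto = end


def bits_per_byte(total_nll_nats, total_bytes):
    """Convert a summed negative log-likelihood (in nats) to bits per byte."""
    return total_nll_nats / (math.log(2) * total_bytes)


# seq_len=128 is an assumed value chosen only to keep the example small.
windows = list(sliding_windows(num_tokens=300, seq_len=128, stride=64))
for start, end, score_from in windows:
    print(f"context [{start}, {end})  scored [{score_from}, {end})")
```

A smaller stride gives each scored token more left context at the cost of more forward passes, which is why the stride is reported alongside the BPB figure.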
27+
28+
## Reproduction
29+
30+
```bash
31+
pip install brotli
32+
MATCHED_FINEWEB_REPO_ID=kevclark/parameter-golf python3 data/cached_challenge_fineweb.py --variant sp8192 --skip-manifest
33+
SEED=42 QK_GAIN_INIT=5.0 torchrun --standalone --nproc_per_node=8 train_gpt.py
34+
```
35+
36+
## Credits
37+
38+
PR #1394 @clarkkev, PR #1364 @stukenov, PR #1416 @erichroepke, PR #1217 @bigbag, PR #1204 @msisovic, PR #1260 @dexhunter, PR #1019 @abaybektursun
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
{
  "author": "aryanbhosale",
  "github_id": "aryanbhosale",
  "name": "SP8192 + Pre-Quant TTT + QK-Gain 5.0 + Depth Recurrence + MuonEq-R + SDClip",
  "date": "2026-04-06",
  "track": "10min_16mb",
  "val_bpb": 1.07912726,
  "val_bpb_std": 0.00123751,
  "seeds": [42, 314, 999],
  "seed_results": {
    "42": {"val_bpb": 1.08020079, "artifact_bytes": 15123918},
    "314": {"val_bpb": 1.07777375, "artifact_bytes": 15118254},
    "999": {"val_bpb": 1.07940724, "artifact_bytes": 15127567}
  },
  "comparison_baseline_pr": 1019,
  "delta_vs_pr1019_bpb": -0.03560783,
  "hardware": "8xH100 80GB SXM",
  "pytorch_version": "2.9.1+cu128",
  "technique_summary": "SP8192 + MLP 4x + Pre-Quant AdamW TTT (6 epochs) + QK-Gain 5.0 + Depth Recurrence (loop 4,5) + MuonEq-R + SDClip + GPTQ Embeddings + Brotli"
}
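
The aggregate fields in this file can be re-derived from the per-seed entries. The sketch below does so with the Python standard library. Two things in it are assumptions rather than facts from the commit: the 16,000,000-byte cap is a guess at what the `10min_16mb` track name means, and the baseline is the rounded 1.1147 BPB quoted in the README, so the recomputed delta only agrees with `delta_vs_pr1019_bpb` to four decimals.

```python
import statistics

# Per-seed values copied from "seed_results" above.
seed_bpb = {42: 1.08020079, 314: 1.07777375, 999: 1.07940724}
artifact_bytes = {42: 15123918, 314: 15118254, 999: 15127567}

# Assumption: the track's size limit is 16 MB in decimal bytes.
SIZE_CAP_BYTES = 16_000_000
# Rounded baseline from the README (PR #1019); the unrounded value is not shown.
BASELINE_BPB = 1.1147

values = list(seed_bpb.values())
mean_bpb = statistics.mean(values)
# stdev() is the sample standard deviation (divides by n - 1),
# which is the variant that reproduces "val_bpb_std".
std_bpb = statistics.stdev(values)
delta_bpb = mean_bpb - BASELINE_BPB
largest_artifact = max(artifact_bytes.values())

print(f"mean val_bpb: {mean_bpb:.8f}")
print(f"std  val_bpb: {std_bpb:.8f}")
print(f"delta vs baseline: {delta_bpb:.4f}")
print(f"largest artifact: {largest_artifact / 1e6:.2f} MB "
      f"({SIZE_CAP_BYTES - largest_artifact:,} bytes under the assumed cap)")
```

The population standard deviation (`statistics.pstdev`) would give a smaller value that does not match the reported `val_bpb_std`.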
