|
1 | | -# Record: SP8192 + Parallel Residuals + 3-Layer Recurrence + Legal N-gram Tilt — val_bpb 1.07813 (5-seed mean) |
| 1 | +# Record: SP8192 + Parallel Residuals + 3-Layer Recurrence + Legal N-gram Tilt — val_bpb 1.07807 (5-seed mean) |
2 | 2 |
|
3 | | -**val_bpb: 1.07813** (5-seed mean, std 0.00046) | **2.78491 nats per token** | **~15.99 MB** | 8×H100 SXM, 600 s | Legal Score-First TTT + Causal N-gram Tilt |
| 3 | +**val_bpb: 1.07807** (5-seed mean, std 0.00040) | **2.78478 nats per token** | **~15.99 MB** | 8×H100 SXM, 600 s | Legal Score-First TTT + Causal N-gram Tilt |
4 | 4 |
|
5 | | -Beats [PR #1394](https://github.com/openai/parameter-golf/pull/1394) (1.08563) by **0.00750 bpb / 0.01938 nats per token** on a 5-seed mean, comfortably clearing the 0.005-nats record threshold. Beats [PR #1420](https://github.com/openai/parameter-golf/pull/1420) (1.08014) by **0.00201 bpb / 0.00520 nats per token**, clearing the 0.005-nats threshold against the next-best legal open PR. Beats our own [PR #1413](https://github.com/openai/parameter-golf/pull/1413) (1.08279) by **0.00466 bpb / 0.01205 nats per token**. |
| 5 | +Beats [PR #1394](https://github.com/openai/parameter-golf/pull/1394) (1.08563) by **0.00756 bpb / 0.01952 nats per token** on a 5-seed mean, comfortably clearing the 0.005-nats record threshold. Beats [PR #1420](https://github.com/openai/parameter-golf/pull/1420) (1.08014) by **0.00207 bpb / 0.00534 nats per token**, clearing the 0.005-nats threshold against the next-best legal open PR. Beats our own [PR #1413](https://github.com/openai/parameter-golf/pull/1413) (1.08279) by **0.00472 bpb / 0.01218 nats per token**. |
6 | 6 |
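The margins quoted above can be re-derived from the headline numbers. The following is a minimal sketch, assuming that nats per token and bpb differ only by a constant factor on this validation set, so the factor can be taken as the ratio of the two reported 5-seed means. The names `NATS_PER_BPB`, `margin`, and `margins` are illustrative and do not come from the submission code. Because the inputs are rounded to five decimals, the results agree with the quoted nats figures only to within about 0.00002.

```python
# Headline 5-seed means, copied from the summary line of this PR.
MEAN_BPB = 1.07807
MEAN_NATS_PER_TOKEN = 2.78478

# Assumption: a single constant converts bpb to nats per token on this val set.
NATS_PER_BPB = MEAN_NATS_PER_TOKEN / MEAN_BPB

# Record threshold stated in the PR text.
RECORD_THRESHOLD_NATS = 0.005

# Baseline val_bpb values quoted in the PR text.
BASELINE_BPB = {
    "PR #1394": 1.08563,
    "PR #1420": 1.08014,
    "PR #1413": 1.08279,
}


def margin(baseline_bpb: float) -> tuple[float, float]:
    """Return (delta in bpb, delta in nats per token) versus a baseline."""
    delta_bpb = baseline_bpb - MEAN_BPB
    return delta_bpb, delta_bpb * NATS_PER_BPB


margins = {name: margin(bpb) for name, bpb in BASELINE_BPB.items()}

for name, (delta_bpb, delta_nats) in margins.items():
    clears = delta_nats > RECORD_THRESHOLD_NATS
    print(f"{name}: {delta_bpb:.5f} bpb, {delta_nats:.5f} nats/token, clears threshold: {clears}")
```

The margin against PR #1420 is the tightest of the three, so it is the one most sensitive to rounding in the inputs.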
|
7 | 7 | ## Results (8×H100 80GB SXM, PyTorch 2.9.1+cu128, legal score-first TTT with causal n-gram tilt) |
8 | 8 |
|
9 | | -### Core (TTT) table — 5-seed verification |
| 9 | +### Core (TTT) table — 5-seed verification, all seeds re-run via shipped mini wrapper |
10 | 10 |
|
11 | | -| Seed | Steps | Pre-quant BPB | Sliding BPB | **Post-TTT (n-gram tilted) BPB** | val_loss (nats) | Artifact (mini, bytes) | |
| 11 | +| Seed | Steps | Pre-quant BPB | Sliding BPB | **Post-TTT (n-gram tilted) BPB** | val_loss (nats) | Artifact (bytes) | |
12 | 12 | |---:|---:|---:|---:|---:|---:|---:| |
13 | | -| 0 | 4911 | 1.08717 | 1.08220 | **1.07743** | 2.78312 | ~15,990,971 (proj from raw delta) | |
14 | | -| 42 | 4913 | 1.08781 | 1.08262 | **1.07808** | 2.78479 | **15,993,733 (mini-verified)** | |
15 | | -| 1234 | 4905 | 1.08820 | 1.08352 | **1.07848** | 2.78581 | ~15,988,567 (proj from raw delta) | |
16 | | -| 1337 | 4909 | 1.08772 | 1.08246 | **1.07801** | 2.78461 | **15,988,039 (mini-verified)** | |
17 | | -| 2025 | 4908 | 1.08842 | 1.08306 | **1.07862** | 2.78620 | **15,992,215 (mini-verified)** | |
18 | | -| **5-seed mean** | | **1.08786** | **1.08277** | **1.07813** | **2.78491** | all under 16,000,000 | |
| 13 | +| 0 | 4918 | 1.08728 | 1.08209 | **1.07751** | 2.78333 | **15,992,304** ✅ | |
| 14 | +| 42 | 4911 | 1.08785 | 1.08268 | **1.07809** | 2.78481 | **15,993,733** ✅ | |
| 15 | +| 1234 | 4908 | 1.08794 | 1.08280 | **1.07813** | 2.78492 | **15,990,539** ✅ | |
| 16 | +| 1337 | 4909 | 1.08772 | 1.08246 | **1.07801** | 2.78461 | **15,988,039** ✅ | |
| 17 | +| 2025 | 4908 | 1.08842 | 1.08306 | **1.07862** | 2.78620 | **15,992,215** ✅ | |
| 18 | +| **5-seed mean** | | **1.08784** | **1.08262** | **1.07807** | **2.78478** | all < 16,000,000 | |
19 | 19 |
|
20 | | -**Verification status (5-seed update):** |
21 | | -- All 5 seeds use the same shipped configuration (`pr1394_with_ttt.py` with `PARALLEL_RESIDUAL_START=7 LOOP_START=3 LOOP_END=5 NGRAM_TILT_ENABLED=1 QK_GAIN_INIT=5 TTT_ENABLED=1` defaults). |
22 | | -- **3 of 5 seeds** (s42, s1337, s2025) have been independently re-run via the shipped `train_gpt.py` self-extracting LZMA mini wrapper (~18.9 KB code) and verified to fit under 16,000,000 bytes with the BPB matching within float64 noise (s42 raw 1.07808 vs s42 mini 1.07809). |
23 | | -- **s0 and s1234** were initially scored from the readable source (`pr1394_with_ttt.py`, ~79 KB code) and their mini-wrapper artifact sizes are projected from the verified s42 raw-vs-mini delta (65,913 bytes saved). Both project comfortably under 16,000,000 bytes. Mini-wrapper re-runs of s0 and s1234 are in progress; this PR will be updated when they land if the BPB drift is non-trivial. |
24 | | -- 5-seed standard deviation: **0.00046 BPB** (5-seed standard error of the mean: ~0.00021). |
| 20 | +**Verification status:** |
| 21 | +- **All 5 seeds independently re-run via the shipped `train_gpt.py` self-extracting LZMA mini wrapper** (~18.9 KB code, ~57 KB decoded payload). Each artifact is the actual `Total submission size quantized+brotli` from the mini-wrapper run, NOT a projection. |
| 22 | +- **All 5 artifacts fit under 16,000,000 bytes**, with 6,267 to 11,961 bytes of headroom. |
| 23 | +- 5-seed standard deviation: **0.00040 BPB** (5-seed standard error of the mean: ~0.00018). |
| 24 | +- BPB values are reported from the legal score-first TTT eval pass with causal n-gram tilt applied; sliding (no-TTT) and pre-quant numbers are also shown for diagnostic transparency. |
25 | 25 |
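The summary statistics in the verification bullets can be checked against the updated table. This is a minimal sketch using only the rounded per-seed values shown there; the variable names are illustrative. It assumes the reported standard deviation is the sample (n-1) form, which is the choice that reproduces roughly 0.0004 from these inputs, whereas the population form gives about 0.00035. Since the table values are rounded to five decimals, the last digit of the recomputed standard deviation may differ from the reported one.

```python
import math
import statistics

# Post-TTT (n-gram tilted) BPB per seed, from the updated core table.
POST_TTT_BPB = {0: 1.07751, 42: 1.07809, 1234: 1.07813, 1337: 1.07801, 2025: 1.07862}

# Artifact sizes in bytes per seed, from the same table.
ARTIFACT_BYTES = {
    0: 15_992_304,
    42: 15_993_733,
    1234: 15_990_539,
    1337: 15_988_039,
    2025: 15_992_215,
}

SIZE_LIMIT_BYTES = 16_000_000

values = list(POST_TTT_BPB.values())
mean_bpb = statistics.mean(values)
std_bpb = statistics.stdev(values)          # sample standard deviation (n-1)
sem_bpb = std_bpb / math.sqrt(len(values))  # standard error of the mean

# Bytes remaining under the size limit for each seed.
headroom = {seed: SIZE_LIMIT_BYTES - size for seed, size in ARTIFACT_BYTES.items()}

print(f"mean {mean_bpb:.5f}  std {std_bpb:.5f}  sem {sem_bpb:.5f}")
print(f"headroom: min {min(headroom.values()):,} bytes, max {max(headroom.values()):,} bytes")
```

The smallest headroom belongs to seed 42 and the largest to seed 1337, matching the range quoted in the bullets.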
|
26 | | -### Diagnostics |
| 26 | +### Diagnostics (mini-wrapper runs) |
27 | 27 |
|
28 | 28 | | Seed | Pre-quant BPB | Quantized roundtrip BPB | Sliding BPB | TTT BPB | TTT eval (s) | N-gram precompute (s) | N-gram hint coverage | |
29 | 29 | |---:|---:|---:|---:|---:|---:|---:|---:| |
30 | | -| 0 | 1.08717 | 1.09895 | 1.08220 | 1.07743 | 333.6 | 31.8 | 22.38% | |
31 | | -| 42 | 1.08781 | 1.09932 | 1.08262 | 1.07808 | 344.8 | 32.5 | 22.38% | |
32 | | -| 1234 | 1.08820 | 1.09898 | 1.08352 | 1.07848 | 334.5 | 31.7 | 22.38% | |
| 30 | +| 0 | 1.08728 | 1.09923 | 1.08209 | 1.07751 | 335.5 | 31.9 | 22.38% | |
| 31 | +| 42 | 1.08785 | 1.09937 | 1.08268 | 1.07809 | 316.6 | 32.2 | 22.38% | |
| 32 | +| 1234 | 1.08794 | 1.09941 | 1.08280 | 1.07813 | 332.2 | 32.0 | 22.38% | |
33 | 33 | | 1337 | 1.08772 | 1.09918 | 1.08246 | 1.07801 | 338.4 | 31.9 | 22.38% | |
34 | 34 | | 2025 | 1.08842 | 1.09957 | 1.08306 | 1.07862 | 333.4 | 32.0 | 22.38% | |
35 | 35 |
|
|