
Commit 74d53e8

Full 5-seed mini-wrapper verification: val_bpb 1.07807 (std 0.00040)
All 5 seeds (s0, s42, s1234, s1337, s2025) were re-run via the shipped mini wrapper. The mean improves slightly, from the prior mixed-source 1.07813 (3 seeds from the mini wrapper, 2 from the readable source) to 1.07807, mainly because s1234 produced a noticeably lower post-TTT BPB under the mini wrapper (1.07813 mini vs 1.07848 raw, a drift of -0.00035). This is within float64 reordering noise, but it is the largest single-seed drift in the verification set.

All 5 artifact sizes come directly from the mini-wrapper runs (NOT projections); headroom is measured against the 16,000,000-byte cap:

- s0: 15,992,304 bytes (7,696 byte headroom)
- s42: 15,993,733 bytes (6,267 byte headroom)
- s1234: 15,990,539 bytes (9,461 byte headroom)
- s1337: 15,988,039 bytes (11,961 byte headroom)
- s2025: 15,992,215 bytes (7,785 byte headroom)

Margins vs the legal open chronology:

- vs PR openai#1394 (1.08563): -0.01952 nats per token (margin +0.01452 over the 0.005 bar)
- vs PR openai#1420 (1.08014): -0.00534 nats per token (margin +0.00034 over the 0.005 bar)
- vs own PR openai#1413 (1.08279): -0.01218 nats per token

All four issue openai#1017 conditions remain verified for the n-gram tilt path.
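The margins above are quoted in nats per token while the headline metric is BPB. A minimal sketch of the conversion, assuming every compared run is scored on the same validation bytes with the same tokenizer, so that one fixed ratio links the two units (nats/token = BPB × ln 2 × bytes/token). The ratio is taken from this record's own 5-seed mean (2.78478 / 1.07807); `margin_nats` and the constant names are hypothetical. Because all inputs are rounded to five decimals, the converted margins match the quoted ones only to within roughly 2e-5.

```python
import math

# 5-seed mean of this record, in BPB and in nats per token (from this commit).
RECORD_BPB = 1.07807
RECORD_NATS = 2.78478
THRESHOLD_NATS = 0.005  # record bar quoted in the commit message

# Assumption: all compared runs share tokenizer and validation bytes, so a single
# ratio converts BPB to nats/token. It equals ln(2) * bytes_per_token.
NATS_PER_BPB = RECORD_NATS / RECORD_BPB


def margin_nats(other_bpb: float) -> float:
    """Improvement over another run's BPB, expressed in nats per token."""
    return (other_bpb - RECORD_BPB) * NATS_PER_BPB


baselines = {"PR #1394": 1.08563, "PR #1420": 1.08014, "PR #1413": 1.08279}
for name, bpb in baselines.items():
    gain = margin_nats(bpb)
    print(f"{name}: {gain:.5f} nats/token, clears 0.005 bar: {gain > THRESHOLD_NATS}")
print(f"implied bytes per token: {NATS_PER_BPB / math.log(2):.3f}")
```

The PR #1420 row is the tight one: its converted margin sits only about 0.0003 nats above the bar, so it is the comparison most sensitive to the unit conversion.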
1 parent c9aa6c3 commit 74d53e8

5 files changed

Lines changed: 496 additions & 494 deletions


records/track_10min_16mb/2026-04-07_SP8192_ParallelResid7_Loop35_NgramTilt/README.md

Lines changed: 20 additions & 20 deletions
@@ -1,35 +1,35 @@
-# Record: SP8192 + Parallel Residuals + 3-Layer Recurrence + Legal N-gram Tilt — val_bpb 1.07813 (5-seed mean)
+# Record: SP8192 + Parallel Residuals + 3-Layer Recurrence + Legal N-gram Tilt — val_bpb 1.07807 (5-seed mean)
 
-**val_bpb: 1.07813** (5-seed mean, std 0.00046) | **2.78491 nats per token** | **~15.99 MB** | 8×H100 SXM, 600 s | Legal Score-First TTT + Causal N-gram Tilt
+**val_bpb: 1.07807** (5-seed mean, std 0.00040) | **2.78478 nats per token** | **~15.99 MB** | 8×H100 SXM, 600 s | Legal Score-First TTT + Causal N-gram Tilt
 
-Beats [PR #1394](https://github.com/openai/parameter-golf/pull/1394) (1.08563) by **0.00750 bpb / 0.01938 nats per token** on a 5-seed mean, comfortably clearing the 0.005-nats record threshold. Beats [PR #1420](https://github.com/openai/parameter-golf/pull/1420) (1.08014) by **0.00201 bpb / 0.00520 nats per token**, clearing the 0.005-nats threshold against the next-best legal open PR. Beats our own [PR #1413](https://github.com/openai/parameter-golf/pull/1413) (1.08279) by **0.00466 bpb / 0.01205 nats per token**.
+Beats [PR #1394](https://github.com/openai/parameter-golf/pull/1394) (1.08563) by **0.00756 bpb / 0.01952 nats per token** on a 5-seed mean, comfortably clearing the 0.005-nats record threshold. Beats [PR #1420](https://github.com/openai/parameter-golf/pull/1420) (1.08014) by **0.00207 bpb / 0.00534 nats per token**, clearing the 0.005-nats threshold against the next-best legal open PR. Beats our own [PR #1413](https://github.com/openai/parameter-golf/pull/1413) (1.08279) by **0.00472 bpb / 0.01218 nats per token**.
 
 ## Results (8×H100 80GB SXM, PyTorch 2.9.1+cu128, legal score-first TTT with causal n-gram tilt)
 
-### Core (TTT) table — 5-seed verification
+### Core (TTT) table — 5-seed verification, all seeds re-run via shipped mini wrapper
 
-| Seed | Steps | Pre-quant BPB | Sliding BPB | **Post-TTT (n-gram tilted) BPB** | val_loss (nats) | Artifact (mini, bytes) |
+| Seed | Steps | Pre-quant BPB | Sliding BPB | **Post-TTT (n-gram tilted) BPB** | val_loss (nats) | Artifact (bytes) |
 |---:|---:|---:|---:|---:|---:|---:|
-| 0 | 4911 | 1.08717 | 1.08220 | **1.07743** | 2.78312 | ~15,990,971 (proj from raw delta) |
-| 42 | 4913 | 1.08781 | 1.08262 | **1.07808** | 2.78479 | **15,993,733 (mini-verified)** |
-| 1234 | 4905 | 1.08820 | 1.08352 | **1.07848** | 2.78581 | ~15,988,567 (proj from raw delta) |
-| 1337 | 4909 | 1.08772 | 1.08246 | **1.07801** | 2.78461 | **15,988,039 (mini-verified)** |
-| 2025 | 4908 | 1.08842 | 1.08306 | **1.07862** | 2.78620 | **15,992,215 (mini-verified)** |
-| **5-seed mean** | | **1.08786** | **1.08277** | **1.07813** | **2.78491** | all under 16,000,000 |
+| 0 | 4918 | 1.08728 | 1.08209 | **1.07751** | 2.78333 | **15,992,304** |
+| 42 | 4911 | 1.08785 | 1.08268 | **1.07809** | 2.78481 | **15,993,733** |
+| 1234 | 4908 | 1.08794 | 1.08280 | **1.07813** | 2.78492 | **15,990,539** |
+| 1337 | 4909 | 1.08772 | 1.08246 | **1.07801** | 2.78461 | **15,988,039** |
+| 2025 | 4908 | 1.08842 | 1.08306 | **1.07862** | 2.78620 | **15,992,215** |
+| **5-seed mean** | | **1.08784** | **1.08262** | **1.07807** | **2.78478** | all < 16,000,000 |
 
-**Verification status (5-seed update):**
-- All 5 seeds use the same shipped configuration (`pr1394_with_ttt.py` with `PARALLEL_RESIDUAL_START=7 LOOP_START=3 LOOP_END=5 NGRAM_TILT_ENABLED=1 QK_GAIN_INIT=5 TTT_ENABLED=1` defaults).
-- **3 of 5 seeds** (s42, s1337, s2025) have been independently re-run via the shipped `train_gpt.py` self-extracting LZMA mini wrapper (~18.9 KB code) and verified to fit under 16,000,000 bytes with the BPB matching within float64 noise (s42 raw 1.07808 vs s42 mini 1.07809).
-- **s0 and s1234** were initially scored from the readable source (`pr1394_with_ttt.py`, ~79 KB code) and their mini-wrapper artifact sizes are projected from the verified s42 raw-vs-mini delta (65,913 bytes saved). Both project comfortably under 16,000,000 bytes. Mini-wrapper re-runs of s0 and s1234 are in progress; this PR will be updated when they land if the BPB drift is non-trivial.
-- 5-seed standard deviation: **0.00046 BPB** (5-seed standard error of the mean: ~0.00021).
+**Verification status:**
+- **All 5 seeds independently re-run via the shipped `train_gpt.py` self-extracting LZMA mini wrapper** (~18.9 KB code, ~57 KB decoded payload). Each artifact is the actual `Total submission size quantized+brotli` from the mini-wrapper run, NOT a projection.
+- **All 5 artifacts fit under 16,000,000 bytes** with 6,267–11,961 byte headroom.
+- 5-seed standard deviation: **0.00040 BPB** (5-seed standard error of the mean: ~0.00018).
+- BPB values are reported from the legal score-first TTT eval pass with causal n-gram tilt applied; sliding (no-TTT) and pre-quant numbers are also shown for diagnostic transparency.
 
-### Diagnostics
+### Diagnostics (mini-wrapper runs)
 
 | Seed | Pre-quant BPB | Quantized roundtrip BPB | Sliding BPB | TTT BPB | TTT eval (s) | N-gram precompute (s) | N-gram hint coverage |
 |---:|---:|---:|---:|---:|---:|---:|---:|
-| 0 | 1.08717 | 1.09895 | 1.08220 | 1.07743 | 333.6 | 31.8 | 22.38% |
-| 42 | 1.08781 | 1.09932 | 1.08262 | 1.07808 | 344.8 | 32.5 | 22.38% |
-| 1234 | 1.08820 | 1.09898 | 1.08352 | 1.07848 | 334.5 | 31.7 | 22.38% |
+| 0 | 1.08728 | 1.09923 | 1.08209 | 1.07751 | 335.5 | 31.9 | 22.38% |
+| 42 | 1.08785 | 1.09937 | 1.08268 | 1.07809 | 316.6 | 32.2 | 22.38% |
+| 1234 | 1.08794 | 1.09941 | 1.08280 | 1.07813 | 332.2 | 32.0 | 22.38% |
 | 1337 | 1.08772 | 1.09918 | 1.08246 | 1.07801 | 338.4 | 31.9 | 22.38% |
 | 2025 | 1.08842 | 1.09957 | 1.08306 | 1.07862 | 333.4 | 32.0 | 22.38% |
 
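As a cross-check of the verification-status numbers in the README diff, the snippet below recomputes the 5-seed statistics from the post-TTT column of the new core table. It assumes the reported standard deviation is the sample (n - 1) estimate; the population estimate comes out near 0.00035, further from the reported 0.00040. Recomputed from the five-decimal table entries, the sample value is about 0.000395; the small gap to 0.00040 is plausibly input rounding.

```python
import math
import statistics

# Post-TTT (n-gram tilted) BPB per seed, copied from the mini-wrapper core table.
ttt_bpb = {0: 1.07751, 42: 1.07809, 1234: 1.07813, 1337: 1.07801, 2025: 1.07862}

values = list(ttt_bpb.values())
mean = statistics.mean(values)
# Assumption: the README's "standard deviation" is the sample (n - 1) estimate.
std = statistics.stdev(values)
# Standard error of the 5-seed mean.
sem = std / math.sqrt(len(values))

print(f"mean={mean:.5f} std={std:.6f} sem={sem:.6f}")
```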
records/track_10min_16mb/2026-04-07_SP8192_ParallelResid7_Loop35_NgramTilt/submission.json

Lines changed: 9 additions & 9 deletions
@@ -1,18 +1,18 @@
 {
 "name": "SP8192 + Parallel Residuals + 3-Layer Recurrence + Legal N-gram Tilt",
-"val_bpb": 1.07813,
-"val_loss": 2.78491,
-"bytes_total": 15992215,
-"blurb": "3-lever stack on top of PR #1394 sp8192 baseline: (1) GPT-J parallel residuals on layers 7-10 (PR #1412 @Robby955), (2) 3-layer depth recurrence (loop layers 3-5 twice instead of 4-5 twice), (3) eval-time causal n-gram tilt with one-token exponential rescaling (PR #1420 @abaybektursun, lineage PR #1145 @AnirudhRahul). All four issue #1017 conditions verified. C++ n-gram kernel ported from PR #1420 with nanobind dependency removed (ctypes shim). 5-seed mean 1.07813 BPB (std 0.00046) beats PR #1394 (1.08563) by 0.01938 nats per token, beats PR #1420 (1.08014) by 0.00520 nats per token, beats own PR #1413 (1.08279) by 0.01205 nats per token.",
+"val_bpb": 1.07807,
+"val_loss": 2.78478,
+"bytes_total": 15993733,
+"blurb": "3-lever stack on top of PR #1394 sp8192 baseline: (1) GPT-J parallel residuals on layers 7-10 (PR #1412 @Robby955), (2) 3-layer depth recurrence (loop layers 3-5 twice instead of 4-5 twice), (3) eval-time causal n-gram tilt with one-token exponential rescaling (PR #1420 @abaybektursun, lineage PR #1145 @AnirudhRahul). All four issue #1017 conditions verified. C++ n-gram kernel ported from PR #1420 with nanobind dependency removed (ctypes shim). 5-seed mean 1.07807 BPB (std 0.00040, all 5 seeds mini-wrapper-verified for fit and BPB) beats PR #1394 (1.08563) by 0.01952 nats per token, beats PR #1420 (1.08014) by 0.00534 nats per token, beats own PR #1413 (1.08279) by 0.01218 nats per token.",
 "author": "dexhunter",
 "github_id": "dexhunter",
 "date": "2026-04-07",
 "seed_results": {
-"0": {"val_bpb": 1.07743, "val_loss": 2.78312, "steps": 4911},
-"42": {"val_bpb": 1.07808, "val_loss": 2.78479, "steps": 4913},
-"1234": {"val_bpb": 1.07848, "val_loss": 2.78581, "steps": 4905},
-"1337": {"val_bpb": 1.07801, "val_loss": 2.78461, "steps": 4909},
-"2025": {"val_bpb": 1.07862, "val_loss": 2.78620, "steps": 4908}
+"0": {"val_bpb": 1.07751, "val_loss": 2.78333, "steps": 4918, "artifact_bytes": 15992304},
+"42": {"val_bpb": 1.07809, "val_loss": 2.78481, "steps": 4911, "artifact_bytes": 15993733},
+"1234": {"val_bpb": 1.07813, "val_loss": 2.78492, "steps": 4908, "artifact_bytes": 15990539},
+"1337": {"val_bpb": 1.07801, "val_loss": 2.78461, "steps": 4909, "artifact_bytes": 15988039},
+"2025": {"val_bpb": 1.07862, "val_loss": 2.78620, "steps": 4908, "artifact_bytes": 15992215}
 },
 "lineage": [
 "PR #1394 (clarkkev) — sp8192 base",
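The new `submission.json` fields can be cross-checked mechanically. The sketch below parses an excerpt of the updated file (only the fields the check needs, values copied from this diff) and verifies three things: every seed artifact is below the 16,000,000-byte cap (read here as strictly below, matching the README's "all < 16,000,000"), `val_bpb` equals the mean of the per-seed values at five decimals, and `bytes_total` equals the largest per-seed artifact. That last relation is inferred from the numbers (15993733 is the s42 artifact), not a documented rule; `check_submission` and `BYTE_CAP` are hypothetical names.

```python
import json

# Excerpt of the updated submission.json; values copied from this diff.
SUBMISSION_JSON = """
{
  "val_bpb": 1.07807,
  "bytes_total": 15993733,
  "seed_results": {
    "0":    {"val_bpb": 1.07751, "artifact_bytes": 15992304},
    "42":   {"val_bpb": 1.07809, "artifact_bytes": 15993733},
    "1234": {"val_bpb": 1.07813, "artifact_bytes": 15990539},
    "1337": {"val_bpb": 1.07801, "artifact_bytes": 15988039},
    "2025": {"val_bpb": 1.07862, "artifact_bytes": 15992215}
  }
}
"""

BYTE_CAP = 16_000_000  # hypothetical constant for the 16 MB track limit


def check_submission(sub: dict, cap: int = BYTE_CAP) -> dict:
    """Validate headline fields against seed_results; return per-seed headroom."""
    seeds = sub["seed_results"]
    sizes = {seed: r["artifact_bytes"] for seed, r in seeds.items()}
    headroom = {seed: cap - size for seed, size in sizes.items()}
    over = [seed for seed, room in headroom.items() if room <= 0]
    if over:
        raise ValueError(f"artifacts not below {cap} bytes: {over}")
    mean_bpb = sum(r["val_bpb"] for r in seeds.values()) / len(seeds)
    if round(mean_bpb, 5) != sub["val_bpb"]:
        raise ValueError(f"val_bpb {sub['val_bpb']} != seed mean {mean_bpb:.6f}")
    # Assumption: bytes_total reports the worst-case (largest) seed artifact.
    if sub["bytes_total"] != max(sizes.values()):
        raise ValueError("bytes_total is not the largest seed artifact")
    return headroom


headroom = check_submission(json.loads(SUBMISSION_JSON))
for seed, room in headroom.items():
    print(f"s{seed}: {room:,} bytes of headroom")
```

The headroom values it returns are the same ones listed in the commit message, so a check like this can catch a stale `bytes_total` or `val_bpb` when seeds are re-run.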

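The blurb names the third lever only briefly ("eval-time causal n-gram tilt with one-token exponential rescaling"). The following is a toy sketch of one way such a tilt can work. It is not the PR #1420 C++ kernel. It assumes the n-gram side proposes at most one hint token per position, boosts that token's probability by a factor exp(beta), renormalizes over the full vocabulary, and updates its n-gram counts only after a token has been scored, so the hint depends on the prefix alone. The bigram hint rule, `beta`, `min_count`, and all function and class names are hypothetical. The returned hint fraction is a toy analogue of the "N-gram hint coverage" diagnostic, assuming that column counts positions that received a hint.

```python
import math
from collections import Counter, defaultdict


def tilt(probs, hint, beta):
    """Boost one hinted token by exp(beta) and renormalize over the vocabulary.

    q(x) = p(x) * exp(beta * [x == hint]) / Z,  Z = 1 + p(hint) * (exp(beta) - 1)
    """
    if hint is None:
        return list(probs)
    boost = math.exp(beta)
    z = 1.0 + probs[hint] * (boost - 1.0)
    return [p * (boost if i == hint else 1.0) / z for i, p in enumerate(probs)]


class CausalBigramHinter:
    """Hypothetical hint source: most frequent successor of the previous token."""

    def __init__(self, min_count=2):
        self.min_count = min_count
        self.next_counts = defaultdict(Counter)

    def hint(self, prev):
        counts = self.next_counts.get(prev)
        if not counts:
            return None
        token, count = counts.most_common(1)[0]
        return token if count >= self.min_count else None

    def update(self, prev, token):
        self.next_counts[prev][token] += 1


def score_with_tilt(tokens, model_probs, beta=1.0, min_count=2):
    """Return (total NLL in nats, fraction of positions that received a hint)."""
    hinter = CausalBigramHinter(min_count)
    nll, hinted = 0.0, 0
    for t in range(1, len(tokens)):
        prev, target = tokens[t - 1], tokens[t]
        hint = hinter.hint(prev)  # built from tokens[:t] only
        hinted += hint is not None
        q = tilt(model_probs(tokens[:t]), hint, beta)
        nll -= math.log(q[target])
        hinter.update(prev, target)  # target is revealed only after it is scored
    return nll, hinted / (len(tokens) - 1)


def uniform(prefix):
    """Stand-in 'model': uniform distribution over a 4-token vocabulary."""
    return [0.25] * 4


nll, coverage = score_with_tilt([0, 1, 0, 1, 0, 1, 0, 1], uniform)
print(f"{nll:.3f} nats vs {7 * math.log(4):.3f} untilted, coverage {coverage:.2f}")
```

Because Z never exceeds exp(beta), a wrong hint can add at most beta nats at that position, while a correct hint on a repetitive stretch saves more than that. This bounded downside is one reason a single-token exponential boost is a conservative way to mix an n-gram signal into a neural model's distribution.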