
Record: SP8192 + yahya010 NN base + byte-PPM mixer — val_bpb 0.99145 … #1933

Closed
deborahnelson8788726 wants to merge 1 commit into openai:main from deborahnelson8788726:ppm-sp8192-yahya

Conversation

@deborahnelson8788726

Summary

val_bpb = 0.99145 (3-seed mean, std=0.00078, full FineWeb val 152,574,319 bytes)

Beats current main SOTA 1.0810 by −0.08955 and the strongest pending PR #1795 (1.01252) by −0.02107.

This is the composition of two complementary, already-published but still-unmerged contributions, both inherited unchanged:

  1. NN base = @yahya010's PR #1727 ("Record: SP8192 MP-SGD TTT (4 phases) + QK-Gain 5.25 — val_bpb 1.07217 (3-seed mean)"): Multi-Phase Global SGD TTT (4 phases) + QK-Gain 5.25 + Phased LoRA TTT, built on the lineage of @bigbag's PR #1493 ("Record: SP8192 + 3-Layer Recurrence + Parallel Residuals + QK-Gain 5.25 + Legal TTT — val_bpb 1.0810 (3-seed mean)") and @clarkkev's PR #1394 ("Record: SP8192 + GPTQ Embeddings + Depth Recurrence + MuonEq-R + SDClip — val_bpb 1.08563 (5-seed mean)"). Stack and env vars unchanged.

  2. Eval-time mixer = @OE-GOD's PR #1795 ("Record: SP4096 + byte-level PPM adaptive-λ mixture (strict-legal gate) — val_bpb 1.01252 (3-seed)"): byte-level PPM-D order-4 with a strict-legal, outcome-independent adaptive-λ gate. The function is copied verbatim (`_ppm_mixture_bpb`, ~60 lines) and called from `eval_val_sliding` after the distributed all-reduce; a sketch of the mixture idea follows below.
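
For intuition, here is a minimal sketch of the mixture, not the verbatim `_ppm_mixture_bpb` from #1795: the `ppm` interface, the entropy gate, and all names are illustrative assumptions; only the λ values, the threshold, and the score-before-update discipline come from the PRs above.

```python
import math

def mix_bpb_sketch(nn_token_logps, token_bytes, ppm,
                   lam_h=0.9, lam_l=0.05, thresh=0.9):
    """Illustrative only -- NOT the verbatim _ppm_mixture_bpb from #1795.

    nn_token_logps : NN log p(token | history), natural log, per token
    token_bytes    : each token's UTF-8 bytes
    ppm            : hypothetical online byte model exposing
                       entropy()       -> next-byte entropy in bits,
                                          computed from history alone
                       score_update(b) -> log p(b | history), then
                                          fold b into the counters
    """
    total_bits, n_bytes = 0.0, 0
    for lp_nn, tb in zip(nn_token_logps, token_bytes):
        lp_ppm, confident = 0.0, True
        for b in tb:
            # Gate decided from the history alone (outcome-independent):
            # take the high-lambda path only where PPM's prediction is sharp.
            if ppm.entropy() > thresh * 8.0:  # 8 bits = uniform over bytes
                confident = False
            lp_ppm += ppm.score_update(b)     # score-before-update
        lam = lam_h if confident else lam_l
        lp_mix = math.log(lam * math.exp(lp_ppm) + (1.0 - lam) * math.exp(lp_nn))
        total_bits += -lp_mix / math.log(2)
        n_bytes += len(tb)
    return total_bits / n_bytes  # bits per byte
```

Note the mixture is evaluated only at the realized token's bytes; the closing comment below picks up exactly this point under C2.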

3-Seed Results

| Seed | NN-only token BPB | NN-only byte BPB | Mix BPB | Δ from PPM | Artifact (bytes) | Train | Eval |
|------|-------------------|------------------|---------|------------|------------------|-------|------|
| 42   | 1.07751 | 1.06694 | 0.99235 | −0.07459 | 15,906,666 | 596s | 626s |
| 0    | 1.07593 | 1.06538 | 0.99101 | −0.07437 | 15,911,323 | 596s | 533s |
| 1234 | 1.07595 | 1.06540 | 0.99099 | −0.07441 | 15,904,100 | 596s | 527s |
| mean | 1.07646 | 1.06591 | 0.99145 | −0.07446 | 15,907,363 | 596s | 562s |

Our NN-only token-BPB (1.07646) matches @yahya010's 1.07217 within seed noise (σ_seed ≈ 0.0007). The PPM mixer Δ (−0.0744) matches @OE-GOD's reported Δ (−0.0744) on @clarkkev's base.

Why this composition

The two pieces are orthogonal: the NN base is the strongest published stack, and the mixer acts only at eval time on the NN's per-token log-probs. @OE-GOD's mixer Δ (−0.0744) was measured on @clarkkev's weaker base, so applying the same mixer to the stronger base should compose additively, and the results above bear that out.

What changed vs base

Source diff vs `records/track_10min_16mb/2026-04-18_SP8192_MPSGD_QKGain525/train_gpt.py`:

  • `_ppm_mixture_bpb` function added before `_loss_bpb` (~60 lines, copied verbatim from @OE-GOD's PR #1795)
  • `eval_val_sliding`: collect `lp_chunks` and `tgt_chunks` per scored window; gather to rank 0 and call `_ppm_mixture_bpb` with `O=4 H=0.9 L=0.05 T=0.9` (OE-GOD's tuned defaults)
  • Two new env vars: `PPM_MIX_ENABLED` (default 0), and `PPM_ORDER`/`PPM_LAMBDA_H`/`PPM_LAMBDA_L`/`PPM_THRESH` (defaults match OE-GOD)
  • Runtime: `SLIDING_WINDOW_ENABLED=1`, `PHASED_TTT_ENABLED=0`

Total diff: ~120 lines added, 0 lines removed from yahya010's NN logic. A sketch of the hook wiring follows.
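
A sketch of how the hook might be wired, with hypothetical function and variable names; only the env var names, their defaults, and the `O/H/L/T` keyword arguments come from the diff described above.

```python
import os

# Env var plumbing as listed above; defaults match OE-GOD's tuned values.
PPM_MIX_ENABLED = int(os.environ.get("PPM_MIX_ENABLED", "0"))
PPM_ORDER       = int(os.environ.get("PPM_ORDER", "4"))
PPM_LAMBDA_H    = float(os.environ.get("PPM_LAMBDA_H", "0.9"))
PPM_LAMBDA_L    = float(os.environ.get("PPM_LAMBDA_L", "0.05"))
PPM_THRESH      = float(os.environ.get("PPM_THRESH", "0.9"))

def maybe_mix(lp_chunks, tgt_chunks, rank):
    """Hypothetical hook: called from eval_val_sliding after the
    distributed all-reduce, once lp_chunks/tgt_chunks from every
    scored window have been gathered to rank 0."""
    if not PPM_MIX_ENABLED or rank != 0:
        return None
    # _ppm_mixture_bpb is the ~60-line function copied verbatim from
    # PR #1795; assumed to be in scope in train_gpt.py.
    return _ppm_mixture_bpb(lp_chunks, tgt_chunks,
                            O=PPM_ORDER, H=PPM_LAMBDA_H,
                            L=PPM_LAMBDA_L, T=PPM_THRESH)
```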

Compliance

  • Train under 600s — all 3 seeds stopped at 596s wallclock cap (steps 4814–4895)
  • Artifact under 16 MB — 15.90–15.91 MB natively (int6+brotli)
  • Eval under 600s — mean 562s; seeds 0/1234 at 533s/527s; seed 42 at 626s due to cold sentencepiece cache on first run
  • No SLOT, no pre-quant TTT, no ETLB (inherited from yahya010 base)
  • ⚠️ `no_ngram_cache: false` — byte-level online PPM-D with zero precomputed state shipped. Per-byte score-before-update: every counter update uses only already-scored bytes (toy sketch after this list). Inherits the organizer-ruling-pending status of @OE-GOD's PR #1795 on this predictor class.
  • Three seeds: t = (1.0810 − 0.99145) / (0.00078/√3) ≈ 199 vs the 1.0810 SOTA, clearing the 0.005-nat bar (p ≪ 1e-15)
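
To make the `no_ngram_cache` bullet concrete, a toy order-0 counter illustrating the score-before-update discipline; the real predictor is order-4 PPM-D with escape handling, and this class is purely illustrative.

```python
import math

class ToyByteCounter:
    """Order-0 stand-in for the order-4 PPM-D in #1795 (illustrative).
    Starts from zero precomputed state, so no n-gram-like statistics
    ship in the artifact."""
    def __init__(self):
        self.counts = [0] * 256
        self.total = 0

    def score_then_update(self, b: int) -> float:
        # Score first, using only bytes that were already scored...
        p = (self.counts[b] + 1) / (self.total + 256)  # Laplace smoothing
        # ...then fold this byte into the counters for later positions.
        self.counts[b] += 1
        self.total += 1
        return math.log(p)
```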

Scope

Adds only `records/track_10min_16mb/2026-04-29_PPM_SP8192_yahya_base/`. No changes outside.

Credits

@yahya010 (#1727, NN base), @OE-GOD (#1795, PPM mixer), @bigbag (#1493), and @clarkkev (#1394) for the upstream components.

Test plan

  • submission.json validates
  • train_gpt.py runs end-to-end and reports `[ppm_mix]` + `final_int6_sliding_window` lines for each seed
  • 3 seeds land mix BPB in [0.9910, 0.9924], std 0.00078
  • all 3 artifacts under 16 MB natively
  • all 3 train times under 600s wallclock cap
  • mean eval 562s under 600s
  • NN-only token-BPB matches @yahya010's 1.07217 within seed noise

If PPM-as-TTT is ruled invalid, this submission falls back to the inherited NN-only score (1.066 byte-BPB / 1.076 token-BPB, matching yahya010), which is still a valid record vs the current main SOTA 1.0810.

Commit message (expanded)

Record: SP8192 + yahya010 NN base + byte-PPM mixer — val_bpb 0.99145 (3-seed mean)

3-seed mean: 0.99145 (std 0.00078, full FineWeb val 152.6 MB)
Beats current main SOTA 1.0810 by -0.08955; OE-GOD's pending PR openai#1795 1.01252 by -0.02107

Composition of two unmerged contributions:
- @yahya010 PR openai#1727 NN base (1.07217, MP-SGD TTT + QK-Gain 5.25)
- @OE-GOD PR openai#1795 byte-level PPM-D mixer (strict-legal outcome-independent gate)

Source diff vs PR openai#1727: ~120 lines added in eval_val_sliding for PPM mixer.
Adds only records/track_10min_16mb/2026-04-29_PPM_SP8192_yahya_base/.

Compliance: train 596s (under 600s), artifact 15.9 MB (under 16 MB),
mean eval 562s (seeds 0/1234 at 533/527s under 600s; seed 42 cold-cache 626s).

Inherits OE-GOD openai#1795 organizer-ruling-pending status on byte-PPM as TTT.
@deborahnelson8788726 (Author)

Closing in light of the C2 discussion in Issue #1872 (raised by @sharpobject and acknowledged by @andrewbaggio1):

"If you score all token ids at a given token-wise position in the document, do the probabilities for all of these token ids given by the mix of the byte-wise PPM and the token-wise NN sum to 1? (hint: no)"

The byte-mix distribution does not normalize over the official token alphabet Σ, so the metric is not a valid −log p(realized_token | history) under III(C2). The NN-only fallback in this submission is just a re-run of @yahya010's PR #1727 (token-BPB ~1.076), which does not improve on the current main SOTA (~1.061), so this PR has nothing to offer once the byte-PPM piece is removed.
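
A toy demonstration of that point, with made-up numbers and a hypothetical three-token alphabet:

```python
import math

vocab  = ["a", "ab", "b"]                    # toy token alphabet Σ
nn_p   = {"a": 0.5, "ab": 0.3, "b": 0.2}     # NN probs: sum to 1 over Σ
byte_p = {"a": 0.6, "b": 0.4}                # PPM per-byte probs
# Byte-PPM prob of each token = product over its bytes; because tokens
# overlap as byte strings, this mass is double-counted over Σ.
ppm_p = {t: math.prod(byte_p[c] for c in t) for t in vocab}
lam = 0.9
mix = {t: lam * ppm_p[t] + (1 - lam) * nn_p[t] for t in vocab}
print(sum(mix.values()))  # 1.216, not 1.0 -> not a valid p(token | history)
```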

Withdrawing rather than asking maintainers to spend review cycles on a submission that the C2 ruling already addresses. Thanks to @yahya010, @OE-GOD, @bigbag, @clarkkev for the upstream components, and to @andrewbaggio1 / @sharpobject for the clean C2 framing.
