Record: SP8192 + Strict Full-Val Byte PPM Mixture — 1.00495 BPB (3-seed mean)#1850
Open
someone114514 wants to merge 1 commit into openai:main from
Conversation
Force-pushed from 304dff5 to 37ce906
phaniratan1234 pushed a commit to phaniratan1234/parameter-golf that referenced this pull request on Apr 27, 2026:

…ocal experiments Made-with: Cursor
This was referenced Apr 27, 2026
sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request on Apr 27, 2026:

… required; PR openai#1848 BPB risk; Day 18 plateau; Session 23

- Merged SOTA still 1.0810 (Day 18, no change since Apr 9)
- PPM-D byte mixture confirmed by dexhunter at 1.0322 (PR openai#1857, self-closed)
- SmearGate BOS bug documented: prev-token leaks at document boundaries; fix required
- PR openai#1848 (newjordan, 0.87980) flagged BPB risk: sibling PR openai#1846 closed same day
- PR openai#1858 (0.9946) only covers 8M/40.5M tokens — not leaderboard-comparable
- PR openai#1855 (codemath3000, 1.06108) and openai#1851 (aquariouseworkman, 1.06128) both clean
- PPM-D wave: PRs openai#1850, openai#1854, openai#1835 await organizer ruling
- Added Session 23 lessons to CLAUDE.md
- 3 days to deadline (Apr 30) — final GPU run window

https://claude.ai/code/session_01RmJtLYUmKNzDgDVTnWoKzU
Force-pushed from 58bed14 to 37ce906
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request on Apr 28, 2026:

… notes

spec 052: PPM-D byte mixture port from PR openai#1850 onto 047B + our anti-hijack gate tuning. Phase 1 measured end-to-end at mix_bpb_sidecar = 1.00506, matching PR openai#1850's 1.00495 within 0.0001.

spec 055: full submission run — train 050 baseline from scratch, apply same tuned PPM at eval. Single train_gpt.py file. Predicts 1.005 +/- 0.003. Code: exp/055-050-with-ppm-fullrun @ c27be23.

ideas:
- ppm-port-on-047B.md — narrative of the PPM port discovery, headroom analysis (1850 vs 1857 vs us), and why anti-hijack was the bigger lever.
- ppm-d-mixture-and-anti-hijack.md — full math: per-token NN -> per-byte spreading, PPM-D Howard escape-D, the gate (1850 raw + anti-hijack override), log-sum-exp mixture, and the 4.32-bit hijack geometry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
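The "log-sum-exp mixture" named in the commit message can be sketched as below. This is a hedged illustration only, not code from the referenced branch; `mix_logprob` and its inputs are hypothetical names.

```python
import math

def mix_logprob(lp_nn, lp_ppm, lam):
    """Convex mixture of two probabilities, computed stably in log space.

    lp_nn / lp_ppm are natural-log probabilities from the NN and the PPM
    model; lam is the gate weight on the NN. Log-sum-exp avoids underflow
    when both probabilities are tiny.
    """
    a = math.log(lam) + lp_nn        # log(lam * p_NN)
    b = math.log1p(-lam) + lp_ppm    # log((1 - lam) * p_PPM)
    m = max(a, b)                    # factor out the larger term
    return m + math.log(math.exp(a - m) + math.exp(b - m))

# agrees with mixing directly in probability space
p_mix = math.exp(mix_logprob(math.log(0.1), math.log(0.95), 0.05))
```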
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request on Apr 28, 2026:

Earlier default 4194304 (OMP-chunked) was suboptimal — saves ~230s eval time but loses ~0.010 BPB sidecar from chunk-reset penalty. PR openai#1850 chose single-pass deliberately and pays the 252s scoring cost for the bigger gain.

Single-pass timing on 8H per 1850's measurements:

    pre-quant + gptq + ema:       ~85s
    diagnostic quantized eval:    ~60s
    non-overlap forward (8-way):  ~20s
    file gather:                   ~5s
    single-pass PPM scoring:     ~250s  (CPU-bound, not GPU)
    ────────────────────────────────────
    total eval phase:            ~420s  under 600s cap

Smokes (where wallclock matters more than gain) can override with PPM_OMP_CHUNK_TOKENS=4194304.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fija pushed a commit to Fija/parameter-golf that referenced this pull request on Apr 28, 2026:

WHY: V1 NGramMixer (fixed-order bigram + Dirichlet uniform smoothing) failed because cold-start q_bi was uniform → mixing in noise. V2 (TempScaler) failed because the trained NN is already calibrated. The actual large entropy gap that PPM byte mixture exploits is *local verbatim repetition* (URLs, code identifiers, repeated phrases) that a 5M-param NN averages over.

WHAT: Cleary-Witten 1984 PPM-D over the SP token alphabet (Σ_token=8192), with backoff via escape mechanism. Distribution defined on Σ_token resolves the byte-vs-token C2 dispute (Issue openai#1872) cleanly. Binary λ gate (PR openai#1850 pattern): if PPM confidence at deepest matched context ≥ threshold, λ=lambda_lo (mostly trust PPM); else λ=lambda_hi (mostly trust NN).

LEGALITY: All four conditions of Issue openai#1017:
- C1: ctx[k] only contains counts from already-scored tokens
- C2: P_K(·|prev) = recursive PPM-D blend, sums to 1 over Σ_token (verified by `test_ppm_c2_full_normalized`); convex combination with NN softmax preserves normalization
- C3: λ-gate uses confidence at deepest matched context (prev-only), computed before observing target. update_stream is called AFTER mix_nll
- C4: monotonic state, single left-to-right pass

VALIDATION: 23/23 unit tests pass on CPU including a functional toy benchmark — on a chunked synthetic stream with strong repetition motifs, PPM gives -3.2 nats/token improvement vs NN baseline. (Real FineWeb is much less repetitive but the byte-level PPM cluster has shown -0.05 to -0.20 BPB improvements on this challenge, suggesting token-level can capture similar entropy.)

INTEGRATION: eval_val sub-chunked W=128 (env: PPM_CHUNK_TOKENS) so within-batch repetition is captured. State carries across batches via the mixer object. Eval_val_ttt_phased path NOT touched yet (would need per-doc-slot PPM tables; deferred to V4 if V3 numbers warrant).

ENV: PPM_MIX_ENABLED, PPM_MAX_ORDER (default 2), PPM_LAMBDA_LO (0.05), PPM_LAMBDA_HI (0.9), PPM_CONF_THRESHOLD (0.9), PPM_CHUNK_TOKENS (128).
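As a concrete illustration of the "recursive PPM-D blend" described above, here is a minimal order-K sketch without exclusions; it is a toy under simplifying assumptions, not the implementation in this commit, and the class and method names are hypothetical. Method D discounts each seen count by 1/2 and assigns escape probability d/(2n), so each level's distribution sums to 1 by construction, matching the C2 normalization claim.

```python
from collections import defaultdict

class PPMD:
    """Toy order-K PPM-D: recursive escape blend, no exclusions."""

    def __init__(self, max_order=2, alphabet=8):
        self.K = max_order
        self.A = alphabet
        # counts[k][context_tuple][symbol] -> occurrence count
        self.counts = [defaultdict(lambda: defaultdict(int))
                       for _ in range(max_order + 1)]

    def _dist(self, history, k):
        if k < 0:                          # order -1: uniform fallback
            return [1.0 / self.A] * self.A
        ctx = tuple(history[-k:]) if k > 0 else ()
        table = self.counts[k][ctx]
        lower = self._dist(history, k - 1)
        n = sum(table.values())
        if n == 0:                         # context never seen: full escape
            return lower
        d = len(table)                     # distinct symbols in this context
        esc = d / (2.0 * n)                # method-D escape probability
        p = [esc * q for q in lower]       # escaped mass via shorter context
        for s, c in table.items():
            p[s] += (2 * c - 1) / (2.0 * n)  # method-D discounted estimate
        return p

    def prob(self, history, sym):
        return self._dist(history, self.K)[sym]

    def update(self, history, sym):        # call AFTER scoring sym
        for k in range(self.K + 1):
            if len(history) >= k:
                ctx = tuple(history[-k:]) if k > 0 else ()
                self.counts[k][ctx][sym] += 1

# repeated motif: after seeing "1 2 -> 3" twice, P(3 | 1, 2) is high
m, hist = PPMD(), []
for s in [1, 2, 3, 1, 2, 3, 1, 2]:
    m.update(hist, s)
    hist.append(s)
```

The escape mass esc is spread over the lower-order distribution for all symbols, so the total is (2n - d)/(2n) + d/(2n) = 1 at every level; this is the "sums to 1 over the alphabet" property the commit's unit test checks.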
This was referenced Apr 28, 2026
Summary
3-seed mean val_bpb 1.00495 (std 0.00072). Best/min seed is 1.00425333 BPB (seed 1337). Compared to the merged 2026-04-09 SP8192 legal TTT record at 1.0810 BPB, this improves by 0.0761 BPB, comfortably past the 0.005-nat threshold and over 100x the observed inter-seed std. All three artifacts stay under the 16 MB cap.

The submission adds one scoring component on top of the existing SP8192 training stack: a binary-lambda-gated PPM-D byte-level mixture applied to the sliding-window NN log-probs at eval time. The mixture is constructed to fit the score-before-update discipline: each byte is scored from the prefix PPM state, then inserted into the PPM counts for future bytes.
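The headline margins can be double-checked with a few lines of arithmetic (numbers taken from the summary above):

```python
mean_bpb, std_bpb = 1.00495, 0.00072   # 3-seed mean and inter-seed std
best_seed_bpb = 1.00425333             # best/min seed (seed 1337)
baseline_bpb = 1.0810                  # merged 2026-04-09 SP8192 TTT record

gain = baseline_bpb - mean_bpb         # 0.07605, quoted as 0.0761 BPB
sigma_ratio = gain / std_bpb           # ~105x the inter-seed std
```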
The Contribution
A binary-lambda-gated PPM-D mixture over an already-scored byte stream, computed at eval time and mixed with the NN's per-byte log-probabilities in probability space.
For each predicted byte at position t, a binary gate picks the mixing weight: if the PPM confidence at the deepest matched context is at least PPM_CONF_THRESHOLD, lambda = lambda_lo = 0.05 (mostly trusts PPM); otherwise lambda = lambda_hi = 0.9 (mostly trusts the NN). Then p_mix = lambda * p_NN + (1 - lambda) * p_PPM, and -log(p_mix) contributes to byte BPB.

The implementation uses PPM_ORDER=4, PPM_LAMBDA_HI=0.9, PPM_LAMBDA_LO=0.05, and PPM_CONF_THRESHOLD=0.9 in the submitted logs.

Why this helps here: the parameter-constrained SP8192 NN still has a byte-level surprisal floor on highly repetitive local byte contexts such as identifiers, URLs, numeric literals, and repeated formatting fragments. PPM is strong exactly in those high-confidence local contexts. The binary gate is intentionally conservative: it trusts PPM only when the prefix counts indicate a strong local continuation, and otherwise falls back toward the NN.
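A minimal sketch of the per-byte gate and mixture under the stated settings; `mix_nll` and its inputs (p_nn, p_ppm, ppm_conf) are illustrative names, not the submission's API, and log base 2 is used since BPB is measured in bits:

```python
import math

LAMBDA_LO, LAMBDA_HI, CONF_THRESHOLD = 0.05, 0.9, 0.9

def mix_nll(p_nn, p_ppm, ppm_conf):
    """Binary-lambda gate: trust PPM only when its prefix confidence is high.

    p_nn / p_ppm are the NN's and PPM's probabilities for the observed byte;
    ppm_conf is the PPM confidence at the deepest matched context. All three
    are hypothetical inputs for illustration.
    """
    lam = LAMBDA_LO if ppm_conf >= CONF_THRESHOLD else LAMBDA_HI
    p_mix = lam * p_nn + (1.0 - lam) * p_ppm   # convex combination in prob space
    return -math.log2(p_mix)                   # bits contributed to byte BPB

# confident PPM context: mixture leans on PPM, cost stays small
bits_hi_conf = mix_nll(p_nn=0.10, p_ppm=0.95, ppm_conf=0.97)
# unmatched context: mixture leans back toward the NN
bits_lo_conf = mix_nll(p_nn=0.10, p_ppm=0.01, ppm_conf=0.30)
```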
Per-Seed Results
Three independent seeds, all with ppm_mix < 1.006. The headline number is the PPM mixture returned as quantized_sliding_window val_bpb. The logs also report nn_token_bpb, nn_byte_bpb, and ppm_only for auditability.

Legality / Issue #1017
The PPM mixture is implemented inside a strict score-before-update eval-time path.
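The score-before-update discipline amounts to the loop below: each byte is scored from state built only from earlier bytes, and only then folded into the counts. This is a sketch with a hypothetical mixer interface, not the submission's code.

```python
import math

def strict_bpb(byte_stream, mixer):
    """Score byte t from the prefix-only state, THEN update with byte t.

    `mixer` is a hypothetical object: score(b) returns the mixed log2-prob
    of byte b given everything before it; update(b) folds b into the counts.
    """
    total_bits = 0.0
    for b in byte_stream:
        total_bits += -mixer.score(b)   # uses only already-scored bytes
        mixer.update(b)                 # b now visible to future positions
    return total_bits / len(byte_stream)

class UniformMixer:
    """Toy mixer: ignores all state, uniform over the 256 byte values."""
    def score(self, b):
        return math.log2(1.0 / 256.0)
    def update(self, b):
        pass

bpb = strict_bpb(b"abc", UniformMixer())   # uniform model: 8 bits per byte
```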
Additionally:
The PPM state is scoped to eval_val_sliding for each run and is not persisted across invocations.

Implementation Notes
The scorer is native C compiled at runtime with gcc -O3 from the packed script. It uses /tmp for distributed sliding collection.

The Python PPM reference and eval-time TTT were removed from the packed artifact to keep the submission under the 16 MB cap. Native exactness was checked against the Python reference during development before trimming.
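Compiling a native scorer at runtime can be sketched with the generic gcc + ctypes pattern below. The C function here is a toy stand-in, not the submission's PPM scorer, and the block skips compilation entirely when gcc is unavailable.

```python
import ctypes, os, shutil, subprocess, tempfile

C_SRC = r"""
double score_bytes(const unsigned char *buf, long n) {
    /* toy stand-in for a native scorer: just sums the byte values */
    double s = 0.0;
    for (long i = 0; i < n; i++) s += buf[i];
    return s;
}
"""

def build_runtime_scorer(c_src):
    """Write C source to a temp dir, compile with gcc -O3, load via ctypes."""
    tmp = tempfile.mkdtemp()
    src, so = os.path.join(tmp, "scorer.c"), os.path.join(tmp, "scorer.so")
    with open(src, "w") as f:
        f.write(c_src)
    subprocess.check_call(["gcc", "-O3", "-shared", "-fPIC", "-o", so, src])
    lib = ctypes.CDLL(so)
    lib.score_bytes.restype = ctypes.c_double
    lib.score_bytes.argtypes = [ctypes.c_char_p, ctypes.c_long]
    return lib

total = None
if shutil.which("gcc"):                            # only run if gcc exists
    lib = build_runtime_scorer(C_SRC)
    total = lib.score_bytes(b"\x01\x02\x03", 3)    # 1 + 2 + 3
```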
Compliance Numbers
All three seeds' packed artifacts are under 16,000,000 bytes.
Files
- records/track_10min_16mb/2026-04-26_SP8192_StrictFullValPPM/train_gpt.py
- records/track_10min_16mb/2026-04-26_SP8192_StrictFullValPPM/submission.json
- records/track_10min_16mb/2026-04-26_SP8192_StrictFullValPPM/README.md
- records/track_10min_16mb/2026-04-26_SP8192_StrictFullValPPM/train_seed1337.log
- records/track_10min_16mb/2026-04-26_SP8192_StrictFullValPPM/train_seed42.log
- records/track_10min_16mb/2026-04-26_SP8192_StrictFullValPPM/train_seed7.log

Reproduce
Change SEED and RUN_ID for seeds 7 and 1337.