
Record: SP8192 + Order-6 Strict Full-Val Byte PPM — 0.96255 BPB (3-seed mean)#1877

Open
someone114514 wants to merge 1 commit into openai:main from someone114514:record-sp8192-order6-strict-byte-ppm-0427

Conversation

@someone114514

SP8192 + Order-6 Strict Full-Val Byte PPM

val_bpb = 0.96255 (3-seed mean, std 0.00047) | 15.997 MB mean artifact | 8xH100 SXM

This submission keeps the SP8192 recurrence / parallel-residual / QK-gain base stack and replaces the prior order-4 PPM setting with a strict full-validation order-6 byte-level PPM mixture at eval time. The PPM state is built online from the already-scored byte prefix, then updated only after each byte is scored.

Results

Seed   Post-EMA BPB   PPM BPB      Artifact bytes   Eval time
42     1.08754884     0.96261595   15,996,904       474.016s
7      1.08763287     0.96298648   15,999,992       464.055s
1337   1.08663175     0.96205812   15,994,492       463.261s
Mean   1.08727115     0.96255352   15,997,129       467.111s
Std    0.00055533     0.00046732   2,757            5.993s

The best seed is 1337 at 0.96205812 BPB. The largest observed total submission size is 15,999,992 bytes, still under the 16,000,000 byte cap.

Method

The eval path first computes the normal sliding-window neural-network NLLs with stride 64. It then converts the scored token stream into byte contributions and mixes the NN byte probability with an order-6 byte PPM-D probability:

p_mix = lambda * p_nn + (1 - lambda) * p_ppm

The gate is binary and prefix-only: when the PPM's longest-context top-symbol confidence is at least PPM_CONF_THRESHOLD = 0.9, lambda drops to PPM_LAMBDA_LO = 0.05 and the PPM term dominates; otherwise lambda = PPM_LAMBDA_HI = 0.9 and the NN dominates.
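The gated mixture can be sketched in a few lines of Python. This is a minimal illustration, assuming lambda is the NN weight in p_mix and that the confident branch takes PPM_LAMBDA_LO; the submitted native implementation is not shown in this PR.

```python
import math

PPM_LAMBDA_HI = 0.9        # NN weight when the PPM context is not confident
PPM_LAMBDA_LO = 0.05       # NN weight when PPM's top symbol is confident
PPM_CONF_THRESHOLD = 0.9

def mixed_logprob(p_nn, p_ppm, ppm_top_conf):
    """Binary-gated probability-space mixture for one byte.

    p_nn         -- NN probability of the observed byte
    p_ppm        -- order-6 PPM probability of the observed byte
    ppm_top_conf -- longest-context top-symbol probability, computed
                    from the prefix only (it never peeks at the byte)
    """
    lam = PPM_LAMBDA_LO if ppm_top_conf >= PPM_CONF_THRESHOLD else PPM_LAMBDA_HI
    return math.log(lam * p_nn + (1.0 - lam) * p_ppm)
```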

Setting               Value
PPM_ORDER             6
PPM_LAMBDA_HI         0.9
PPM_LAMBDA_LO         0.05
PPM_CONF_THRESHOLD    0.9
PPM_LOG_CACHE_SIZE    1048576
SKIP_QUANTIZED_EVAL   1
SLIDING_BATCH_SEQS    32

Order 6 was selected after full-val checks. Order 7 and order 8 were slower and worse on seed 42, so they are not part of the submitted result.
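For reference, a byte PPM-D estimator of the kind described can be sketched as a minimal, unoptimized Python class with method-D escapes and exclusions. The class name and prob/update interface are my own illustrative choices, not the submitted native code.

```python
class BytePPMD:
    """Order-k byte PPM with method-D escapes and exclusions.

    Method D: in a context with n total counts and d distinct symbols,
    a symbol seen c times gets p = (2c - 1) / (2n) and the escape gets
    p = d / (2n); escaping falls through to shorter contexts, ending in
    a uniform order minus-one model over the not-yet-excluded bytes.
    """

    def __init__(self, order=6):
        self.order = order
        self.counts = {}                      # context bytes -> {byte: count}

    def prob(self, history, sym):
        """Probability of byte `sym` given only the already-scored prefix."""
        esc = 1.0
        excluded = set()
        for k in range(min(self.order, len(history)), -1, -1):
            table = self.counts.get(bytes(history[len(history) - k:]))
            if not table:
                continue                      # unseen context: free escape
            avail = {s: c for s, c in table.items() if s not in excluded}
            if not avail:
                continue
            n = sum(avail.values())
            if sym in avail:
                return esc * (2 * avail[sym] - 1) / (2 * n)
            esc *= len(avail) / (2 * n)       # escape to the next-shorter context
            excluded.update(avail)
        return esc / (256 - len(excluded))    # order -1: uniform over the rest

    def update(self, history, sym):
        """Called strictly AFTER `sym`'s log-probability is recorded."""
        for k in range(min(self.order, len(history)) + 1):
            table = self.counts.setdefault(bytes(history[len(history) - k:]), {})
            table[sym] = table.get(sym, 0) + 1
```

With exclusions handled this way, the probabilities over all 256 byte values sum to 1 at every position, which is what the normalization bullet in the compliance list requires of the PPM side.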

Compliance

  • Causal scoring: both NN scoring and PPM scoring use only the prefix available before the current byte.
  • Score before update: PPM counts are updated after the byte's mixed log-probability is recorded.
  • Single pass: validation bytes are scored once in order; there is no rescoring or best-of-run selection.
  • Normalized distribution: PPM-D produces a valid byte distribution and the mixture is performed in probability space.
  • Full validation: submitted scores use the full validation stream, not a subset.
  • No SLOT, no TTT, no ETLB, and no n-gram cache in the submitted packed artifact.
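The causal-scoring and score-before-update bullets reduce to an eval loop of the following shape. This is a schematic sketch with stand-in probability callables and a fixed lambda (eliding the binary gate), not the submitted eval path.

```python
import math

def score_stream(data, nn_prob, ppm_prob, ppm_update=lambda prefix, sym: None,
                 lam=0.9):
    """Single causal pass over `data` (bytes), returning bits per byte.

    nn_prob / ppm_prob map (prefix, sym) -> probability of `sym`;
    ppm_update runs strictly after each byte is scored, so the PPM
    state never sees a byte before it has been charged for it.
    """
    total_bits = 0.0
    for i, sym in enumerate(data):
        prefix = data[:i]                                 # already-scored bytes
        p = lam * nn_prob(prefix, sym) + (1.0 - lam) * ppm_prob(prefix, sym)
        total_bits -= math.log2(p)                        # score first ...
        ppm_update(prefix, sym)                           # ... update after
    return total_bits / len(data)
```

With both callables uniform over bytes, the loop reproduces the expected 8.0 BPB baseline, which is a quick sanity check on the accounting.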

Reproduce

RUN_ID=strict_ppm_order6_seed42 \
SEED=42 \
PPM_ENABLED=1 \
PPM_NATIVE_ENABLED=1 \
PPM_ORDER=6 \
PPM_LAMBDA_HI=0.9 \
PPM_LAMBDA_LO=0.05 \
PPM_CONF_THRESHOLD=0.9 \
PPM_LOG_CACHE_SIZE=1048576 \
SKIP_QUANTIZED_EVAL=1 \
SLIDING_BATCH_SEQS=32 \
torchrun --standalone --nproc_per_node=8 \
  records/track_10min_16mb/2026-04-27_SP8192_Order6StrictBytePPM/train_gpt.py

Change SEED and RUN_ID to reproduce the other two logs.

@sharpobject

If you score all token ids at a given token-wise position in the document, do the probabilities for all of these token ids given by the mix of the byte-wise PPM and the token-wise NN sum to 1? (hint: no)
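A toy numeric example (hypothetical two-token alphabet, not from the submission) makes the objection concrete: the token-level NN normalizes over the token alphabet, but the byte model's mass restricted to byte strings that spell valid tokens does not, so the mixture is sub-normalized.

```python
tokens = ["a", "bb"]                  # toy token alphabet
p_nn = {"a": 0.6, "bb": 0.4}          # token NN: sums to 1 over tokens
p_byte = {"a": 0.5, "b": 0.5}         # i.i.d. byte model for simplicity

def p_ppm(tok):
    """Byte-model probability of the token's byte spelling."""
    p = 1.0
    for ch in tok:
        p *= p_byte[ch]
    return p

lam = 0.5
total = sum(lam * p_nn[t] + (1 - lam) * p_ppm(t) for t in tokens)
# total is about 0.875, not 1: byte-model mass on strings like "ba"
# that spell no token is simply lost from the token-wise mixture
```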

GodlyDonuts added a commit to GodlyDonuts/parameter-golf that referenced this pull request Apr 28, 2026
…olar Express NS + MIN_LR + LQER)

Triage of 5 new PRs the user surfaced (1858, 1852, 1855, 1874, 1877):
- openai#1852: hard rule violation (pre-quant TTT on validation data).
- openai#1858: eval subset (8M of 40.5M tokens), reviewer caught and author admitted.
- openai#1877: broken normalization (byte PPM × token NN doesn't sum to 1 over
  token alphabet), reviewer @sharpobject caught.
- openai#1855: techniques mostly legit but apt-get install lrzip violates Issue
  openai#1017 Rule 3 (artifact must be self-contained).
- openai#1874: LEGITIMATE - 3-seed mean 1.06766, std 0.00076, three orthogonal
  training-time techniques citing prior validated PRs. If it merges,
  our submission threshold shifts from 1.0760 to ~1.0627.

PR openai#1874's three techniques:
1. Polar Express NS coefficients (PR openai#1344) - 5 minimax-tuned tuples
   replace the fixed (3.4445, -4.775, 2.0315) at MUON_BACKEND_STEPS=5.
2. MIN_LR=0.10 warmdown floor (PR openai#1787) - LR floors at 10% of max
   instead of decaying to 0. Already wired in our v1+; just env-var
   opt-in.
3. LQER asymmetric int4 rank-4 quantization correction (PR openai#1797) -
   SVD on top-K=3 highest-error GPTQ residuals, packed as int4
   per-group-64 asymmetric. ~200-400 LOC; deferred to v4.

train_gpt_v3.py implements (1) and exposes (2):
- POLAR_EXPRESS_NS=0 default (byte-for-byte SOTA when off).
- _PE_COEFFS module-level constant + _POLAR_EXPRESS_NS flag read at
  import time so torch.compile sees them as constants.
- zeropower_via_newtonschulz5 branches on _POLAR_EXPRESS_NS to use
  per-iteration coefficients instead of fixed.
- MIN_LR was already an env var; setting MIN_LR=0.10 at runtime opts in.

Sizes: v3 raw 54,977 lzma 15,128 (+272 vs v2, +1,880 vs SOTA).
Worst-seed artifact slack: ~4,888 bytes under cap. Tight but workable.

AST-validated on Python 3.13 (macOS) and 3.12 (Vultr Linux).

Stacking projection (single-seed):
- Phase 0 baseline:       1.08038
- + LR=0.010 (Stage 2):   1.08021
- + Polar Express NS:     1.0787-1.0797
- + MIN_LR=0.10:          1.0777-1.0794
- + ConfTTT (PR openai#1879):   1.0772-1.0793
- + LQER (v4 work):       1.0742-1.0783
- + Phase 2 architecture: 1.0712-1.0773
- + Newton-Muon Stage E:  1.066-1.075

Path B (absorb-and-stack) recommended over Path A (race-to-merge-with-current-stack)
since current stack alone doesn't clear 1.0760.

Race awareness: openai#1874, openai#1855 (lrzip-stripped), and openai#1797 are all open.
Whichever merges first becomes new SOTA and our threshold tightens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
