Record: SP8192 + Order-6 Strict Full-Val Byte PPM — 0.96255 BPB (3-seed mean)#1877
Open

someone114514 wants to merge 1 commit into openai:main
Conversation
**@sharpobject** commented:

> If you score all token ids at a given token-wise position in the document, do the probabilities for all of these token ids, given by the mix of the byte-wise PPM and the token-wise NN, sum to 1? (hint: no)
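To make the objection concrete, here is a toy sketch in Python. Everything in it is hypothetical: it assumes (as the PR description below suggests) that each token's NN probability is spread evenly across its bytes before the per-byte mix, and the token alphabet and numbers are invented. Only the failure to normalize is the point.

```python
LAM = 0.9  # NN weight in the mixture (assumed value)

# A 3-token alphabet over the bytes {a, b}; NN token probs sum to 1.
p_nn_tok = {"a": 0.5, "b": 0.3, "ab": 0.2}

# A per-byte PPM distribution (sums to 1 over the *byte* alphabet).
p_ppm_byte = {"a": 0.7, "b": 0.3}

def p_mix_token(tok: str) -> float:
    """Implied token probability: product of the mixed per-byte probs."""
    p_nn_byte = p_nn_tok[tok] ** (1.0 / len(tok))  # even NLL split across bytes
    p = 1.0
    for byte in tok:
        p *= LAM * p_nn_byte + (1.0 - LAM) * p_ppm_byte[byte]
    return p

# The mixed "probabilities" over the token alphabet do not sum to 1.
print(sum(p_mix_token(t) for t in p_nn_tok))  # ~1.024, not 1.0
```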
GodlyDonuts added a commit to GodlyDonuts/parameter-golf that referenced this pull request on Apr 28, 2026:
…olar Express NS + MIN_LR + LQER)

Triage of 5 new PRs the user surfaced (1858, 1852, 1855, 1874, 1877):

- openai#1852: hard rule violation (pre-quant TTT on validation data).
- openai#1858: eval subset (8M of 40.5M tokens), reviewer caught and author admitted.
- openai#1877: broken normalization (byte PPM × token NN doesn't sum to 1 over the token alphabet), reviewer @sharpobject caught.
- openai#1855: techniques mostly legit, but `apt-get install lrzip` violates Issue openai#1017 Rule 3 (artifact must be self-contained).
- openai#1874: LEGITIMATE - 3-seed mean 1.06766, std 0.00076, three orthogonal training-time techniques citing prior validated PRs. If it merges, our submission threshold shifts from 1.0760 to ~1.0627.

PR openai#1874's three techniques:

1. Polar Express NS coefficients (PR openai#1344): 5 minimax-tuned tuples replace the fixed (3.4445, -4.775, 2.0315) at MUON_BACKEND_STEPS=5.
2. MIN_LR=0.10 warmdown floor (PR openai#1787): LR floors at 10% of max instead of decaying to 0. Already wired in our v1+; just env-var opt-in.
3. LQER asymmetric int4 rank-4 quantization correction (PR openai#1797): SVD on the top-K=3 highest-error GPTQ residuals, packed as int4 per-group-64 asymmetric. ~200-400 LOC; deferred to v4.

train_gpt_v3.py implements (1) and exposes (2):

- POLAR_EXPRESS_NS=0 default (byte-for-byte SOTA when off).
- _PE_COEFFS module-level constant + _POLAR_EXPRESS_NS flag read at import time so torch.compile sees them as constants.
- zeropower_via_newtonschulz5 branches on _POLAR_EXPRESS_NS to use per-iteration coefficients instead of fixed.
- MIN_LR was already an env var; setting MIN_LR=0.10 at runtime opts in.

Sizes: v3 raw 54,977, lzma 15,128 (+272 vs v2, +1,880 vs SOTA). Worst-seed artifact slack: ~4,888 bytes under cap. Tight but workable. AST-validated on Python 3.13 (macOS) and 3.12 (Vultr Linux).

Stacking projection (single-seed):

- Phase 0 baseline: 1.08038
- + LR=0.010 (Stage 2): 1.08021
- + Polar Express NS: 1.0787-1.0797
- + MIN_LR=0.10: 1.0777-1.0794
- + ConfTTT (PR openai#1879): 1.0772-1.0793
- + LQER (v4 work): 1.0742-1.0783
- + Phase 2 architecture: 1.0712-1.0773
- + Newton-Muon Stage E: 1.066-1.075

Path B (absorb-and-stack) recommended over Path A (race-to-merge-with-current-stack) since the current stack alone doesn't clear 1.0760.

Race awareness: openai#1874, openai#1855 (lrzip-stripped), and openai#1797 are all open. Whichever merges first becomes new SOTA and our threshold tightens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
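For reference, the `zeropower_via_newtonschulz5` branch the commit message describes might look roughly like this. A hedged sketch only: the `_PE_COEFFS` tuples shown are placeholders rather than the minimax-tuned values from PR openai#1344, and details of the usual Muon implementation (bfloat16 cast, tall-matrix transpose) are omitted.

```python
import torch

# Read once at import time so torch.compile sees constants (per the
# commit message). Placeholder tuples; the real per-iteration values
# live in PR openai#1344.
_POLAR_EXPRESS_NS = False
_PE_COEFFS = [(3.4445, -4.7750, 2.0315)] * 5

def zeropower_via_newtonschulz5(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Newton-Schulz iteration toward the orthogonal polar factor of G."""
    X = G / (G.norm() + 1e-7)
    for i in range(steps):
        # Branch on the flag: per-iteration Polar Express coefficients
        # versus the fixed quintic tuple.
        a, b, c = _PE_COEFFS[i] if _POLAR_EXPRESS_NS else (3.4445, -4.7750, 2.0315)
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X
```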
SP8192 + Order-6 Strict Full-Val Byte PPM
val_bpb = 0.96255 (3-seed mean, std 0.00047) | 15.997 MB mean artifact | 8xH100 SXM
This submission keeps the SP8192 recurrence / parallel-residual / QK-gain base stack and replaces the prior order-4 PPM setting with a strict full-validation order-6 byte-level PPM mixture at eval time. The PPM state is built online from the already-scored byte prefix, then updated only after each byte is scored.
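A minimal sketch of that score-then-update discipline, assuming a hypothetical `ppm` object with `prob` and `update` methods (illustrative names, not the submission's API):

```python
import math

def score_stream(data: bytes, ppm) -> float:
    """Score each byte under the model of its strict prefix, then update."""
    total_bits = 0.0
    for b in data:
        total_bits += -math.log2(ppm.prob(b))  # score with prefix-only state
        ppm.update(b)                          # only now absorb the byte
    return total_bits / len(data)              # bits per byte
```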
Results
The best seed is 1337 at 0.96205812 BPB. The largest observed total submission size is 15,999,992 bytes, still under the 16,000,000-byte cap.

Method
The eval path first computes the normal sliding-window neural-network NLLs with stride 64. It then converts the scored token stream into byte contributions and mixes the NN byte probability with an order-6 byte PPM-D probability:
p_mix = lambda * p_nn + (1 - lambda) * p_ppm

The gate is binary and prefix-only. With the submitted settings, PPM is trusted more when its longest-context top-symbol confidence is at least 0.9; otherwise the NN dominates (a sketch of the gate follows the table below).

| Setting | Value |
| --- | --- |
| PPM_ORDER | 6 |
| PPM_LAMBDA_HI | 0.9 |
| PPM_LAMBDA_LO | 0.05 |
| PPM_CONF_THRESHOLD | 0.9 |
| PPM_LOG_CACHE_SIZE | 1048576 |
| SKIP_QUANTIZED_EVAL | 1 |
| SLIDING_BATCH_SEQS | 32 |

Order 6 was selected after full-val checks. Orders 7 and 8 were slower and worse on seed 42, so they are not part of the submitted result.
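For concreteness, a minimal sketch of the binary gate under the submitted settings (constant names mirror the table above; the function itself is illustrative, not the submission's code):

```python
PPM_LAMBDA_HI = 0.9    # NN weight when the PPM is not confident
PPM_LAMBDA_LO = 0.05   # NN weight when the PPM is confident
PPM_CONF_THRESHOLD = 0.9

def mix_byte_prob(p_nn: float, p_ppm: float, ppm_top_conf: float) -> float:
    """Binary, prefix-only gate over the per-byte mixture."""
    # Trust the PPM when its longest-context top-symbol confidence
    # clears the threshold; otherwise the NN dominates.
    lam = PPM_LAMBDA_LO if ppm_top_conf >= PPM_CONF_THRESHOLD else PPM_LAMBDA_HI
    return lam * p_nn + (1.0 - lam) * p_ppm
```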
Compliance
Reproduce
Change SEED and RUN_ID to reproduce the other two logs.
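For example, the two knobs might be read like this (a hypothetical sketch; the entry point and the defaults shown are assumptions, not the submission's code):

```python
import os

SEED = int(os.environ.get("SEED", "1337"))     # best seed per the Results section
RUN_ID = os.environ.get("RUN_ID", "seed1337")  # only names the output log
```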