
Record: LongCtx No-QV QK5.25 + AsymLogit + LQER g32/top4 + TTT-local 0.80 — 1.05792 BPB 3-seed mean #2060

Open
S0urC10ud wants to merge 2 commits into openai:main from S0urC10ud:submission/noqv-lqer-g32-top4-tttlocal080-v2

Conversation

@S0urC10ud

Summary

This PR adds a 10-minute / 16 MB-track record based on a five-knob hyperparameter retune of the LongCtx No-QV QK5.25 + AsymLogit configuration from PR #2007.
The submission keeps the parent architecture, optimizer, dataset, tokenizer, TTT/eval pipeline, quantizer, and compressor unchanged — train_gpt.py is byte-identical to #2007 (md5 2a7e36e29aa5b5811abb6170059aa8d1). Only five env-var scalars are retuned.

Record folder:

records/track_10min_16mb/2026-05-01_LongCtx_NoQV_QK525_AsymLogit_LQERg32top4_TTTlocal080_1.0579/

Results

| Seed | Stop step | Train time | Final TTT BPB | Artifact bytes |
|---|---|---|---|---|
| 42 | 4868 | 596.142 s | 1.05781454 | 15,971,753 |
| 0 | 4861 | 595.821 s | 1.05798212 | 15,971,492 |
| 1234 | 4873 | 595.991 s | 1.05796494 | 15,971,748 |
| **Mean** | | | 1.05792053 | |
| **Std** | | | 0.00007528 | |

All seeds satisfy the 10-minute / 16 MB rules:

  • train_wallclock_s ≤ 600 ✓ (595.8 – 596.1 s)
  • TTT phased eval_time_s ≤ 600 ✓ (395.4 – 397.6 s)
  • Total submission size ≤ 16,000,000 B ✓ (15,971,492 – 15,971,753 B)
  • Artifact slack ≥ 28,247 B on every seed
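For a single seed, the three gates above reduce to a few comparisons. A minimal sketch (the dict field names are hypothetical — the record JSON schema isn't shown in this PR — and the values are seed 42's from the results table):

```python
# Compliance check for one seed's metrics (field names are illustrative)
seed_metrics = {
    "train_wallclock_s": 596.142,   # seed 42, from the results table
    "eval_time_s": 397.6,           # upper end of the quoted TTT eval range
    "artifact_bytes": 15_971_753,   # seed 42 artifact size
}

CAP_SECONDS, CAP_BYTES = 600, 16_000_000
assert seed_metrics["train_wallclock_s"] <= CAP_SECONDS
assert seed_metrics["eval_time_s"] <= CAP_SECONDS
assert seed_metrics["artifact_bytes"] <= CAP_BYTES
print("artifact slack:", CAP_BYTES - seed_metrics["artifact_bytes"], "bytes")
```

Seed 42 is the largest artifact of the three, so its slack (28,247 B) is the minimum quoted above.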

What changed vs parent #2007

Five env-var deltas only:

| Knob | Parent #2007 | This PR | Direction |
|---|---|---|---|
| MATRIX_LR | 0.026 | 0.028 | slightly higher matrix LR |
| LQER_RANK | 4 | 2 | half-rank LQER correctors |
| LQER_ASYM_GROUP | 64 | 32 | finer asym-quant groups |
| LQER_TOP_K | 3 | 4 | one extra top-K corrector slot |
| TTT_LOCAL_LR_MULT | 0.75 | 0.80 | slightly hotter local TTT step |
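Expressed as environment overrides, the whole delta is five lines (a sketch assembled from the table above; variable names are exactly the table's rows, layered on the otherwise unchanged #2007 environment):

```shell
# The five retuned scalars; everything else stays at the #2007 values
export MATRIX_LR=0.028
export LQER_RANK=2
export LQER_ASYM_GROUP=32
export LQER_TOP_K=4
export TTT_LOCAL_LR_MULT=0.80
```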

Comparison vs parent #2007 (paired, same 3 seeds)

| Seed | Parent #2007 BPB | This PR BPB | Δ BPB |
|---|---|---|---|
| 42 | 1.05857451 | 1.05781454 | −0.00076 |
| 0 | 1.05915199 | 1.05798212 | −0.00117 |
| 1234 | 1.05924929 | 1.05796494 | −0.00128 |
| **Mean** | 1.05899193 | 1.05792053 | **−0.00107** |

Paired one-sided t-test: mean Δ_loss = −0.00234 nats, t = −6.73, p ≈ 0.011.
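The paired statistics can be reproduced from the per-seed BPB values alone (a sketch; the PR ran the test on nats, but the t statistic is scale-invariant, so BPB deltas give the same value up to rounding, and for df = 2 the Student-t CDF has a closed form, so no stats library is needed):

```python
import math

# Per-seed BPB from the comparison table (seeds 42, 0, 1234)
parent = [1.05857451, 1.05915199, 1.05924929]  # PR #2007
ours   = [1.05781454, 1.05798212, 1.05796494]  # this PR

deltas = [b - a for a, b in zip(parent, ours)]
n = len(deltas)
mean = sum(deltas) / n
var = sum((d - mean) ** 2 for d in deltas) / (n - 1)  # sample variance
t = mean / math.sqrt(var / n)

# One-sided p-value; for df = n-1 = 2, P(T <= t) = 1/2 + t / (2*sqrt(t^2 + 2))
p = 0.5 + t / (2 * math.sqrt(t * t + 2))
print(f"mean dBPB = {mean:.5f}, t = {t:.2f}, one-sided p = {p:.4f}")
```

This gives t ≈ −6.74 and p ≈ 0.011, matching the quoted test within rounding of the listed BPB values.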

Comparison vs currently-merged SOTA #1493 (1.0810)

Δ_BPB ≈ −0.0231, Δ_nats ≈ −0.051. Every individual seed improves by ≥ 0.022 BPB, far above the 0.005-nat record threshold.
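The BPB-to-nats conversion scales bits/byte by ln(2) and by the eval text's mean bytes per token. The mean bytes/token is not stated in the PR; ≈3.19 is assumed here, back-solved from the quoted pair of deltas:

```python
import math

delta_bpb = -0.0231        # improvement vs merged SOTA #1493, bits per byte
bytes_per_token = 3.19     # ASSUMED mean bytes/token (back-solved, not from the PR)

# nats/token = (bits/byte) * ln(2) [nats per bit] * (bytes/token)
delta_nats = delta_bpb * math.log(2) * bytes_per_token
print(f"delta_nats ~= {delta_nats:.3f}")
```

With that assumption the quoted Δ_nats ≈ −0.051 falls out directly.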

Method

The frozen parent recipe (unchanged here):

  • CaseOps/SP8192 tokenization with byte-sidecar BPB accounting.
  • Sparse attention gating, BOS-fixed SmearGate, skip gates, LQER correction, int7 embeddings, and mixed-precision GPTQ + AWQ-lite.
  • 2560-token eval and TTT windows.
  • No-QV TTT masking, keeping K/O/MLP adaptation active.
  • TTT_LORA_RANK=80, PHASED_TTT_PREFIX_DOCS=3000.
  • QK_GAIN_INIT=5.25, WARMDOWN_FRAC=0.85, MIN_LR=0.1.
  • Eval-only asymmetric logit rescale.
  • Per-group lrzip -L 9 compression.
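The byte-sidecar BPB accounting in the first bullet amounts to dividing summed cross-entropy by the raw byte count of the eval text rather than by the token count, so tokenizer choice cannot deflate the metric. A minimal sketch with made-up toy numbers (the real accounting lives in train_gpt.py):

```python
import math

def bits_per_byte(total_loss_nats: float, total_utf8_bytes: int) -> float:
    # Byte-sidecar accounting: the denominator is the raw UTF-8 byte count
    # of the eval text, not the number of tokens.
    return total_loss_nats / (math.log(2) * total_utf8_bytes)

# Toy numbers, purely illustrative: 7.3e6 summed nats over 10 MB of text
print(f"{bits_per_byte(7.3e6, 10_000_000):.4f}")
```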

The five-knob retune was chosen by an MN5 single-node sweep on top of the #2007 parent stack.

Reproduction

Prepare the CaseOps dataset once:

```shell
python prepare_caseops_data.py --local-dir /workspace/caseops_data
```

Run a seed from this folder:

```shell
SEED=42 \
CASEOPS_ROOT=/workspace/caseops_data \
RUN_ID=longctx_noqv_qk525_asym_lqer_g32_top4_tttlocal080_seed42 \
./run_current_candidate.sh
```

The script sets the full environment and runs:

```shell
torchrun --standalone --nproc_per_node=8 train_gpt.py
```

Repeat with SEED=0 and SEED=1234 for the matched 3-seed validation.

Logs

  • train_seed42.log — final BPB 1.05781454
  • train_seed0.log — final BPB 1.05798212
  • train_seed1234.log — final BPB 1.05796494

Hardware / software

3-seed mean val_bpb 1.05769 (std 0.00041) on 8xH100 80GB SXM. Forks
PR openai#2007 (1.0590); env-only retune of MATRIX_LR=0.028, LQER_RANK=2,
LQER_TOP_K=4, LQER_ASYM_GROUP=32, TTT_LOCAL_LR_MULT=0.80. train_gpt.py
byte-identical to parent. Improves merged SOTA openai#1493 (1.0810) by
~0.023 BPB / ~0.051 nats; paired vs openai#2007 yields p=0.002 but only
0.00293-nat magnitude (below 0.005-nat bar) so non-record vs openai#2007.
Corrected seed-42 Final TTT BPB: 1.05781454 (was 1.05711454).
3-seed mean: 1.05792053, std: 0.00007528.
Directory renamed 1.0577 -> 1.0579 to match corrected mean.
TanishGudise added a commit to TanishGudise/parameter-golf that referenced this pull request May 1, 2026
Recent sweep logs (named):
S55: token-only ngram tilt baseline = 1.05814 (legal per PR openai#1514)
S56: + 3 openai#2060 levers = 1.05790 (-0.00024)
S57: + AsymLogit only = 1.05759 (-0.00055)
S58: full stack = 1.05694 single seed (-0.00120, super-additive +0.00041 synergy)
S59: S58 + EVAL_SEQ_LEN=3072 + NUM_PHASES=1 + WD=1.0 = 1.05657 single seed, eval 567s
S60 OOM: S59 + EMA_DECAY=0.9 + batch=64 = OOM
S60 retry: S58 + EMA_DECAY=0.9 + batch=32 = 1.05795 / 832s NON-COMPLIANT
S61: S59 + TOKEN_BOOST=3.0 = 1.05678 single seed, eval 501s
S62: S58 + NUM_PHASES=2 + WD=2.0 + eval=2816 = 1.05755

Earlier sweep logs (UUID-named): ~83 files covering S15-S54 sprint history.

Key findings:
- AsymLogit Rescale: 2 trainable scalars (softcap_pos, softcap_neg) give -0.00055 via global TTT polish
- Token-only n-gram tilt confirmed legal per PR openai#1514 (within_tau=99, word_tau=99, agree=0)
- 3 openai#2060 env-var levers (MATRIX_LR=0.028, LQER_ASYM_GROUP=32, TTT_LORA_LR=8e-5) stack super-additively
- EMA_DECAY=0.9 didn't transfer to our base
- NUM_PHASES=2 revert costs more pre-quant than it gains in TTT recovery
- Discovered val_tokens=47852544 vs canonical 47853343, need EVAL_INCLUDE_TAIL=1 for clean comparison

Added .gitignore for final_model.pt (130MB - over GitHub limit), .so binaries, pid files.
TanishGudise added a commit to TanishGudise/parameter-golf that referenced this pull request May 1, 2026
Beats PR openai#1855 (merged rank 1, 1.06108) by 0.00438 BPB.
Beats PR openai#2014 (best open, 1.05759) by 0.00089 BPB.
Beats PR openai#2060 (1.05792) by 0.00122 BPB.

Stack:
- Token-only n-gram tilt (PR openai#1514 merged precedent, within/word channels disabled)
- AsymLogit Rescale (2 trainable scalars adapted by global TTT)
- 3 hyperparameter levers from PR openai#2060 (MATRIX_LR=0.028, LQER_ASYM_GROUP=32, TTT_LORA_LR=8e-5)
- PHASED_TTT_NUM_PHASES=1 (matches PR openai#2014)
- NGRAM_HINT_PRECOMPUTE_OUTSIDE=0 (precompute INSIDE eval timer per PR openai#1514)

Compliance:
- All seeds eval ≤533.1s (cap 600s, 67-80s margin)
- All artifacts ≤15.95MB (cap 16MB)
- Token-only n-gram channel (within_gate=0, word_gate=0)
- Score-first TTT (per PR openai#402)
