Record: SP8192 + 3-Layer Recurrence + Parallel Residuals + QK-Gain 5.25 + Legal TTT + CaseOps Tokenizer — val_bpb 1.07462 (3-seed mean)#1755
Open
OE-GOD wants to merge 1 commit intoopenai:mainfrom
Conversation
…25 + Legal TTT + CaseOps Tokenizer — val_bpb 1.07462 (3-seed mean) 3-seed mean val_bpb: 1.07462 (std 0.00043) on 8×H100 SXM. Delta vs merged SOTA PR openai#1493 (1.08100): -0.00638 BPB = -0.01402 nats/token. z≈22.8, p ≪ 0.0001. Clears p<0.01 significance threshold comfortably. Combines: - PR openai#1493 @bigbag: merged base stack (legal score-first TTT) - PR openai#1729 @romeerp: lossless CaseOps tokenizer + byte sidecar (pending) Deliberately excludes PR openai#1735/openai#1738's pre-quant TTT, which is a Condition-3 violation per the @MatoTeziTanka / @dexhunter community review on PR openai#1416. All 4 conditions from Issue openai#1017 satisfied. Max artifact: 15,991,629 bytes (under 16,000,000 decimal limit on all 3 seeds).
sunnypatneedi
pushed a commit
to sunnypatneedi/parameter-golf
that referenced
this pull request
Apr 21, 2026
…uant TTT; Recurrence Depth Curriculum; Parcae stable loops - SOTA 1.0810 still holds (Day 12 plateau, longest in competition history) - PR openai#1758 (1.02840): pre-quant TTT — 6th attempt at illegal pattern, ignore - PR openai#1756 (1.06505): CaseOps + Recurrence Depth Curriculum (depth 1→3→4); has BOS bug; awaits Issue openai#1604 - PR openai#1755 (1.07462): CaseOps + Legal TTT; awaits Issue openai#1604 - New paper: Parcae (arXiv:2604.12946) — stable looped LMs via spectral norm constraint on injection params, relevant to Triple Loop stability - New paper: Gated Attention (arXiv:2505.06708, NeurIPS 2025) — backs PR openai#1667 Attention Output Gate - Added Session 18 lessons learned; Issue openai#1604 self-deadline Apr 24 - Primary action: implement PR openai#1586+openai#1667 immediately (9 days to deadline) https://claude.ai/code/session_0151v7YeWWUSnhmcC8U8NGUV
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Record Summary
val_bpb = 1.07462 (3-seed mean, std 0.00043) | 8×H100 SXM | max artifact 15,991,629 bytes (< 16,000,000)
3-Seed Results (packed submission)
Delta vs merged SOTA (PR #1493)
Contribution
Combines two previously-separate directions without inheriting the pre-quant TTT component that is a Condition-3 violation:
Legal score-first TTT — merged stack from PR Record: SP8192 + 3-Layer Recurrence + Parallel Residuals + QK-Gain 5.25 + Legal TTT — val_bpb 1.0810 (3-seed mean) #1493 (@bigbag): SP8192 + 3-layer depth recurrence + parallel residuals + QK-Gain 5.25 + SGD TTT with score-before-update ordering.
Lossless CaseOps tokenizer with byte sidecar — pending PR Record: CaseOps Tokenizer + Tapered WD - val_bpb 1.0678 (3-seed mean) #1729 (@romeerp): bijective case-folding (TITLE / ALLCAPS / CAPNEXT reserved tokens) +
fineweb_val_bytes_*.binsidecar reporting original UTF-8 byte counts per token.Specific novel work in this PR (~25 lines):
eval_val,eval_val_sliding,eval_val_ttt).load_validation_tokensglob to exclude_bytes_*.binfiles (prevents double-counting the token stream).Compliance (Issue #1017 — all 4 conditions)
flash_attn_3_func(..., causal=True); sliding-window eval uses strict prefix; byte sidecar is pre-computed data (shipped asfineweb_val_bytes_*.bin), not runtime state from val tokens.30·tanh(x/30)uniform across all logits.torch.no_grad()before any parameter update. No pre-quant TTT, no SLOT, no score-after-adapt.Additional:
lossless_caps.pyin PR Record: CaseOps Tokenizer + Tapered WD - val_bpb 1.0678 (3-seed mean) #1729).Attribution
Full details and reproduction instructions in
records/track_10min_16mb/2026-04-20_SP8192_3LayerRecur_ParResid_QK525_LegalTTT_CaseOps/README.md.