Record: BackoffNgramMixer (mean val_bpb=0.6671) #813
hypery11 wants to merge 1 commit into openai:main
Conversation
Seeds: 0.6672 / 0.6673 / 0.6667 (std 0.0003). 11L XSA-all 8/8 MHA, BackoffNgramMixer orders 2-7. ~16MB artifact. Train 600s, eval 512s.
Today (2026-03-26) the leaderboard was transformed by an eval-time n-gram backoff cache technique. Add comprehensive context for agents:
- URGENT_ngram_backoff_breakthrough.md: full implementation guide with NgramEvalCache code, entropy-adaptive alpha, complementary training, and a priority order for implementation
- latest_sota_snapshot.md: updated with the new PR landscape
- 3 reference code files from top PRs (openai#809 0.295, openai#803 0.442, openai#813 0.667)

The n-gram backoff is purely eval-time; adding it to our existing best checkpoint should immediately drop val_bpb from 1.119 to ~0.67. Implementing it is now the single highest-priority task.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
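For orientation, here is a minimal sketch of what an eval-time backoff cache along the lines of the NgramEvalCache referenced above could look like. The class name comes from the commit message; the constructor arguments, field layout, and method signatures are assumptions, not the PR's actual code:

```python
from collections import defaultdict

class NgramEvalCache:
    """Backward-looking backoff n-gram cache built up during evaluation.

    Counts n-grams of orders 2..max_order over tokens already scored in the
    eval stream, and predicts the next token by backing off from the highest
    order whose context has been seen down to lower orders.
    """

    def __init__(self, vocab_size, max_order=7):
        self.vocab_size = vocab_size
        self.max_order = max_order
        # counts[n][context_tuple][next_token] -> observed count
        self.counts = {n: defaultdict(lambda: defaultdict(int))
                       for n in range(2, max_order + 1)}
        self.history = []

    def update(self, token):
        """Record one observed token. Call only after the position has been
        scored, so the cache stays strictly backward-looking."""
        for n in range(2, self.max_order + 1):
            if len(self.history) >= n - 1:
                ctx = tuple(self.history[-(n - 1):])
                self.counts[n][ctx][token] += 1
        self.history.append(token)

    def predict(self):
        """Return (next-token distribution, matched order), or (None, 0) if
        no order from max_order down to 2 has counts for the current context."""
        for n in range(self.max_order, 1, -1):  # back off: high order -> low order
            if len(self.history) < n - 1:
                continue
            ctx = tuple(self.history[-(n - 1):])
            nexts = self.counts[n].get(ctx)
            if nexts:
                total = sum(nexts.values())
                dist = [0.0] * self.vocab_size
                for tok, cnt in nexts.items():
                    dist[tok] = cnt / total
                return dist, n
        return None, 0
```

Because everything happens at eval time, such a cache can sit in front of any frozen checkpoint: feed each validation token to `update()` after it is scored and query `predict()` before scoring the next position.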
…(legality review)
- SOTA target is now PR openai#803: Complementary Training + Backoff N-gram + TTT
- PR openai#809 (0.2952) excluded pending legality review
- research_memory.md: fix the Working SOTA Anchor section (the agent had written it to explicitly ignore the URGENT file and stick to 1.1194; that instruction was removed)
- All PR openai#809 references updated to PR openai#803/openai#813
- Dashboard: SOTA now 0.4416, gap 0.681

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Community Review — Record: BackoffNgramMixer (mean val_bpb=0.6671)
Compliance: LOOKS CLEAN — legal score-first-per-chunk TTT (PR #1413 pattern)
PR #813 — BackoffNgramMixer — Audit
Head SHA: 9681865
File audited:
Results
Method
11-layer transformer (512d, 8/8 full MHA, XSA-all, LeakyReLU(0.5)^2, 3.5x MLP). BackoffNgramMixer with entropy-adaptive alpha, orders 2-7. Score-first, backward-looking.
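As a rough illustration of the entropy-adaptive alpha step, the sketch below blends the model's next-token distribution with the n-gram cache's distribution, trusting the cache more when its prediction has low entropy. The function name, `max_alpha`, and the exact weighting are assumptions for illustration, not the PR's implementation:

```python
import torch
import torch.nn.functional as F

def mix_with_ngram(model_logits, ngram_dist, max_alpha=0.9, eps=1e-8):
    """Blend the model's next-token distribution with a backoff n-gram
    distribution, weighting the n-gram more when its entropy is low
    (i.e. the cache match is confident)."""
    p_model = F.softmax(model_logits, dim=-1)
    p_ngram = torch.as_tensor(ngram_dist, dtype=p_model.dtype, device=p_model.device)
    # Normalized entropy in [0, 1]: 0 = deterministic cache hit, 1 = uniform.
    ent = -(p_ngram * (p_ngram + eps).log()).sum()
    ent_norm = ent / torch.log(torch.tensor(float(p_ngram.numel()), device=p_ngram.device))
    alpha = max_alpha * (1.0 - ent_norm)          # entropy-adaptive mixing weight
    mixed = (1.0 - alpha) * p_model + alpha * p_ngram
    return (mixed + eps).log()                    # log-probs for BPB scoring
```

In an eval loop this would replace the raw model log-probs only at positions where the cache returns a match; positions without a match fall back to the plain model distribution (alpha effectively 0), which keeps the whole scheme purely eval-time and backward-looking.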