- PR #1564 (joshkmartinez, **1.01710**): GDN-Hybrid (Gated DeltaNet + SWA), NO TTT/SLOT — extraordinary if verified; unreviewed
+ - PR #1564 (joshkmartinez, **1.01710**): CLOSED (superseded by PR #1575 by same author)
+ - PR #1576 (joshkmartinez, **~~1.01671~~**): GDN-Hybrid — **BPB BUG confirmed by reviewer** (space token double-count from PR #1545), actual ~1.16–1.18 BPB. Do NOT track.
+ - PR #1585 (codemath3000, **1.0639**): Casefold Tokenizer — **LEGALITY DEBATED** (modifying val corpus bytes); await organizer ruling
+ - PR #1578 (mikeapedia, **1.0668**): Custom Casefold Tokenizer — **LEGALITY DEBATED**; same concern as #1585
**Best open with SLOT**: ~1.0766 val_bpb (PR #1333, aryanbhosale, Causal SLOT-16 on PR #1334 base) — no organizer rejection
**Best open (illegal)**: 1.0632 (PR #1517, RulinShao, Pre-Quant TTT 18ep — same ruling as #1351/#1416)
- **Target**: Beat 1.0810 merged SOTA by >=0.005 nats → need **≤1.0760 bpb**. Best reachable: ~1.074–1.077 (legal). With SLOT: ~1.073–1.076. **18 days to deadline (Apr 30).**
+ **Target**: Beat 1.0810 merged SOTA by >=0.005 nats → need **≤1.0760 bpb**. Best reachable: ~1.068–1.075 (legal). With SLOT: ~1.065–1.073. **17 days to deadline (Apr 30).**
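The target arithmetic in the line above can be checked with a short script. This is a minimal sketch, assuming the 0.005 margin is subtracted directly from the merged val_bpb (as the note does) and that the deadline falls in 2026, the year used by the dated entries elsewhere in these notes. The function names are hypothetical.

```python
from datetime import date

MERGED_SOTA_BPB = 1.0810  # merged SOTA quoted in the note
MIN_MARGIN = 0.005        # required improvement, applied directly to bpb (assumption)


def target_bpb(sota: float = MERGED_SOTA_BPB, margin: float = MIN_MARGIN) -> float:
    """Largest val_bpb that still clears the required margin."""
    return round(sota - margin, 4)


def clears_target(candidate_bpb: float) -> bool:
    """True if a candidate score is at or below the target."""
    return round(candidate_bpb, 4) <= target_bpb()


def days_to_deadline(today: date, deadline: date = date(2026, 4, 30)) -> int:
    """Whole days between today and the deadline."""
    return (deadline - today).days


if __name__ == "__main__":
    print(target_bpb())                          # 1.076
    print(clears_target(1.0766))                 # False: the SLOT score from PR #1333
    print(days_to_deadline(date(2026, 4, 13)))   # 17
```

Under these assumptions the ~1.0766 SLOT result misses the threshold by 0.0006 bpb, while any score at or below 1.0760 clears it.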
**CRITICAL LEGALITY UPDATES**:
- **PR #771 REJECTED (2026-03-27)** — Our AdamW TTT 30ep was train-then-score. All 30-epoch TTT results void.
- - **N-gram hash cache ILLEGAL** — PRs #727, #741 closed. PR #758 open but has major legality flags. PR #731 open (dense count tables + Laplace smoothing, reviewer says "LOOKS CLEAN", awaiting 3rd seed).
- **N-gram Tilt IS LEGAL (PR #1420)** — Normalized via softmax Z. **⚠️ PR #1420 has causality bug — use PR #1437's corrected implementation.**
- **Score-first TTT IS LEGAL** — ≤3ep confirmed (PR #1413). PR #1557 cites PR #1514 as precedent for 5ep — status uncertain; use ≤3ep to be safe.
- **Pre-quant TTT ILLEGAL (all variants)** — PR #1351, #1416, #1408, #1423. Do NOT use.
- **SLOT δ-vector: Issue #140 CLOSED (Apr 6), NO organizer ban** — @valerio-oai NEVER commented in Issue #140. 9 record PRs use SLOT. Risk remains. Implement only if willing to accept rejection risk.
- **ETLB UNRULED** — PR #1399/#1415; no ruling; -0.0019 bpb standalone. Await before implementing.
- - **GDN-Hybrid (PR #1564)**: No legality concerns — pure architecture, no TTT/SLOT. If organizer approves, it's the new gold standard at 1.01710.
- - **VarLen Attention + Doc-TTT (PR #1560)**: No legality flags — per-document masking is architectural, score-first TTT per-doc.
+ - **GDN-Hybrid (PR #1576)**: OPEN but **BPB calculation bug confirmed (Apr 13)** — space token double-count from parent PR #1545 inflates byte count ~14%; actual ~1.16–1.18 BPB. PR #1564 CLOSED (superseded by PR #1575 by same author). Monitor PR #1575/#1576 for bug fix/organizer response before investing.
+ - **VarLen Attention + Doc-TTT (PR #1560)**: No legality flags — per-document masking is architectural, score-first TTT per-doc. Still awaiting review.
- **Tap-In unigram matching (PR #1555)**: Legality UNCONFIRMED — verify before implementing (may be similar to n-gram approaches).
+ - **Casefold Tokenizer (PR #1578, #1585)**: LEGALITY DEBATED (Apr 13) — modifying validation corpus bytes via case normalization may constitute invalid benchmark manipulation. Await @valerio-oai ruling before implementing.
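The distinction the bullets above draw between score-first TTT (legal) and train-then-score (the pattern rejected in PR #771) comes down to the order of two steps per chunk. The sketch below illustrates only that ordering. It assumes an add-one-smoothed unigram counter as a toy stand-in for the real network and its TTT update; the function names are hypothetical and none of this is code from the PRs.

```python
import math
from collections import Counter


def score_first_eval(chunks, vocab_size):
    """Total NLL in nats: each chunk is scored BEFORE the model adapts on it."""
    counts, seen, total_nll = Counter(), 0, 0.0
    for chunk in chunks:
        # 1) Score with the state from before this chunk (add-one smoothing).
        for tok in chunk:
            total_nll -= math.log((counts[tok] + 1) / (seen + vocab_size))
        # 2) Only then adapt on the chunk that was just scored.
        counts.update(chunk)
        seen += len(chunk)
    return total_nll


def train_then_score_eval(chunks, vocab_size):
    """Total NLL in nats: the model adapts on a chunk and THEN scores it (leaky)."""
    counts, seen, total_nll = Counter(), 0, 0.0
    for chunk in chunks:
        # 1) Adapt first, so the scored tokens have already been seen.
        counts.update(chunk)
        seen += len(chunk)
        # 2) Score tokens the model was just trained on.
        for tok in chunk:
            total_nll -= math.log((counts[tok] + 1) / (seen + vocab_size))
    return total_nll


if __name__ == "__main__":
    chunks = [[0, 0, 1], [0, 0, 0]]
    print(score_first_eval(chunks, vocab_size=2))
    print(train_then_score_eval(chunks, vocab_size=2))
```

On this toy input the train-then-score variant reports a lower loss purely because it has already seen the tokens it is scoring, which is the leakage the ruling targets.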
**Abandoned approaches**: Training-time static LoRA TTT (hurts), product quantization (SWA-incompatible), custom Triton kernels (poor EV — REVERTED: PR #1420 shows +10% via Triton TMA, revisit after base works), int4 without QAT (quality-destructive), eval stride=32 (time budget), AdamW TTT 30ep (illegal), n-gram hash cache (illegal), pre-quant TTT any form (illegal), Eval-Time Hash Embedding trained at inference (suspect illegal — same adapt-then-score pattern), Tap-In V6 document-local matching (await ruling), GDN-Hybrid #1576 (BPB bug — actual ~1.17 not 1.01671).
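The GDN-Hybrid correction (1.01671 reported, roughly 1.17 actual) follows from how BPB is normalized: the summed loss is divided by the byte count, so an inflated byte count deflates the score by the same factor. This is a minimal sketch, assuming BPB is total negative log-likelihood in nats divided by ln 2 times total bytes, and taking the ~14% inflation figure from the notes. The function names are hypothetical.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: float) -> float:
    """Convert a summed NLL in nats into bits per byte."""
    return total_nll_nats / (math.log(2) * total_bytes)


def undo_byte_inflation(reported_bpb: float, inflation: float) -> float:
    """Recover BPB when the byte denominator was overcounted by `inflation` (0.14 = 14%)."""
    return reported_bpb * (1.0 + inflation)


if __name__ == "__main__":
    # Reported score and inflation estimate quoted in the notes.
    print(round(undo_byte_inflation(1.01671, 0.14), 3))  # 1.159
```

A 14% overcount puts the corrected score near 1.16, the low end of the ~1.16–1.18 range quoted by the reviewer.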
**NOTE**: Doc-Independent LoRA TTT (PR #1540, rank-96, resets per batch, score-first) is categorically DIFFERENT from abandoned LoRA TTT and appears legal — consider adopting.
61. **18 days remain. Prioritize safe incremental improvements over risky architecture rewrites.** VarLen+Doc-TTT (PR #1560 approach) is the lowest-risk path to beating the target. File that first, then consider GDN-Hybrid rewrite if approved.
_Updated: 2026-04-12 (v12.1 — merged SOTA 1.0810 (PR #1493, Apr 9); 6 new merges; GDN-Hybrid 1.01710 open; VarLen+Doc-TTT 1.07406 open; target ≤1.0760; 18 days remaining)_
+ ### Session 12 (2026-04-13)
+ 62. **PR #758 n-gram is effectively dead.** MatoTeziTanka (Apr 12) flagged the 7-gram cache XOR hash key includes target token — same normalization/leakage violation as PRs #727/#741. The reviewer explicitly states the neural base is ~1.10–1.15 without the cache. Stop tracking #758.
+ 63. **GDN-Hybrid BPB bug confirmed (PR #1576).** Space token double-count inherited from PR #1545 inflates byte denominator ~14%, making 1.01671 actually ~1.16–1.18 BPB. No organizer response yet. PR #1564 was voluntarily closed (superseded by PR #1575). Extraordinary GDN-Hybrid claims are FALSE until the author provides corrected byte-counting code.
+ 64. **Per-Layer Adaptive GPTQ (PR #1586) is the highest-EV immediate action.** dexhunter's PR achieves 1.07493 (3-seed mean, std 0.00078) by differentiating GPTQ clip_sigmas: MLP=12.0, Attn=13.0, Emb int7@15.0σ. Saves 530KB vs int8 Emb, MLR=0.026. -0.01266 nats vs merged SOTA (>2× the 0.005 threshold). No legality concerns. This is a config-level change that should be in our submission.
+ 65. **Casefold Tokenizer legality is actively contested.** PR #1578 (1.0668) and #1585 (1.0639) apply NFKC+lowercase to the validation corpus, reducing what bytes need to be predicted. Three community members debated it; no organizer ruling as of Apr 13. The improvement is real (~-0.017 bpb) but the legality is uncertain — do NOT implement until @valerio-oai rules.
+ 66. **Systems optimizations (PR #1584) give ~20 extra steps for free.** Fused Muon kernel + batched EMA + loader prealloc = same training budget with ~20 extra gradient steps. Pure engineering, no model changes. Worth including before next submission.
+ 67. **arXiv:2604.06169 In-Place TTT (Apr 7) is worth reading.** Replaces TTT's generic reconstruction loss with a next-token-prediction-aligned objective, enabling chunk-wise updates compatible with score-first paradigm. Could improve legal TTT quality. Read before next TTT implementation.
+ 68. **Merged SOTA held at 1.0810 for 4 days (Apr 9–13).** This is the longest gap since competition acceleration began. Either the field is catching up, or a wave of PRs is being prepared. Expect merges in next 2–3 days given the 8 open PRs in range.
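The per-layer clip_sigmas idea from note 64 can be illustrated with a plain clip-and-round quantizer. This is a minimal sketch, assuming `clip_sigma` means "clip weights at that many standard deviations before mapping onto a symmetric signed integer grid". It deliberately omits GPTQ's Hessian-based error compensation and is not the code from PR #1586. Only the three sigma values and the int7 embedding width come from the notes; the function names and the dictionary layout are hypothetical.

```python
import statistics

# Sigma multipliers quoted in note 64 (keys are hypothetical layer-group names).
CLIP_SIGMAS = {"mlp": 12.0, "attn": 13.0, "emb": 15.0}


def quantize_with_sigma_clip(weights, clip_sigma, bits):
    """Clip at clip_sigma * std, then round onto a symmetric signed `bits`-bit grid."""
    sigma = statistics.pstdev(weights)
    qmax = 2 ** (bits - 1) - 1           # e.g. 63 for int7
    clip = clip_sigma * sigma
    if clip == 0.0:                      # constant tensor: nothing to scale
        return [0] * len(weights), 1.0
    scale = clip / qmax                  # real value represented by one integer step
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real-valued weights."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    emb = [-1.0, -0.5, 0.0, 0.5, 1.0]    # toy "embedding" weights
    codes, scale = quantize_with_sigma_clip(emb, CLIP_SIGMAS["emb"], bits=7)
    print(codes, dequantize(codes, scale))
```

The design trade-off the sketch exposes: a larger `clip_sigma` keeps outliers representable but widens each integer step, while a smaller one tightens the grid at the cost of clamping outliers, which is why different layer groups may want different values.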