- PR #1523 (EthanYangTW, 1.0778): Triple Recurrence + Banking + Fused MLP + Muon 0.97 — ⚠️ Eval-Time Hash Embedding may be flagged; PR #1514 (dexhunter, 1.07983) is cleaner
+
- PR #1564 (joshkmartinez, **1.01710**): GDN-Hybrid (Gated DeltaNet + SWA), NO TTT/SLOT — extraordinary if verified; unreviewed
**Best open with SLOT**: ~1.0766 val_bpb (PR #1333, aryanbhosale, Causal SLOT-16 on PR #1334 base) — no organizer rejection
**Best open (illegal)**: 1.0632 (PR #1517, RulinShao, Pre-Quant TTT 18ep — same ruling as #1351/#1416)
-
**Target**: Beat 1.0810 merged SOTA by >=0.005 nats → need **≤1.0760 bpb**. Best reachable: ~1.074–1.077 (legal stack). With SLOT: ~1.073–1.076.
+
**Target**: Beat 1.0810 merged SOTA by >=0.005 nats → need **≤1.0760 bpb**. Best reachable: ~1.074–1.077 (legal). With SLOT: ~1.073–1.076. **18 days to deadline (Apr 30).**
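The target arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and it applies the 0.005 margin directly to the bpb figure, as the notes do, even though the margin is stated in nats.

```python
MERGED_SOTA_BPB = 1.0810  # merged SOTA quoted in the notes
REQUIRED_MARGIN = 0.005   # margin from the notes, applied to bpb as the notes do


def target_bpb(sota, margin=REQUIRED_MARGIN):
    """Highest val_bpb that still beats `sota` by the required margin."""
    return round(sota - margin, 4)


def clears_target(candidate, sota=MERGED_SOTA_BPB):
    """True if a candidate val_bpb meets the target."""
    return candidate <= target_bpb(sota)


if __name__ == "__main__":
    print("target:", target_bpb(MERGED_SOTA_BPB))
    # Open-PR scores quoted elsewhere in these notes.
    for name, bpb in [("PR #1560", 1.07406), ("PR #1333", 1.0766), ("PR #1523", 1.0778)]:
        print(name, bpb, "clears" if clears_target(bpb) else "misses")
```

Running the same check against every open PR score before planning a session makes it obvious which stacks are already below the bar.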
**CRITICAL LEGALITY UPDATES**:
-
-**PR #771 REJECTED (2026-03-27)** — Our AdamW TTT 30ep was train-then-score, not score-first. All 30-epoch TTT results are void.
-
-**N-gram hash cache ILLEGAL** — PRs #727, #741 closed. PRs #731, #758 still open but unresolved.
-
-**N-gram Tilt IS LEGAL (PR #1420)** — Normalized via softmax partition function Z: `p_tilt(t) = p_model(t) · exp(β · 1[t==hint]) / Z`. Causal (backward-looking only). -0.0029 bpb, zero artifact cost. **⚠️ PR #1420's kernel has a causality bug — use PR #1437's corrected implementation.**
-
-**PR #1423 ILLEGAL (2026-04-07)** — Pre-quant TTT, same ruling as #1351/#1408/#1416.
-
-**Score-first TTT ≤3 epochs IS LEGAL** — PR #1413: all blocks, lr=0.005, 3ep. -0.003 bpb.
-
-**Pre-quant TTT ILLEGAL (all variants)** — PR #1351, #1416, #1408. Do NOT use.
-
-**SLOT δ-vector: Issue #140 CLOSED (Apr 6), NO organizer ban** — @valerio-oai NEVER commented in Issue #140. 9 record PRs use SLOT variants without rejection. @abaybektursun self-removed (causality concern) but no official rule. Causal SLOT-16 (PR #1333, 1.0766 BPB) is the current best open record claim. Scored-position SLOT (PR #1229) reached 0.9300 BPB. **RISK: causality concern unresolved; @valerio-oai could rule at any time on PRs. Implement only if willing to accept rejection risk.**
+
- **PR #771 REJECTED (2026-03-27)** — Our AdamW TTT 30ep was train-then-score. All 30-epoch TTT results void.
+
- **N-gram hash cache ILLEGAL** — PRs #727, #741 closed. PR #758 open but has major legality flags. PR #731 open (dense count tables + Laplace smoothing, reviewer says "LOOKS CLEAN", awaiting 3rd seed).
+
- **N-gram Tilt IS LEGAL (PR #1420)** — Normalized via softmax Z. **⚠️ PR #1420 has a causality bug — use PR #1437's corrected implementation.**
+
- **Score-first TTT IS LEGAL** — ≤3ep confirmed (PR #1413). PR #1557 cites PR #1514 as precedent for 5ep — status uncertain; use ≤3ep to be safe.
+
- **Pre-quant TTT ILLEGAL (all variants)** — PR #1351, #1416, #1408, #1423. Do NOT use.
+
- **SLOT δ-vector: Issue #140 CLOSED (Apr 6), NO organizer ban** — @valerio-oai NEVER commented in Issue #140. 9 record PRs use SLOT. Risk remains. Implement only if willing to accept rejection risk.
- **ETLB UNRULED** — PR #1399/#1415; no ruling; -0.0019 bpb standalone. Await before implementing.
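The normalized tilt quoted in these notes, `p_tilt(t) = p_model(t) · exp(β · 1[t==hint]) / Z`, can be sketched as follows. This is an illustration only, not the PR #1437 kernel: `tilt_distribution` and `backward_bigram_hint` are hypothetical names, and the bigram lookup is an assumed stand-in for whatever hint source the PRs use. The sketch shows the two properties the legality argument rests on: the hint depends only on tokens before the scored position, and the result is renormalized by Z.

```python
import math


def backward_bigram_hint(prefix):
    """Hypothetical hint source: the token that followed the most recent
    earlier occurrence of the last prefix token. Reads only the prefix,
    so it never sees the token being scored."""
    if not prefix:
        return None
    last = prefix[-1]
    for i in range(len(prefix) - 2, -1, -1):
        if prefix[i] == last:
            return prefix[i + 1]
    return None


def tilt_distribution(probs, hint, beta):
    """Apply p_tilt(t) = p_model(t) * exp(beta * 1[t == hint]) / Z."""
    boosted = [p * math.exp(beta) if t == hint else p for t, p in enumerate(probs)]
    z = sum(boosted)  # partition function; keeps the result a distribution
    return [b / z for b in boosted]


if __name__ == "__main__":
    prefix = [3, 7, 5, 3]              # token 3 was last followed by token 7
    hint = backward_bigram_hint(prefix)
    model_probs = [0.125] * 8          # uniform toy model over 8 tokens
    tilted = tilt_distribution(model_probs, hint, beta=1.0)
    print(hint, round(sum(tilted), 6), round(tilted[hint], 4))
```

When no hint is available the function leaves the model distribution unchanged, so the tilt only ever moves probability mass at positions where the backward-looking lookup fires.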
**Abandoned approaches**: Training-time static LoRA TTT (hurts), product quantization (SWA-incompatible), custom Triton kernels (poor EV — REVERTED: PR #1420 shows +10% via Triton TMA, revisit after base works), int4 without QAT (quality-destructive), eval stride=32 (time budget), AdamW TTT 30ep (illegal), n-gram hash cache (illegal), pre-quant TTT any form (illegal), Eval-Time Hash Embedding trained at inference (suspect illegal — same adapt-then-score pattern), Tap-In V6 document-local matching (await ruling).
**NOTE**: Doc-Independent LoRA TTT (PR #1540, rank-96, resets per batch, score-first) is categorically DIFFERENT from abandoned LoRA TTT and appears legal — consider adopting.
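The ordering difference between score-first and train-then-score evaluation can be made concrete with a toy sketch. Everything here is an assumption for illustration: `ToyAdapter` is a Laplace-smoothed unigram counter standing in for LoRA weights, and the function names are hypothetical. What carries over is the control flow: reset the adapter per batch, score each chunk with the current state, and only then adapt on that chunk.

```python
import math


class ToyAdapter:
    """Stand-in for per-batch adapter state (assumption: real runs hold LoRA weights)."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size
        self.reset()

    def reset(self):
        self.counts = [1] * self.vocab_size  # Laplace prior

    def score(self, chunk):
        """Bits needed to code `chunk` under the current, frozen state."""
        total = sum(self.counts)
        return sum(-math.log2(self.counts[t] / total) for t in chunk)

    def adapt(self, chunk):
        for t in chunk:
            self.counts[t] += 1


def evaluate(batches, vocab_size, chunk_size, score_first=True):
    """Average bits per token; `score_first=False` is the train-then-score ordering."""
    adapter = ToyAdapter(vocab_size)
    bits, n_tokens = 0.0, 0
    for tokens in batches:
        adapter.reset()  # no state carries across batches
        for start in range(0, len(tokens), chunk_size):
            chunk = tokens[start:start + chunk_size]
            if score_first:
                bits += adapter.score(chunk)  # scored before any update sees it
                adapter.adapt(chunk)
            else:
                adapter.adapt(chunk)          # leaks the chunk into the state
                bits += adapter.score(chunk)
            n_tokens += len(chunk)
    return bits / n_tokens


if __name__ == "__main__":
    data = [[0, 0, 0, 0, 1, 0, 0, 0], [2, 2, 2, 2]]
    print("score-first:     ", evaluate(data, 4, 2, score_first=True))
    print("train-then-score:", evaluate(data, 4, 2, score_first=False))
```

On this toy data the train-then-score ordering reports a lower number because each chunk has already been seen when it is scored, which is the leak the score-first rule is meant to exclude.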
|**Standard SLOT δ-vector (arXiv:2505.12392)**|**-0.021**|**DE FACTO IN USE — Issue #140 CLOSED (Apr 6); 9 record PRs use SLOT variants; no organizer rejection. @valerio-oai never ruled in #140. @abaybektursun self-removed (causality concern) but no ban.**|
-
|**Causal SLOT-16 (scored-position delta only)**|**-0.009**|**DE FACTO IN USE — PR #1333 (aryanbhosale, 1.0766 BPB, open record); PR #1229 (scored-position SLOT, 0.9300 BPB). No organizer rejection.**|
|**Standard SLOT δ-vector (arXiv:2505.12392)**|**-0.021**|**DE FACTO IN USE — Issue #140 CLOSED (Apr 6); 9 record PRs use SLOT variants; no organizer rejection**|
+
|**Causal SLOT-16 (scored-position delta only)**|**-0.009**|**DE FACTO IN USE — PR #1333 (aryanbhosale, 1.0766 BPB, open record); PR #1229 (0.9300 BPB). No organizer rejection.**|
|**Scored-Position SLOT (PR #1229)**|**~-0.18 vs base**|**Extraordinary — 0.9300 BPB; no organizer rejection; causality concern still present**|
-
|**ETLB (Eval-Time Logit Bias)**|**-0.0019**|**UNRULED — PR #1399/#1415; no ruling from @valerio-oai; await before implementing**|
+
|**ETLB (Eval-Time Logit Bias)**|**-0.0019**|**UNRULED — PR #1399/#1415; await before implementing**|
|**N-gram Tilt (PR #1437 kernel)**|**-0.0029**|**LEGAL — properly normalized via Z; causal; zero artifact cost. PR #1420 has causality bug — use PR #1437**|
-
|**Triple Loop (3× depth recurrence)**|**~-0.009 vs 2×**|**PRIMARY — PR #1420 (1.08014); 17 virtual layers; activate at 0.35× training**|
-
|**SP8192 vocab**|**~-0.009 vs SP4096**|**PRIMARY — PR #1420/#1413; use over SP4096**|
@@ -309,3 +314,14 @@ Every change must answer: "Does this lower val_bpb within the 16MB/10-min constr
53. **MATRIX_LR = 0.03 pairs with Muon momentum 0.97.** Both PRs #1541 and #1523 co-tune these. When reducing momentum from 0.99 → 0.97, also reduce MATRIX_LR. Check whether our base config uses 0.03 or 0.05.
_Updated: 2026-04-11 (v11.5 — PR #1541 bigbag 1.07785 + PR #1540 aryanbhosale 1.0777 new open PRs; doc-independent LoRA TTT appears legal; PR #1545 BPB bug; MATRIX_LR 0.03 pairs with momentum 0.97; no merged SOTA change)_
+
### Session 11 (2026-04-12)
+
54. **Merged SOTA jumped from 1.1147 to 1.0810 in 5 days.** Seven PRs are listed as merged between Apr 4–9 (PRs #1334, #1285, #1394, #1412, #1413, #1477, #1493), although the version note counts six; recheck the list. The competition accelerated dramatically. Check the leaderboard every session before planning, because yesterday's target may already be beaten.
+
55. **The merged SOTA stack is now fully defined: SP8192 + Triple Recurrence + Parallel Residuals + QK-Gain 5.25 + GPTQ Emb (int8) + SDClip + WD=0.095 + EMA 0.9965 + Legal TTT.** PR #1493 (bigbag) at 1.0810. Any new submission must beat this cleanly. Target: ≤1.0760.
+
56. **VarLen Attention (per-document masking) is the next clear win.** PR #1560 (dexhunter) achieves 1.07406 BPB by adding per-document causal masking + Doc-TTT (per-document score-first LoRA TTT, chunk=48) on top of the PR #1413 stack. That is about -0.007 bpb vs merged SOTA (1.0810 → 1.07406), which clears the ≤1.0760 target. Implement this next.
+
57. **GDN-Hybrid (PR #1564) at 1.01710 BPB is extraordinary — watch closely.** Gated DeltaNet + SWA architecture, no TTT/SLOT, SP1024. If organizers approve, this represents a ~0.064 bpb architectural leap with no eval-time techniques. Do not implement until organizer review; replicate if approved.
+
58. **TMA Megakernel (Triton Hopper) gives +200 training steps.** PR #1555 shows +10.5% throughput on H100 via TMA-fused MLP kernel. Worth implementing after VarLen+Doc-TTT is verified. Combined with Tap-In (min_match=1, 21% activation), PR #1555 reaches 1.07636.
+
59. **Do NOT implement Tap-In before verifying legality.** "Tap-In Unigram Matching" from PR #1555 activates at 21% of positions with min_match=1 vs 1.7% at min_match=3. The mechanism involves a token-level unigram cache, which may resemble the n-gram approaches. Verify it's properly normalized before GPU spend.
+
60. **PR #731 n-gram is now looking clean.** Dense count tables + Laplace smoothing (not hash caches). Reviewer said "LOOKS CLEAN" — waiting on seeds 1337 and 2024 to confirm 1.0400 BPB. If merged, this gives a legal n-gram mixer alternative.
+
61. **18 days remain. Prioritize safe incremental improvements over risky architecture rewrites.** VarLen+Doc-TTT (PR #1560 approach) is the lowest-risk path to beating the target. File that first, then consider GDN-Hybrid rewrite if approved.
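The per-document causal masking named in item 56 can be sketched in plain Python. This is a minimal illustration, not the PR #1560 implementation: the helper names are hypothetical, and using a single boundary token id (`bos_id`) to mark document starts is an assumption. The rule itself is simple: a query position may attend to a key position only if the key is not in the future and both belong to the same document.

```python
def doc_ids_from_tokens(tokens, bos_id):
    """Assign a running document index; each `bos_id` token starts a new document.
    (Assumption: documents in a packed sequence are delimited by one boundary token.)"""
    doc_ids, current = [], -1
    for tok in tokens:
        if tok == bos_id or current < 0:
            current += 1
        doc_ids.append(current)
    return doc_ids


def per_document_causal_mask(doc_ids):
    """mask[q][k] is True when query q may attend to key k."""
    n = len(doc_ids)
    return [[k <= q and doc_ids[k] == doc_ids[q] for k in range(n)] for q in range(n)]


def cumulative_lengths(doc_ids):
    """Document boundaries as cumulative offsets, the form variable-length
    attention kernels commonly take instead of a dense mask."""
    bounds = [0]
    for i in range(1, len(doc_ids)):
        if doc_ids[i] != doc_ids[i - 1]:
            bounds.append(i)
    bounds.append(len(doc_ids))
    return bounds


if __name__ == "__main__":
    tokens = [9, 4, 5, 9, 6, 7]  # 9 is the assumed boundary token
    ids = doc_ids_from_tokens(tokens, bos_id=9)
    for row in per_document_causal_mask(ids):
        print("".join("x" if allowed else "." for allowed in row))
    print(cumulative_lengths(ids))
```

The dense mask is quadratic in sequence length and is shown only to make the rule visible; a real kernel would work from the cumulative offsets so that no compute is spent on cross-document pairs.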
+
+
_Updated: 2026-04-12 (v12.1 — merged SOTA 1.0810 (PR #1493, Apr 9); 6 new merges; GDN-Hybrid 1.01710 open; VarLen+Doc-TTT 1.07406 open; target ≤1.0760; 18 days remaining)_