_Updated: 2026-04-22 (v16.0: Merged SOTA 1.0810 Day 13 plateau; **CRITICAL: bigbag filed CaseOps PR #1771 at 1.06513, the strongest signal CaseOps will pass**; dexhunter PR #1769 at 1.06453 (new best); LoRA-TTT warm-start A + alpha=144 + WD=1.0 emerging as legal TTT improvement; arXiv:2604.15259 looped transformer stability paper: outer normalization enables deeper loops; 8 days to deadline)_

---

# Parameter Golf Daily Research - 2026-04-23
## PR #771 STATUS: CLOSED (ILLEGAL, no change)

Rejected by @valerio-oai 2026-03-27. Train-then-score AdamW TTT 30ep on val tokens. No new comments.

|#758| 1.0465 |**OPEN (effectively dead)**| XOR hash key includes target token; MatoTeziTanka flagged Apr 12. Recommendation: close under same ruling as #727 family. No author response. |
|#731| 1.0400 |**OPEN, awaiting seeds 1337 + 2024**| "LOOKS CLEAN" review. Dense count + Laplace, score-first per chunk. No movement. **7 days to deadline: seed confirmation highly unlikely; treat PR as abandoned.**|

**DAY 14 PLATEAU**, confirmed via `git log upstream/main`. The last merge was PR #1511 (automated leaderboard README update); the last true record merge was PR #1493 on Apr 9. **7 days to deadline (Apr 30).** Longest plateau in competition history.

genji0306 directly refuted the BPB double-count concern with a code comparison, showing the `▁` stripping + boundary credit is applied exactly once. Artifact 15.88 MB (clean). No TTT, no SLOT, no n-gram: pure architecture. If BPB is genuinely correct, this is the biggest non-casefold open PR. But every prior GDN BPB bug was also "denied" by authors before dexhunter proved the bug. **Wait for organizer or dexhunter independent verification before investing.**
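
To make the double-count concern concrete, here is a minimal sketch of a bits-per-byte calculation. It assumes SentencePiece-style pieces in which each `▁` marker stands for exactly one space byte. The helper names `piece_bytes` and `bits_per_byte` are hypothetical, and this is not the competition's scoring code or the code from the PR.

```python
import math

SPACE_MARKER = "\u2581"  # the SentencePiece word-boundary symbol


def piece_bytes(piece: str) -> int:
    """Bytes a token piece contributes to the original text.

    Assumption: each marker stands for exactly one space byte, so it is
    mapped to " " before encoding (stripped, then credited once).
    """
    return len(piece.replace(SPACE_MARKER, " ").encode("utf-8"))


def bits_per_byte(token_nll_nats, pieces) -> float:
    """Total loss in bits divided by the total number of bytes scored."""
    total_bits = sum(token_nll_nats) / math.log(2)
    total_bytes = sum(piece_bytes(p) for p in pieces)
    return total_bits / total_bytes
```

Under these assumptions, one way a double count could arise is by counting the raw marker (3 bytes in UTF-8) and also adding a boundary byte. That would inflate the denominator and understate BPB.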
### New legal techniques from PR #1787 (CaseOps-independent)

**Polar Express Newton-Schulz** (applies to all runs):

- Replaces fixed Muon NS coefficients `[(9.0/8.0, -7.0/8.0), (9.0/8.0, -7.0/8.0), ...]` with 5 distinct per-iteration tuned tuples in `zeropower_via_newtonschulz5`
- Better approximation of the orthogonalized update that the Newton-Schulz iteration targets (each step uses coefficients tuned for that convergence phase)
- Zero artifact size change, ~3 lines. Appears fully legal. nprime06 attributes a 0.00171 BPB improvement to this combined with MIN_LR.
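
The per-iteration idea can be sketched as follows. This is an illustrative NumPy version, assuming the three-coefficient update form `X <- a*X + b*(X X^T) X + c*(X X^T)^2 X`; the two-element tuples quoted above suggest the PR's parameterization may differ. The function name `newton_schulz_orthogonalize` and the placeholder schedule are hypothetical, and the tuned values from PR #1787 are not reproduced here.

```python
import numpy as np


def newton_schulz_orthogonalize(G, coeffs, eps=1e-7):
    """Approximately orthogonalize G using one (a, b, c) tuple per iteration.

    Each step applies X <- a*X + b*(X X^T) X + c*(X X^T)^2 X.
    """
    X = np.asarray(G, dtype=np.float64)
    transposed = X.shape[0] > X.shape[1]
    if transposed:  # use the wide orientation so X X^T is the smaller Gram matrix
        X = X.T
    X = X / (np.linalg.norm(X) + eps)  # Frobenius scaling keeps singular values <= 1
    for a, b, c in coeffs:
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X


# Placeholder schedule (assumption): the textbook cubic Newton-Schulz step,
# repeated. A tuned schedule would supply a different tuple for each iteration.
CLASSIC_CUBIC = [(1.5, -0.5, 0.0)] * 30
```

Passing the schedule as a list keeps the change small: swapping fixed coefficients for tuned ones only changes the data handed to the loop, not the loop itself.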

**MIN_LR warmdown floor** (applies to all runs):

- Sets LR floor during warmdown to `0.1 × peak_LR` instead of zero
- Enables productive gradient updates during the final ~25% of training
- Zero artifact size change, 1 line. Fully legal.

Both techniques are CaseOps-independent and should be considered for our stack.
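
The MIN_LR warmdown floor can be sketched as a schedule function. This is a minimal illustration, not the PR's code: the name `lr_with_floor`, the linear warmdown shape, and the absence of warmup are assumptions. The 0.25 and 0.1 defaults follow the "final ~25%" and `0.1 × peak_LR` figures above.

```python
def lr_with_floor(step, total_steps, peak_lr, warmdown_frac=0.25, min_lr_ratio=0.1):
    """Constant LR, then a linear warmdown that ends at min_lr_ratio * peak_lr."""
    warmdown_start = int(total_steps * (1.0 - warmdown_frac))
    if step < warmdown_start:
        return peak_lr
    # progress runs from 0.0 at the start of warmdown to 1.0 at the final step
    progress = (step - warmdown_start) / max(1, total_steps - warmdown_start)
    progress = min(1.0, progress)
    return peak_lr * (1.0 - progress * (1.0 - min_lr_ratio))
```

Setting `min_lr_ratio=0.0` recovers a decay-to-zero schedule, so the floor is a one-argument change in this sketch.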
### PR #1790: new clean reference point for legal stack

- PR #1667 (AttnOutGate w=24 + SmearGate) stacks with
- LoRA-TTT alpha=144 + warm-start A + WD=1.0 (from PR #1767/1771) stacks with
- Phased global SGD TTT (PR #1700 style)

...to reach **1.06991 without CaseOps**. This is the new floor for the "legal no-CaseOps" stack. Our planned #1586+#1667+TTT improvements should reach ~1.065–1.068 if we add the per-layer GPTQ (#1586) that miaoyuxun does not appear to include.
### Issue #1604 (CaseOps ruling)

**STILL OPEN. No @valerio-oai comment as of Apr 23.** The issue has been open 10 days. The self-imposed deadline is **tomorrow, Apr 24**. Begin clean legal stack implementation immediately regardless of ruling outcome.

---
## New Research Papers

### arXiv:2604.11791: A Mechanistic Analysis of Looped Reasoning Language Models (Apr 2026) ★ NEW

Key finding: each transformer layer in a recurrent cycle converges to a distinct fixed point; the recurrent block follows a consistent cyclic trajectory in latent space.

**Relevance to Parameter Golf**: Confirms that our Triple Loop (layers 4-5 × 3) should learn distinct representations per iteration rather than collapsing. The cyclic trajectory finding is consistent with arXiv:2604.15259's "recall" mechanism that enables stable outer normalization. Together these papers provide strong theoretical backing for our architecture: the cyclic trajectory IS stable if outer normalization is added. Implementation: add RMSNorm at each loop output (~1–3 lines per iteration).
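
The suggested change can be sketched in a few lines. This is a NumPy illustration under stated assumptions, not our training code: `rms_norm` here has no learned gain, `block` stands in for the looped layers (layers 4-5 in our stack), and `looped_forward` is a hypothetical name.

```python
import numpy as np


def rms_norm(x, eps=1e-6):
    """Scale each row of x to unit root-mean-square (no learned gain)."""
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)


def looped_forward(x, block, num_loops=3):
    """Apply the same block num_loops times, normalizing after every pass."""
    for _ in range(num_loops):
        x = rms_norm(block(x))  # outer normalization at each loop output
    return x
```

Because the normalization sits outside the block, the activation scale entering each pass is the same no matter how many loops have run, which is the property the stability argument relies on.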
### Already-tracked papers with new competition confirmations

- **arXiv:2511.07384** (Retrofitted Recurrence Curriculum): PR #1756 and PR #1771 both implement it. Now confirmed viable by bigbag.
- **arXiv:2505.06708** (Gated Attention, NeurIPS 2025): PR #1667 and PR #1790 both use it. Confirmed by two independent authors.
- **arXiv:2604.12946** (Parcae): No competition PR yet. Still unimplemented in the competition field.
- **arXiv:2604.15259** (Outer normalization for stable loops): No competition PR yet. 1–3 line implementation opportunity.

### No new transformative papers from Apr 22–23

TTT paper searches returned only pre-existing work (arXiv:2512.23675 E2E-TTT, arXiv:2505.23884 LaCT). Quantization searches returned no new competition-relevant techniques beyond what is tracked. Field quiet on Apr 22–23.

---
## HuggingFace / Community Discoveries

- **PR #1790 (miaoyuxun)** is the clearest evidence that our planned stack works: someone else has already combined #1667+improved-TTT and hit 1.06991. Our version (adding #1586 per-layer GPTQ) should go lower.
- **Polar Express NS is a new community technique** appearing in PR #1787. It is the first PR to tune per-iteration NS coefficients independently. If it contributes even -0.001 bpb standalone, it's worth the 3-line change.
- **MIN_LR warmdown floor** is also new in PR #1787. The "don't decay to zero" insight is simple and has precedent in optimizer literature (warm restart cycles). Worth testing.
- **GDN FLA field**: PR #1791 is the first GDN PR to actively refute the BPB bug claim with code. If organizers confirm it's clean, the GDN architecture becomes live again. Three previous GDN PRs (#1576, #1687, #1698) all had genuine bugs; author denials did not hold up.
- **PR #731 (Hedge Mixer)**: Dead. 7 days to deadline, no seed updates since April 12. Author likely has no GPU access.

---
## Recommended Actions (priority order)

1. **IMPLEMENT #1586+#1667+LoRA-TTT improvements TODAY**: 7 days to deadline. This has now been the top action for 7 days running. The combination is now externally validated by PR #1790 (miaoyuxun, 1.06991). Add per-layer GPTQ (#1586) on top: expected target ~1.065–1.068. Need 3 seeds for a valid submission.

2. **ADD Polar Express NS + MIN_LR floor** (from PR #1787, CaseOps-independent): these are 1–4 line changes with zero legality risk. Include in the same run as action 1.

3. **ADD VarLen Attention + Doc-TTT (PR #1560)** in the following run. Roughly -0.007 bpb. Per-document causal masking + score-first LoRA TTT per-doc (chunk=48).

4. **Issue #1604 deadline is TOMORROW (Apr 24)**: if no @valerio-oai ruling, proceed without CaseOps. If ruled legal, add bijective CaseOps from PR #1769 (dexhunter's clean implementation); the target then drops to ~1.063.

5. **Monitor PR #1791 (GDN FLA, 1.0339)** for organizer response. If BPB is confirmed clean, this is a massive architectural shift worth pursuing, but do NOT start implementation until independently verified.

6. **DO NOT IMPLEMENT**: Pre-quant TTT (#1758/#1735), SLOT, any GDN without organizer BPB verification, PR #1785 PPM mixture (multiple concerns pending ruling).
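
The per-document causal masking named in action 3 can be sketched as a dense boolean mask. This is for illustration only and is not the code from PR #1560: the function name `document_causal_mask` is hypothetical, and variable-length attention kernels typically work from document boundaries rather than materializing a `[T, T]` mask.

```python
import numpy as np


def document_causal_mask(doc_ids):
    """Boolean [T, T] mask: entry (q, k) is True when query q may attend to key k.

    Attention is allowed only to earlier-or-same positions in the same document.
    """
    doc_ids = np.asarray(doc_ids)
    same_doc = doc_ids[:, None] == doc_ids[None, :]
    causal = np.tril(np.ones((len(doc_ids), len(doc_ids)), dtype=bool))
    return same_doc & causal
```

For a packed sequence with `doc_ids = [0, 0, 1, 1, 1]`, the first token of document 1 cannot see any token of document 0, so packing several documents together does not leak context across them.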

---

_Updated: 2026-04-23 (v17.0: Merged SOTA 1.0810 Day 14 plateau confirmed (git log); PR #1790 miaoyuxun 1.06991 new best legal no-CaseOps (validates #1667+TTT stack); PR #1791 genji0306 GDN FLA 1.0339 author refuted BPB bug, await organizer; PR #1785 OE-GOD PPM 1.01925 unverified (5 dexhunter concerns); Polar Express NS + MIN_LR floor new legal techniques from PR #1787; Issue #1604 deadline tomorrow Apr 24; 7 days to deadline)_