- PR #1576 (joshkmartinez, **~~1.01671~~**): GDN-Hybrid — **BPB BUG confirmed by reviewer** (space token double-count from PR #1545), actual ~1.16–1.18 BPB. Do NOT track.
- PR #1585 (codemath3000, **1.0639**): Casefold Tokenizer — **LEGALITY DEBATED** (modifying val corpus bytes); await organizer ruling
- PR #1578 (mikeapedia, **1.0668**): Custom Casefold Tokenizer — **LEGALITY DEBATED**; same concern as #1585
-
**Best open with SLOT**: ~1.0766 val_bpb (PR #1333, aryanbhosale, Causal SLOT-16 on PR #1334 base) — no organizer rejection
+
- PR #1647 (powerpratik, **1.0616**): SLOT-4 + TTT + 3-Layer Recurrence + Parallel Residuals — ⚠️ standard SLOT, no reviews
+
**Best open with SLOT**: ~1.0616 val_bpb (PR #1647, powerpratik, SLOT-4) — no reviews yet
**Best open (illegal)**: 1.0632 (PR #1517, RulinShao, Pre-Quant TTT 18ep — same ruling as #1351/#1416)
-
**Target**: Beat 1.0810 merged SOTA by >=0.005 nats → need **≤1.0760 bpb**. Best reachable: ~1.068–1.075 (legal). With SLOT: ~1.065–1.073. **17 days to deadline (Apr 30).**
+
**Target**: Beat 1.0810 merged SOTA by >=0.005 nats → need **≤1.0760 bpb**. Best reachable: ~1.068–1.072 (legal stack #1586+#1667+#1560). With casefold if ruled legal: ~1.059. **14 days to deadline (Apr 30).**
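The threshold arithmetic in the Target line can be checked with a short sketch. It assumes, as that line does, that the 0.005 margin is subtracted directly from the merged SOTA score in bpb. The function names are illustrative and not part of any competition tooling.

```python
from decimal import Decimal

def target_bpb(merged_sota: str, margin: str = "0.005") -> Decimal:
    """Largest val_bpb that still beats the merged SOTA by the margin.

    Assumption: the margin is applied directly to the bpb score.
    Decimal avoids float rounding noise in the fourth decimal place.
    """
    return Decimal(merged_sota) - Decimal(margin)

def clears_target(candidate: str, merged_sota: str = "1.0810") -> bool:
    """True if a candidate score is at or below the target threshold."""
    return Decimal(candidate) <= target_bpb(merged_sota)

if __name__ == "__main__":
    print("target:", target_bpb("1.0810"))
    # Scores quoted in this log for two open PRs.
    for pr, score in [("#1667", "1.07139"), ("#1333", "1.0766")]:
        print(pr, score, "clears" if clears_target(score) else "misses")
```

Under this reading, a score of 1.0766 misses the threshold by 0.0006 while 1.07139 clears it.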
**CRITICAL LEGALITY UPDATES**:
- **PR #771 REJECTED (2026-03-27)** — Our AdamW TTT 30ep was train-then-score. All 30-epoch TTT results void.
**Abandoned approaches**: Training-time static LoRA TTT (hurts), product quantization (SWA-incompatible), custom Triton kernels (poor EV — REVERTED: PR #1420 shows +10% via Triton TMA, revisit after base works), int4 without QAT (quality-destructive), eval stride=32 (time budget), AdamW TTT 30ep (illegal), n-gram hash cache (illegal), pre-quant TTT any form (illegal), Eval-Time Hash Embedding trained at inference (suspect illegal — same adapt-then-score pattern), Tap-In V6 document-local matching (await ruling), GDN-Hybrid #1576 (BPB bug — actual ~1.17 not 1.01671).
**NOTE**: Doc-Independent LoRA TTT (PR #1540, rank-96, resets per batch, score-first) is categorically DIFFERENT from abandoned LoRA TTT and appears legal — consider adopting.
@@ -363,3 +368,13 @@ _Updated: 2026-04-14 (v12.3 — merged SOTA 1.0810 Day 5 no change; PR #1610 Pha
77. **No new open PRs filed Apr 14–15 with competitive scores.** Web search and git log show nothing new. PR #1619 (likely illegal AdamW TTT) and PR #1616 (QK-Gain 5.5) are low-interest. The competitive field is in a holding pattern — same 8 PRs as yesterday.
_Updated: 2026-04-15 (v12.4 — merged SOTA 1.0810 Day 6 no change; Newton-Muon arXiv:2604.01472 added (+6% effective steps, verify vs MuonEq-R); In-Place TTT (2604.06169) NTP-aligned loss distinguishes it from Session 3 failure; 15 days remaining)_
+
### Session 15 (2026-04-16)
+
78. **Merged SOTA 1.0810 — Day 7 plateau, longest in competition history.** Seven days since last merge (Apr 9). With 14 days to deadline, the field appears to be preparing a late push. Do not take the plateau as stability — a wave of merges is likely imminent given 8+ open PRs in the 1.062–1.078 range.
+
79. **PR #1667 (MarioPaerle, 1.07139) is a new clean stackable technique.** Attention Output Gate: a 1,056-parameter multiplicative gate on attention output heads (12 weights × 8 heads × 11 layers = 1,056), initialized to zero so the scale starts at 1.0. SmearGate reintroduced (width=12, input-dependent). Legal score-first TTT (3ep, SGD, LR=0.005). Artifact 15.927 MB. No legality flags. Stack this on top of PR #1586 before the next GPU run.
+
80. **PR #1670 (dexhunter, 1.05970) is the new best open PR — but depends on casefold ruling.** Casefold V4 + Multi-Phase Global SGD TTT achieves 1.05970 (std 0.00031, 3-seed). The Casefold legality question (Issue #1604) has no @valerio-oai ruling as of Apr 16. Do NOT implement until ruled. If casefold is approved, this becomes the primary target and resets our goal to ≤1.0499.
+
81. **PR #1647 (powerpratik, 1.0616) uses standard SLOT-4 — high risk.** Delta-vector logit bias optimized for 4 AdamW steps per window. No organizer reviews yet. This is standard SLOT, not causal SLOT-16. Risk: @valerio-oai could rule against it at any time. Only implement if willing to accept rejection.
+
82. **PR #731 (Hedge Mixer, 1.0400) is close to merge — 2 seeds pending.** Dense-count tables + Laplace smoothing + 5-expert ensemble. Reviewer confirmed score-first per chunk and said "LOOKS CLEAN." Seeds 1337 and 2024 are the only remaining gate. If both seeds confirm ~1.04, this merges and gives us a legal n-gram mixer blueprint.
+
83. **dexhunter now holds 3 of the top-5 open legal PRs (#1560, #1586, #1670).** Highly reliable submitter with zero legality flags across all PRs. Copy techniques from their PRs with confidence.
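The Attention Output Gate figures in item 79 can be sanity-checked with a small sketch. The parameter count follows directly from the quoted shape. The gate form itself is an assumption: the log only says the gate is multiplicative and that zero initialization gives a scale of 1.0, so `2 * sigmoid(w · x[:12])` is used here as one form with that property. It is not confirmed to be what PR #1667 implements, and all function names are hypothetical.

```python
import math

# Shape quoted in item 79: 12 weights per head, 8 heads, 11 layers.
GATE_WIDTH, N_HEADS, N_LAYERS = 12, 8, 11

def gate_param_count(width=GATE_WIDTH, heads=N_HEADS, layers=N_LAYERS):
    """Total number of gate weights across the whole model."""
    return width * heads * layers

def head_scale(weights, x):
    """Hypothetical gate: 2 * sigmoid(w . x[:width]).

    With all-zero weights the dot product is 0, sigmoid(0) = 0.5,
    so the scale is exactly 1.0 and the head output is unchanged.
    """
    z = sum(w * xi for w, xi in zip(weights, x[:len(weights)]))
    return 2.0 / (1.0 + math.exp(-z))

def gate_attention_output(head_outputs, gate_weights, x):
    """Scale each head's output vector by its own input-dependent gate."""
    return [[head_scale(w, x) * v for v in out]
            for out, w in zip(head_outputs, gate_weights)]

if __name__ == "__main__":
    print("gate params:", gate_param_count())
    x = [0.1 * i for i in range(16)]                 # toy token features
    heads = [[1.0, -2.0], [0.5, 0.25]]               # two toy head outputs
    zero_gates = [[0.0] * GATE_WIDTH for _ in heads]
    print(gate_attention_output(heads, zero_gates, x))  # same as `heads`
```

Starting from an identity scale means the gated model initially matches the ungated base, which is one reason such a gate is cheap to stack on an existing run.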
+
_Updated: 2026-04-16 (v12.5 — merged SOTA 1.0810 Day 7; PR #1667 Attention Output Gate new clean stackable tech; PR #1670 dexhunter 1.05970 best open but casefold pending; PR #1647 SLOT-4 risky; PR #731 seeds pending; 14 days remaining)_
logs/daily_research.md: 122 additions & 0 deletions
@@ -1,3 +1,125 @@
# Parameter Golf Daily Research - 2026-04-16
## PR #771 STATUS: CLOSED (REJECTED) — no change
@valerio-oai ruling (confirmed): "adapting model to eval tokens with TTT for multiple epochs, then reporting val numbers on those same tokens." No appeal path.

---
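The ruling on PR #771 turns on the order of scoring and adaptation. The sketch below contrasts the two orders on a toy model. It is an illustration only: the model is a byte-level unigram counter with add-one (Laplace) smoothing, not the competition's transformer or its TTT procedure, and every name in it is hypothetical.

```python
import math
from collections import Counter

class LaplaceByteModel:
    """Unigram byte model with add-one smoothing over 256 symbols."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def prob(self, byte):
        return (self.counts[byte] + 1) / (self.total + 256)

    def update(self, chunk):
        self.counts.update(chunk)
        self.total += len(chunk)

def bits(model, chunk):
    """Code length of a chunk in bits under the model's current state."""
    return -sum(math.log2(model.prob(b)) for b in chunk)

def score_first_bpb(chunks):
    """Score each chunk before adapting on it (the score-first pattern)."""
    model, total_bits, total_bytes = LaplaceByteModel(), 0.0, 0
    for chunk in chunks:
        total_bits += bits(model, chunk)  # model has not seen this chunk yet
        model.update(chunk)               # adapt only after scoring
        total_bytes += len(chunk)
    return total_bits / total_bytes

def train_then_score_bpb(chunks, epochs=1):
    """Adapt on all chunks first, then score the same chunks (the rejected pattern)."""
    model = LaplaceByteModel()
    for _ in range(epochs):
        for chunk in chunks:
            model.update(chunk)
    return sum(bits(model, c) for c in chunks) / sum(len(c) for c in chunks)

if __name__ == "__main__":
    data = [b"aaaa bbbb", b"aaaa bbbb", b"aaaa bbbb"]
    print("score-first     :", round(score_first_bpb(data), 3))
    print("train-then-score:", round(train_then_score_bpb(data, epochs=30), 3))
```

Train-then-score reports a lower bits-per-byte figure on the same data because the model has already seen the bytes it is being scored on.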
## N-GRAM PR STATUS
| PR | Score | Status | Notes |
|----|-------|--------|-------|
| #727 | 0.9674 | **CLOSED** (illegal) | Hashed n-gram cache — ruled out Mar 27 |
| #741 | 0.9850 | **CLOSED** (illegal) | Author self-closed, same illegality |
| #758 | 1.0465 | **OPEN** (dead) | XOR hash key includes target token — same violation as #727. No new activity. |

**Merged SOTA: 1.0810 (bigbag, PR #1493) — DAY 7 UNCHANGED.**
Last upstream commit: `75700cb` April 9, 2026. Longest plateau since the Apr 5–9 acceleration wave. No new records in 7 days. Expect a merge wave before deadline (April 30 = 14 days).
| Watch | Self-Calibrating LMs via TTT Discriminative Distillation (SECL) | 2604.09624 | Apr 2026 | TTT pipeline that reduces ECE via discriminative distillation; score-first compatible | Targets calibration (ECE), not BPB. Low direct impact on our metric. |
| Already tracked | End-to-End TTT for Long Context | 2512.23675 | Dec 2025 | Compresses context to weights at test time via next-token prediction; scales with context length | Relevant to Doc-TTT quality; LaCT (2505.23884) is the higher-EV variant already in plan |
| Already tracked | Newton-Muon | 2604.01472 | Apr 2026 | +6% fewer steps, +4% wall-clock vs standard Muon | Verify additive with MuonEq-R before GPU spend |
| Skip | LieQ (layer-wise quant for small LMs) | 2508.03332 | Aug 2025 | Canonical division of labour across layers for PTQ; 2-bit target | Not applicable — we use int6/int7 GPTQ, not sub-4-bit regime |
No new breakthrough papers today. arXiv:2604.09624 (SECL) is the sole new find; low direct impact.

---
## HuggingFace / Community
No new relevant blog posts. dexhunter filed PR #1670 (1.05970) — their third top-10 PR (#1560, #1586, #1670). MarioPaerle is a new submitter worth watching (PR #1667 technique is clean and implementable).

---
## Recommended Action
**No change to core strategy. Two additions: PR #1667 Attention Output Gate is now a candidate to stack; casefold watch continues.**
3. **Evaluate PR #1667 Attention Output Gate + SmearGate** on the same run or a follow-up: 1,056 extra params, no legality concerns. If additive with #1586 + #1560, expected combined ~1.065–1.070.
4. **Watch PR #731** — if the two pending seeds confirm 1.0400 BPB and it merges, Hedge Mixer (legal n-gram interpolation) is adoptable.
5. **Watch Issue #1604** — if casefold is ruled legal, PR #1670 (dexhunter, 1.05970) jumps to highest-EV action; reset target to ≤1.0499.
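For reference while the Hedge Mixer PR is pending, here is a minimal sketch of a score-first expert mixer. It assumes the mixer follows the standard Hedge (multiplicative-weights) update with log loss, which this log does not confirm. The function name, the learning rate `eta`, and the toy expert probabilities are all illustrative.

```python
import math

def hedge_mix(step_probs, eta=1.0):
    """Mix expert predictions with Hedge, scoring before each weight update.

    step_probs: one list per step, holding each expert's probability
    for the token that actually occurred (must be > 0, e.g. via smoothing).
    Returns (total bits paid by the mixture, final normalized weights).
    """
    n_experts = len(step_probs[0])
    log_w = [0.0] * n_experts            # uniform prior over experts
    total_bits = 0.0
    for probs in step_probs:
        top = max(log_w)                 # subtract max for numerical stability
        w = [math.exp(lw - top) for lw in log_w]
        mix = sum(wi * p for wi, p in zip(w, probs)) / sum(w)
        total_bits -= math.log2(mix)     # score with weights from past steps only
        # Update afterwards: w_i *= exp(-eta * loss_i), with loss_i = -ln p_i.
        log_w = [lw + eta * math.log(p) for lw, p in zip(log_w, probs)]
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]
    z = sum(w)
    return total_bits, [wi / z for wi in w]

if __name__ == "__main__":
    # Toy run: expert 0 is consistently better than expert 1.
    steps = [[0.9, 0.1]] * 20
    mix_bits, weights = hedge_mix(steps)
    best_bits = -20 * math.log2(0.9)
    print("mixture bits:", round(mix_bits, 3), "best expert bits:", round(best_bits, 3))
    print("final weights:", [round(w, 4) for w in weights])
```

With `eta=1.0` and log loss, the mixture pays at most log2(number of experts) bits more than the best single expert over the whole sequence, and because each token is scored before the weights see it, the pattern matches the score-first requirement described in this log.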
_Updated: 2026-04-16 (merged SOTA 1.0810 Day 7 no change; PR #1667 MarioPaerle new clean PR (1.07139, Attention Output Gate + SmearGate); PR #1670 dexhunter new best open (1.05970) but pending casefold ruling; PR #1647 SLOT-4 (1.0616) risky; casefold Issue #1604 open; 14 days remaining)_