
Commit 12f1b67

research(2026-04-28): WD=2.0 fix for fused CE + warm-start LoRA A; PR openai#1885 PPM-D anti-hijack; Day 19 plateau; Session 24
https://claude.ai/code/session_013L9as27k9K4K8JGVwtj8Vw
1 parent 38e0a04 commit 12f1b67

1 file changed

Lines changed: 128 additions & 0 deletions


logs/daily_research.md

@@ -1,3 +1,131 @@
# Parameter Golf Daily Research - 2026-04-28

## PR #771 STATUS: CLOSED (ILLEGAL, no change)

@valerio-oai ruling (2026-03-27): train-then-score AdamW TTT (30 epochs) is grounds for instant disqualification. Permanent. The 1.0705 score is void.

---

## N-GRAM PR STATUS

- **PR #727**: CLOSED, permanent (illegal hash cache, no renormalization).
- **PR #758**: OPEN but dead. Its XOR hash key includes the target token (flagged by MatoTeziTanka, Apr 12), and the author has not fixed it.
- **PR #731** (Hedge Mixer: dense count tables + Laplace smoothing): OPEN, "LOOKS CLEAN" per reviewer. Seeds 1337 and 2024 still PENDING. No merge, no new activity. Unlikely to merge before Apr 30.

---

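A minimal Python sketch of the two ingredients described for PR #731, written from the description above rather than from the PR's code: a dense bigram count table with Laplace (add-one) smoothing, which keeps every row a normalized distribution (the property PR #727's hash cache lacked), and a Hedge (exponential-weights) mixer that blends it with a second expert. The `Uniform` class is a hypothetical stand-in for the NN, and `eta` is an illustrative learning rate, not a value from the PR. Each token is scored before the weights or counts are updated.

```python
import math
from typing import List


class LaplaceBigram:
    """Dense bigram count table with add-one (Laplace) smoothing."""

    def __init__(self, vocab_size: int):
        self.vocab_size = vocab_size
        # Dense counts[prev][cur]: every row smooths to a proper distribution.
        self.counts = [[0] * vocab_size for _ in range(vocab_size)]
        self.row_totals = [0] * vocab_size

    def probs(self, prev: int) -> List[float]:
        denom = self.row_totals[prev] + self.vocab_size
        return [(c + 1) / denom for c in self.counts[prev]]

    def update(self, prev: int, cur: int) -> None:
        self.counts[prev][cur] += 1
        self.row_totals[prev] += 1


class Uniform:
    """Hypothetical stand-in for the NN expert: always uniform."""

    def __init__(self, vocab_size: int):
        self.vocab_size = vocab_size

    def probs(self, prev: int) -> List[float]:
        return [1.0 / self.vocab_size] * self.vocab_size

    def update(self, prev: int, cur: int) -> None:
        pass


def hedge_score(tokens: List[int], experts, eta: float = 0.5):
    """Score tokens[1:] with a Hedge mixture; return (total bits, final log-weights)."""
    log_w = [0.0] * len(experts)  # uniform prior over experts
    total_bits = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        dists = [e.probs(prev) for e in experts]
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        z = sum(w)
        # Score first: mixture probability of the actual token under the current state.
        p = sum(wi / z * d[cur] for wi, d in zip(w, dists))
        total_bits -= math.log2(p)
        # Then update: each expert's log-weight drops by eta times its own log-loss (nats).
        for i, d in enumerate(dists):
            log_w[i] += eta * math.log(d[cur])
        # Finally the experts absorb the already-scored token.
        for e in experts:
            e.update(prev, cur)
    return total_bits, log_w
```

On a repetitive stream such as `[0, 1] * 100` with a vocabulary of 4, the bigram expert quickly dominates the weights and the mixture spends far fewer than 2 bits per token.
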
## Leaderboard

- **Official Merged SOTA (README)**: **1.0810**, bigbag (PR #1493, Apr 9). **Day 19 plateau**, the longest in competition history (the previous record was ~10 days). The last actual model merge was Apr 9; the last commit was Apr 26 (PR #1806, README update only).
- **Our PR #771**: CLOSED/ILLEGAL.
- **Target**: ≤1.0760 bpb. **2 days to deadline (Apr 30). FINAL WINDOW.**

---

## What Changed Since Apr 27 (GitHub)

### New PRs Opened Today (Apr 28)

| PR | Author | Score (bpb) | Technique | Notes |
|----|--------|-------------|-----------|-------|
| **#1885** | leon2k2k2k | **0.99445** | PR #1850 (PPM-D) + Anti-Hijack Gate | OE-GOD raised a causality flag (the gate was conditioned on the observed token's NLL); the **author fixed it** by gating on the prefix-only NN top probability. 15.9 MB. |
| **#1886** | renqianluo | **1.06957** | Fused CE Triton kernel + WD=2.0 warm-start stability fix | ⚠️ **CRITICAL**: fused CE + warm-start LoRA A collapses at WD=1.0 on seeds 314/1337 (→ ~1.121). Must use **WD=2.0** when combining the two. |
| **#1894** | ChideraIbe123 | **1.09961** | SP8192 + MuonEq-R + Loop@0.42 + RECUR_AB + QAT-lite | Misses target. |
| **#1893** | Hieuabssy | **1.0901** | Parallel Residuals + UNet Skips + Depth Recur + Muon | Misses target. |
| **#1890** | mradassaad | **1.1456** | Mamba-3 Hybrid + Multi-Epoch TTT + Dynamics-Protected Quant | Misses target. |
| **#1884** | someone114514 | unknown | SmearGate BOS Fix + train-only logit calibration | Score unknown. Watch: "train-only logit calibration" is a new technique to assess. |

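The causality issue raised on PR #1885 comes down to what the gate is allowed to read. A minimal NumPy sketch, assuming a binary mixing weight with a hypothetical confidence threshold and λ values (none taken from the PR): the gate is computed from the NN's prefix-only distribution (its top probability) before the target is looked up, so it cannot depend on the observed token's NLL. `p_nn` and `p_alt` stand in for the NN and the second model (e.g. PPM-D) at one position.

```python
import numpy as np


def gated_mixture(p_nn: np.ndarray, p_alt: np.ndarray,
                  conf_threshold: float = 0.5,
                  lam_confident: float = 0.0,
                  lam_unsure: float = 0.3) -> np.ndarray:
    """Mix two next-symbol distributions at one position.

    The gate reads only p_nn, which is a function of the prefix, never the
    target symbol, so the mixed distribution is fixed before scoring.
    """
    top_p = float(p_nn.max())
    # Hypothetical binary gate: lean on the alternative model when the NN is unsure.
    lam = lam_confident if top_p >= conf_threshold else lam_unsure
    mixed = (1.0 - lam) * p_nn + lam * p_alt
    return mixed / mixed.sum()  # renormalize to absorb rounding drift


def score_bits(p_mixed: np.ndarray, target: int) -> float:
    """The target is looked up only after the distribution is fixed."""
    return float(-np.log2(p_mixed[target]))
```

If `lam` were instead chosen after seeing `-log2(p_nn[target])`, each target would effectively get whichever model happened to assign it more mass; summed over all possible targets those probabilities can exceed 1, so the scored "distribution" would no longer be one.
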
### PPM-D Cluster Status (NO ORGANIZER RULING YET)

| PR | Author | Score (bpb) | Method | Status |
|----|--------|-------------|--------|--------|
| **#1854** | ndokutovich | **0.90236** | PPM-D order-5 on PR #1797 base | OPEN, no organizer comment |
| **#1885** | leon2k2k2k | **0.99445** | PR #1850 + Anti-Hijack Gate (causality fix applied) | OPEN, legality fix applied |
| **#1835** | anmarhindi | **1.00136** | PPM-D order-5 binary-λ gate | OPEN, no organizer comment |
| **#1850** | someone114514 | **1.00495** | Strict Full-Val Byte PPM Mixture | OPEN, no organizer comment |

**Issue #1872** (PPM-D legality request): OPEN, **no organizer response**. The core question is whether the alphabet Σ in Issue #1017 C2 is the token vocabulary (SP8192) or bytes (256). A ruling before the deadline is unlikely.

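To make the mechanism concrete, here is a minimal textbook-style PPM-D byte model in Python with symbol exclusion and an order -1 uniform fallback. It is a sketch, not the implementation from any PR listed above (those reportedly run order 5, and dexhunter's is OpenMP-parallelized). `alphabet` is a parameter because Issue #1872 asks whether Σ is the 256-byte alphabet or the SP8192 vocabulary. Escape method D gives a symbol seen `c` times in a context with total count `n` and `d` distinct symbols probability `(2c - 1) / (2n)`, and gives the escape probability `d / (2n)`.

```python
import math
from collections import defaultdict


def ppm_d_prob(history: bytes, sym: int, counts: dict, max_order: int,
               alphabet: int = 256) -> float:
    """P(sym | history) under PPM with escape method D and symbol exclusion."""
    excluded = set()
    prob = 1.0
    for k in range(min(max_order, len(history)), -1, -1):
        ctx = history[len(history) - k:]  # last k symbols (empty for k == 0)
        table = counts.get(ctx)
        if not table:
            continue
        # Exclusion: drop symbols already offered (and escaped past) at higher orders.
        live = {s: c for s, c in table.items() if s not in excluded}
        if not live:
            continue
        n = sum(live.values())
        d = len(live)
        if sym in live:
            return prob * (2 * live[sym] - 1) / (2 * n)
        prob *= d / (2 * n)  # escape to the next lower order
        excluded.update(live)
    # Order -1: uniform over the symbols not yet excluded.
    return prob / (alphabet - len(excluded))


def update_counts(counts: dict, history: bytes, sym: int, max_order: int) -> None:
    """Record `sym` after `history` in every context of order 0..max_order."""
    for k in range(min(max_order, len(history)) + 1):
        ctx = history[len(history) - k:]
        counts.setdefault(ctx, defaultdict(int))[sym] += 1


def score_bytes(data: bytes, max_order: int = 5) -> float:
    """Score-first: predict each byte from its prefix, then add it to the counts."""
    counts: dict = {}
    bits = 0.0
    for i, sym in enumerate(data):
        bits -= math.log2(ppm_d_prob(data[:i], sym, counts, max_order))
        update_counts(counts, data[:i], sym, max_order)
    return bits
```

Because every unseen symbol follows the same escape path, the probabilities over the whole alphabet sum to 1 at every position; that normalization is what separates a legal PPM mixture from an unnormalized cache.
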
### Previously-Tracked PRs (Apr 28 status)

| PR | Score (bpb) | Status | Notes |
|----|-------------|--------|-------|
| **#1787** | 1.06335 | OPEN | Best clean base. Polar Express NS + MIN_LR=0.10. No new flags. |
| **#1797** | 1.06157 | OPEN | dexhunter. SmearGate + LQER Asym on #1787. Clean. |
| **#1667** | 1.07139 | OPEN | Attention Output Gate + SmearGate. Clean. Stack on #1586. |
| **#1586** | 1.07493 | OPEN | Per-Layer Adaptive GPTQ (MLP=12σ/Attn=13σ/Emb int7@15σ). Clean. |
| **#1727** | 1.07217 | OPEN | MP-SGD TTT 4-phase. Appears legal. |
| **#1767** | 1.07209 | OPEN | LoRA-TTT warm-start A + alpha=144 + WD=1.0. ⚠️ See WD note below. |
| Issue #1604 | n/a | NO RULING | Day 15+ silence. CaseOps blocked. Do not wait. |

---

## Critical Implementation Note: WD=2.0 When Using Fused CE + Warm-Start LoRA A

PR #1886 (renqianluo) reports that the Triton fused cross-entropy kernel introduces fp32 accumulation differences relative to standard CE. Combined with warm-start LoRA A (as in PR #1767), these numerical differences destabilize TTT at WD=1.0: seeds 314 and 1337 collapse to ~1.121 BPB.

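Why would a fused kernel change results at all? Fused cross-entropy kernels typically avoid materializing the full softmax: they stream over the vocabulary in blocks and keep a running max and running sum (an online log-sum-exp). The NumPy sketch below, with a hypothetical block size, shows that this is mathematically identical to the direct computation but, in float32, is generally not bit-identical, which is the kind of small drift PR #1886 describes. It does not reproduce the Triton kernel itself, which has not been inspected here.

```python
import numpy as np


def ce_direct(logits: np.ndarray, target: int) -> np.float32:
    """Reference float32 cross-entropy: logsumexp(logits) - logits[target]."""
    m = logits.max()
    lse = m + np.log(np.sum(np.exp(logits - m), dtype=np.float32))
    return np.float32(lse - logits[target])


def ce_online(logits: np.ndarray, target: int, block: int = 1024) -> np.float32:
    """Streaming cross-entropy: one pass over vocab blocks with a running max/sum."""
    run_max = np.float32(-np.inf)
    run_sum = np.float32(0.0)
    for start in range(0, logits.shape[0], block):
        chunk = logits[start:start + block]
        new_max = max(run_max, chunk.max())
        # Rescale the old sum to the new max, then add this block's contribution.
        run_sum = (run_sum * np.exp(run_max - new_max)
                   + np.sum(np.exp(chunk - new_max), dtype=np.float32))
        run_max = new_max
    return np.float32(run_max + np.log(run_sum) - logits[target])
```

On random float32 logits the two losses agree to within float32 rounding but need not match bit for bit; per PR #1886, differences of this kind were enough to tip warm-start TTT over at WD=1.0.
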
**Fix**: raise `TTT_WEIGHT_DECAY` from 1.0 to 2.0.

**Impact on our stack**: If we include the fused CE kernel from PR #1787 AND warm-start LoRA A from PR #1767, we MUST use WD=2.0 (not 1.0); otherwise, as in PR #1886, two of three seeds will likely collapse. Adjust this hyperparameter immediately.

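One way to make this rule hard to forget is to derive the TTT weight decay from the feature flags instead of setting it by hand. A minimal sketch: `TTT_WEIGHT_DECAY` is the knob named in PR #1886, but the flag names `use_fused_ce` and `warm_start_lora_a` and the `TTTConfig` structure are hypothetical, not taken from PR #1787, #1767, or #1886.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TTTConfig:
    use_fused_ce: bool                        # hypothetical flag: Triton fused CE kernel
    warm_start_lora_a: bool                   # hypothetical flag: warm-start LoRA A
    ttt_weight_decay: Optional[float] = None  # None = choose automatically


def resolve_ttt_weight_decay(cfg: TTTConfig) -> float:
    """Return the TTT weight decay, enforcing WD >= 2.0 for fused CE + warm-start A."""
    risky_combo = cfg.use_fused_ce and cfg.warm_start_lora_a
    if cfg.ttt_weight_decay is None:
        # 2.0 per PR #1886 for the combination; 1.0 as in PR #1767 otherwise.
        return 2.0 if risky_combo else 1.0
    if risky_combo and cfg.ttt_weight_decay < 2.0:
        # Only 1.0 (collapse) and 2.0 (stable) were reported; values in between
        # are untested, so they are rejected here to stay conservative.
        raise ValueError(
            f"TTT_WEIGHT_DECAY={cfg.ttt_weight_decay} with fused CE + warm-start "
            "LoRA A; PR #1886 reports seed collapse at 1.0, use 2.0"
        )
    return cfg.ttt_weight_decay
```

Failing loudly on an explicit 1.0 is deliberate: the failure mode reported in PR #1886 is silent (a bad seed, not a crash), so a config-time error is cheaper than a lost 8xH100 run.
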
---

## New Research Papers

### arXiv:2506.10935: "Accelerating Newton-Schulz Iteration via Chebyshev-type Polynomials" (Jun 2025)

- Derives optimal NS coefficients via Chebyshev alternance + the Remez algorithm. Complementary to Polar Express (arXiv:2505.16932).
- Already referenced in the competition as Gram-NS, which requires CUDA 12.9+ / PyTorch 2.7.1+. Verify the CUDA and PyTorch versions on the H100 pod before attempting it. Polar Express NS is the safer drop-in.
- **Action**: Use Polar Express first. Only try the Chebyshev variant once the CUDA/PyTorch requirements are confirmed.

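Both papers tune the same primitive used by Muon-style optimizers: an odd matrix polynomial, iterated to push every singular value of a normalized gradient matrix toward 1, i.e. to approximate its polar factor. As a baseline for comparison, here is the classic cubic Newton-Schulz iteration `X <- 1.5 X - 0.5 X X^T X` in NumPy, which converges when all singular values of the normalized input lie in (0, √3). Polar Express and the Chebyshev/Remez approach choose different polynomial coefficients to converge in fewer steps; those coefficients are not reproduced here, and the step count below is illustrative.

```python
import numpy as np


def newton_schulz_polar(g: np.ndarray, steps: int = 30) -> np.ndarray:
    """Approximate the orthogonal polar factor U V^T of g = U S V^T.

    Classic cubic Newton-Schulz: X <- 1.5 X - 0.5 X X^T X. Dividing by the
    Frobenius norm puts every singular value in (0, 1], inside the
    convergence region (0, sqrt(3)).
    """
    x = g / (np.linalg.norm(g) + 1e-12)
    transposed = x.shape[0] > x.shape[1]
    if transposed:  # iterate on the wide orientation so X X^T is the smaller Gram matrix
        x = x.T
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.T) @ x
    return x.T if transposed else x
```

Small singular values grow by roughly 1.5x per step and then converge quadratically near 1, which is why better coefficient schedules pay off mainly in the early iterations.
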
### "Test-Time Learning for Large Language Models" (arXiv:2505.20633)

- TTL paradigm: adapts an LLM to a target domain by minimizing input perplexity, using only unlabeled test data.
- **Relevance**: Legal in spirit (test tokens only, no ground-truth labels). However, our score-first TTT (next-token prediction on already-scored tokens) already follows the same principle. Low actionability.

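The ordering that keeps score-first TTT legal, and that train-then-score TTT (the PR #771 ruling) violated, can be stated as a loop invariant: every chunk is scored with parameters that have seen only earlier chunks. A framework-free Python sketch with a hypothetical `Model` interface (`score`, `adapt`) and a toy adaptive unigram model standing in for LoRA-TTT:

```python
import math
from collections import Counter
from typing import Iterable, Protocol, Sequence


class Model(Protocol):
    def score(self, chunk: Sequence[int]) -> float: ...  # bits for the chunk
    def adapt(self, chunk: Sequence[int]) -> None: ...   # update on already-scored tokens


def score_first_eval(model: Model, chunks: Iterable[Sequence[int]]) -> float:
    """Total bits with the legal ordering: score chunk i, then adapt on chunk i."""
    total = 0.0
    for chunk in chunks:
        total += model.score(chunk)  # parameters depend only on earlier chunks
        model.adapt(chunk)           # this chunk becomes training data only afterwards
    return total


class AdaptiveUnigram:
    """Toy stand-in for TTT: add-one smoothed unigram counts over a fixed vocab."""

    def __init__(self, vocab_size: int):
        self.vocab_size = vocab_size
        self.counts = Counter()
        self.total = 0

    def score(self, chunk: Sequence[int]) -> float:
        # Counts stay frozen while the whole chunk is scored.
        denom = self.total + self.vocab_size
        return sum(-math.log2((self.counts[t] + 1) / denom) for t in chunk)

    def adapt(self, chunk: Sequence[int]) -> None:
        self.counts.update(chunk)
        self.total += len(chunk)
```

Swapping the two lines inside the loop (adapt, then score) would give a lower and invalid number, which is exactly the train-then-score pattern.
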
### "Lossless Compression via Next-Token Prediction" (arXiv:2505.06297)

- Compresses LLM-generated token sequences using a next-token-prediction (NTP) model as the predictor. Compression = language modeling = BPB.
- **Relevance**: Confirms the BPB framing. No new technique for the competition.

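Since the competition metric is bits per byte, the conversion from a token-level loss is worth writing down. A minimal sketch, assuming per-token NLLs in nats (the natural-log convention of PyTorch's cross-entropy) and the UTF-8 byte length of the scored text: BPB = (Σ NLL) / (ln 2 · total bytes). Normalizing by bytes rather than tokens is what makes scores comparable across tokenizers, e.g. SP8192 models versus byte-level PPM.

```python
import math
from typing import Sequence


def bits_per_byte(token_nll_nats: Sequence[float], num_bytes: int) -> float:
    """Convert summed per-token NLL (natural log) into bits per byte of source text."""
    if num_bytes <= 0:
        raise ValueError("num_bytes must be positive")
    return sum(token_nll_nats) / (math.log(2) * num_bytes)
```

For example, 1,000 tokens at a mean loss of 2.5 nats covering 3,400 bytes gives 2500 / (0.6931 × 3400) ≈ 1.061 bpb.
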
---

## HuggingFace / Community Discoveries

- **The PR #1886 WD discovery is the most actionable new finding today.** The warm-start LoRA A instability with fused CE is a subtle fp32 accumulation issue that silently fails on 2 of 3 seeds. It affects every submission combining PR #1787 (fused CE) with PR #1767 (warm-start LoRA A).
- **PPM-D has now been reproduced in six PRs from 4 separate authors** (#1795, #1835, #1850, #1854, #1857, #1885), plus dexhunter's validation at 1.0322 and the anti-hijack refinement in PR #1885. The mechanism is real; legality is the only gate.
- **The competition is effectively over for new architectures.** The final 2 days are a submission/review window. File before Apr 29 noon to maximize organizer review time.
- **PR #1884** (someone114514, SmearGate BOS Fix + "train-only logit calibration"): score unknown. The same author filed PR #1850 (the primary PPM-D submission), so this appears to be a test or alternate. What "train-only logit calibration" means is unclear; it could be a legal calibration applied at train time rather than eval time. Watch for a score.

---

## Recommended Actions (2 days to deadline)

1. **TODAY IS THE LAST GPU RUN WINDOW.** Execute the clean legal stack on 8xH100:
   - PR #1787 base: Polar Express NS + MIN_LR=0.10 + Fused CE Triton kernel
   - Per-Layer Adaptive GPTQ MLP=12σ/Attn=13σ + int7 Emb@15σ (PR #1586)
   - Attention Output Gate + SmearGate **with BOS fix** (PR #1667 + #1855/#1851 BOS fix)
   - LoRA-TTT warm-start A + alpha=144 + **WD=2.0** (not 1.0; see the PR #1886 fix)
   - Target: ~1.067–1.072 bpb. File as a PR by Apr 29.

2. **PPM-D monitoring**: Check Issue #1872 for an organizer ruling tomorrow (Apr 29). If any of PR #1835/#1850/#1854 receives @valerio-oai approval before Apr 30, add it as a pure eval-time layer with no retraining. dexhunter's implementation runs in ~190s (OpenMP-parallelized).

3. **DO NOT implement**:
   - PR #1885 (causality fix applied, but the overall PPM-D ruling is still pending)
   - PR #1848 (BPB risk: a sibling PR was closed the same day)
   - PR #1858 (partial data, 8M/40.5M tokens)
   - Pre-quant TTT (any form is illegal)
   - CaseOps (Issue #1604, Day 15+ with no ruling)
   - PR #1813 (Scylla: parent PR #1184 was reverted by OpenAI)
   - Gram-NS / Chebyshev-NS without verifying CUDA 12.9+ on the pod

4. **Low-priority watch**:
   - PR #731 (Hedge Mixer): seeds pending, "LOOKS CLEAN"; if merged today, an n-gram mixer blueprint becomes available
   - PR #1884 (train-only logit calibration): score unknown, technique unclear; check tomorrow

---

*Research session: 2026-04-28 | Days to deadline: 2 | FINAL GPU WINDOW*

---

# Parameter Golf Daily Research - 2026-04-27

## PR #771 STATUS: CLOSED (ILLEGAL, no change)
