
Submission: SP8192 + Depth Recurrence + Muon 0.99 (1.1497 pre-quant BPB) #1739

Open

DevelopedByAnurag wants to merge 1 commit into openai:main from DevelopedByAnurag:submission-sp8192-depth-recur

Conversation


DevelopedByAnurag commented Apr 19, 2026

SP8192 + Depth Recurrence + Muon 0.99 + SmearGate + EMA

Pre-quantization BPB: 1.1497
Post-quantization BPB: 1.3936

Combined the SP8192 vocabulary, depth recurrence (layers 4 and 5 looped from step 0), Muon momentum 0.99 with a 1500-step warmup and a 3000-step warmdown, SmearGate, EMA (decay 0.996), and sliding-window eval (stride 64).
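
For concreteness, a minimal sketch of how these settings could be grouped into one config; the field names are illustrative, not the repo's actual schema:

```python
from dataclasses import dataclass

@dataclass
class RunConfig:
    # Hypothetical grouping of the settings listed above.
    vocab_size: int = 8192            # SP8192 vocabulary
    loop_layers: tuple = (4, 5)       # depth recurrence: layers looped
    loop_start_step: int = 0          # looping active from step 0
    muon_momentum: float = 0.99
    warmup_steps: int = 1500
    warmdown_steps: int = 3000
    ema_decay: float = 0.996          # weight EMA used at eval
    eval_window_stride: int = 64      # sliding-window eval stride
```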

The large quantization penalty is caused by recurrence-amplified weight outliers being clipped during INT8 quantization. Planning to address this with MuonEq-R and GPTQ in a follow-up submission.
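
Roughly, the failure mode looks like this: under symmetric absmax INT8 quantization, one amplified outlier widens the quantization step for the whole tensor, so every other weight is rounded much more coarsely. A toy illustration (not the submission's actual quantizer):

```python
import torch

def int8_roundtrip(w: torch.Tensor) -> torch.Tensor:
    # absmax scaling: the largest |weight| sets the step size
    scale = w.abs().max() / 127.0
    q = torch.clamp((w / scale).round(), -127, 127)
    return q * scale

w = torch.randn(4096) * 0.02                 # typical weight scale
w_out = w.clone()
w_out[0] = 2.0                               # one recurrence-amplified outlier

print((int8_roundtrip(w) - w).abs().mean())          # small round-off error
print((int8_roundtrip(w_out) - w_out).abs().mean())  # much coarser everywhere
```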

leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 20, 2026
… candidates

User shared a deep timeline of all recurrence experiments in the
PG competition (openai#8 through openai#1739). Several of my previously
proposed experiments have ALREADY BEEN TESTED ON THIS STACK and shown to fail:

KILLED:
- Earlier-activation timing sweep: openai#1726 showed onset at 0.15 is
  +0.050 worse; openai#1739 showed step-0 activation is catastrophic
  (1.3936 BPB)
- Progressive ramp: openai#1663 showed hard onset matches a smooth ramp,
  no difference
- Position shift: openai#1726 showed looping layers 2-7 is +0.163 worse and
  a layer 5-6 shift is +0.006 worse; layers 3-5 IS the empirical sweet spot

Also corrected the baseline config: openai#1736 uses LOOP_START=3 LOOP_END=5
(three layers: 3, 4, 5, i.e. "Loop345"), not Loop45 as the directory name
suggests. Three looped layers × 3 passes yields 17 virtual layers in total.
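
Under that corrected config, the forward pass would look roughly like this; a sketch assuming a hypothetical 11-block base, which makes 8 unlooped + 3×3 looped = 17 block applications, matching the 17 virtual layers above:

```python
LOOP_START, LOOP_END, NUM_PASSES = 3, 5, 3   # Loop345, 3 passes

def forward(x, blocks):
    # blocks: list of transformer blocks (11 assumed here, hypothetically)
    for block in blocks[:LOOP_START]:
        x = block(x)
    for _ in range(NUM_PASSES):              # same weights reused each pass
        for block in blocks[LOOP_START:LOOP_END + 1]:
            x = block(x)
    for block in blocks[LOOP_END + 1:]:
        x = block(x)
    return x                                 # 8 + 3*3 = 17 applications
```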

VIABLE candidates:
- Recur-Alpha (openai#1714, Anakintano): a learnable scalar per looped block,
  init 0 → identity; 6 params. The author's grant ran out before the TTT eval,
  so composition with openai#1736's phased TTT is genuinely open. NEW TOP PICK.
  Mechanism sketched after this list.
- Cross-pass XSA: still novel, untested in any PR
- Loop3-6 variant (openai#1678): tashapais is running it; might wait for the result
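
A sketch of the Recur-Alpha mechanism as described above; the module name is hypothetical, and openai#1714 has the real implementation:

```python
import torch
import torch.nn as nn

class AlphaGatedBlock(nn.Module):
    """Wrap a looped block with a learnable scalar gate (init 0)."""
    def __init__(self, block: nn.Module):
        super().__init__()
        self.block = block
        self.alpha = nn.Parameter(torch.zeros(()))  # init 0 -> identity

    def forward(self, x):
        # At init alpha == 0, so the extra recurrent pass is a no-op and
        # training starts from the non-recurrent baseline's behavior.
        return x + self.alpha * self.block(x)
```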

Recommendation updated: port Recur-Alpha onto openai#1736 as spec 015.
~$25, identity-at-init (safe), 30 LOC, and it answers the recurrence
question directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 20, 2026
The first smoke workflow (2026-04-21) was halted by execution because they
saw 15 consecutive α grad_norm=0.0 log entries and matched this to the
spec's stop-early criterion. BUT α is architecturally out-of-circuit
until looping_active=True (at training_frac ≥ 0.35), so grad_norm=0
during the pre-looping phase is EXPECTED, not a bug.
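
In sketch form (the flag name and 0.35 threshold are taken from this commit text; everything else is hypothetical):

```python
ACTIVATION_FRAC = 0.35  # looping_active flips at training_frac >= 0.35

def recurrent_passes(x, gated_blocks, training_frac):
    looping_active = training_frac >= ACTIVATION_FRAC
    if not looping_active:
        # The alpha-gated extra passes never run, so alpha never enters
        # the autograd graph and its grad_norm logs as exactly 0.0.
        return x
    for gated in gated_blocks:
        x = gated(x)  # x + alpha * block(x); alpha now receives gradient
    return x
```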

The smoke was actually clean: 500 iters with no NaNs, identity-at-init
preserved, compile OK. The spec's wording was the problem.

Fixes:
1. Add a ⚠️ CRITICAL banner at the top of the spec, explicitly calling
   out the pre-looping-activation expectation. Includes a table mapping
   smoke/screen phase to the correct grad_norm interpretation.
2. Rewrite the stop-early criteria to explicitly condition on
   looping_active=True. Zero grad pre-activation is expected.
3. Add a smoke protocol requiring an ENABLE_LOOPING_AT=0 OVERRIDE for the
   smoke (forces looping active, enabling the α plumbing check within
   500 iters); see the sketch after this list.
4. Explicit note: do NOT propagate the smoke override to the real screen.
   openai#1739 / openai#1726 evidence: step-0 activation is catastrophic.
5. Document the prior-incident failure mode so execution doesn't
   repeat the same false-positive halt.
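
Fix 3's override might look like this (the env var name is from the commit; the surrounding helper is hypothetical):

```python
import os

def looping_activation_step(total_steps: int, default_frac: float = 0.35) -> int:
    # Smoke runs set ENABLE_LOOPING_AT=0 to force looping (and alpha)
    # active from the first step; real screens must NOT set it, since
    # step-0 activation is catastrophic (openai#1739 / openai#1726).
    override = os.environ.get("ENABLE_LOOPING_AT")
    if override is not None:
        return int(override)
    return int(default_frac * total_steps)
```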

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
