Record: SP8192 + 3-Layer Recurrence + Hard Onset — val_bpb 1.08625 (3-seed mean)#1663
Open

pablinga19 wants to merge 5 commits into openai:main from
Conversation
- Match heading, table, and section format from openai#1218/openai#1394
- Add Post-quant BPB column, bold Sliding BPB values
- Add missing submission.json fields (hardware, bytes_total, bytes_code)
- Remove Deltas and Reproducibility sections
- Round val_bpb to 5 decimal places consistently

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move onset scheduling logic from inline training loop into update_recurrence_onset() with clear docstring documenting both modes
- Add structured comments on RECUR_HOMOTOPY / RECUR_START_STEP env vars
- Add "Where the change lives in code" section to README
- Update bytes_code in submission.json

Behavior is preserved: the function contains the exact same branching logic that was previously inline at the training-loop call site.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request on Apr 20, 2026
… candidates

User shared a deep timeline of all recurrence experiments in the PG competition (openai#8 through openai#1739). Several of my previously-proposed experiments have ALREADY BEEN TESTED ON THIS STACK and shown to fail.

KILLED:
- Timing sweep earlier: openai#1726 showed 0.15 is +0.050 worse; openai#1739 showed step-0 catastrophic (1.3936 bpb)
- Progressive ramp: openai#1663 showed hard-onset = smooth, no difference
- Position shift: openai#1726 showed layer 2-7 +0.163 worse, layer 5-6 shift +0.006 worse — layer 3-5 IS the empirical sweet spot

Also corrected the baseline config: openai#1736 uses LOOP_START=3 LOOP_END=5 (three layers: 3, 4, 5 — "Loop345"), not Loop45 as the directory name suggests. 3 layers × 3 passes = 17 virtual layers.

VIABLE candidates:
- Recur-Alpha (openai#1714, Anakintano): learnable scalar per looped block, init 0 → identity. 6 params. Author's grant ran out before TTT eval, so composition with openai#1736's phased TTT is genuinely open. NEW TOP PICK (a sketch of the mechanism follows this note).
- Cross-pass XSA: still novel, untested in any PR
- Loop3-6 variant (openai#1678): tashapais is running it; might wait for that result

Recommendation updated: port Recur-Alpha onto openai#1736 as spec 015. ~$25, identity-at-init (safe), 30 LOC, direct recurrence question.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
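For context, here is a minimal sketch of one plausible reading of the Recur-Alpha mechanism described above (a learnable scalar per looped block, initialized to zero so the extra passes start as an exact identity); this is a hedged reconstruction, not the actual openai#1714 code:

```python
import torch
import torch.nn as nn

class RecurAlpha(nn.Module):
    """Gates a looped block's contribution with one learnable scalar.

    alpha is initialized to 0, so at init the wrapped pass is an exact
    passthrough and the worst case is the ungated baseline. One scalar
    per looped block keeps the added parameter count tiny.
    """
    def __init__(self, block: nn.Module):
        super().__init__()
        self.block = block
        self.alpha = nn.Parameter(torch.zeros(()))  # init 0 -> identity

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Interpolates between passthrough (alpha=0) and block output (alpha=1).
        return x + self.alpha * (self.block(x) - x)
```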
leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request on Apr 20, 2026
Active research thread's first experiment. Pinned to commit a9aa141 on exp/recur-alpha.

Key decisions baked in:
- Screening mode first (~$6 total, skip TTT/GPTQ/EMA)
- TRAIN_LOG_EVERY=100 for diagnostic resolution
- p2p cosine diagnostic off by default (torch.compile concerns)
- Single seed 42; conditional 3-seed + full TTT only if Δ ≤ -0.001
- Identity-at-init safety: α=0 = passthrough, worst case no change

Three disproven recurrence-class experiments are explicitly NOT in this spec (earlier activation openai#1726, schedule smoothing openai#1663, position shift openai#1726). Those would be wasted spend per existing PG evidence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Record: SP8192 + 3-Layer Recurrence + Hard Onset — val_bpb 1.08625
val bpb: 1.08625 (3-seed mean, std=0.0023)
This submission is a direct derivative of #1394. I keep that stack largely fixed and test one isolated change: replacing smooth recurrence onset with a hard activation at step 3000. The goal is to preserve more non-recurrent training within the fixed 600-second budget before enabling the recurrent virtual-layer sequence later in the run.
Results
All three runs use the same script, changing only the `SEED` environment variable.
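For illustration, the sweep could be driven like this; the `torchrun` launcher, GPU count, and seed values are assumptions, and only the env vars named in this write-up are taken from the PR:

```python
# Hypothetical driver for the 3-seed sweep; only SEED changes between runs.
import os
import subprocess

for seed in (42, 43, 44):  # illustrative seed values
    env = dict(
        os.environ,
        SEED=str(seed),
        RECUR_HOMOTOPY="0",       # hard onset (the default)
        RECUR_START_STEP="3000",  # enable recurrence at step 3000
        VAL_LOSS_EVERY="99999",   # skip mid-training validation passes
    )
    # Launcher and GPU count below are assumptions, not from this PR.
    subprocess.run(["torchrun", "--nproc_per_node=8", "train_gpt.py"],
                   env=env, check=True)
```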
Why hard onset can help
Let $r_0$ denote training throughput (in steps per second) before recurrence is enabled, and let $r_1 < r_0$ denote throughput after recurrence is active. Under a fixed wall-clock budget $T$, a hard-onset schedule at step $s_0$ yields approximately

$$N_{\text{hard}} \approx s_0 + r_1\left(T - \frac{s_0}{r_0}\right)$$
because the run first spends $s_0/r_0$ seconds in the non-recurrent regime, then uses the remaining time in the recurrent regime.
For a smooth-onset schedule, throughput begins to decline earlier. If $r(t)$ denotes the time-varying throughput during the ramp, then the total number of realized steps is

$$N_{\text{smooth}} = \int_0^T r(t)\,dt$$
with $r(t) < r_0$ over part of the interval before the hard switch point. In that setting, delaying recurrence can increase the number of realized optimization steps by allocating a larger fraction of the fixed budget to the higher-throughput non-recurrent regime, while still enabling recurrence later in training.
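As a concrete illustration of the step-count argument, here is a small numeric sketch. The throughput values and ramp shape are made-up assumptions for illustration, not measurements from this submission:

```python
# Hypothetical throughputs: r0 steps/s without recurrence, r1 with it active.
r0, r1 = 10.0, 7.0   # assumed values, not measured
T = 600.0            # fixed wall-clock budget (seconds)
s0 = 3000            # hard-onset step

# Hard onset: s0 steps at r0 (taking s0/r0 seconds), remaining time at r1.
n_hard = s0 + r1 * (T - s0 / r0)

# Smooth onset: throughput declines from r0 to r1 starting at t = 0,
# modeled here as a linear ramp over the first 300 s (an assumption).
def r(t: float, ramp: float = 300.0) -> float:
    return r0 + (r1 - r0) * min(t / ramp, 1.0)

# Numerically integrate r(t) over the budget to get realized steps.
dt = 0.01
n_smooth = sum(r(i * dt) * dt for i in range(int(T / dt)))

print(f"hard onset:   {n_hard:.0f} steps")    # ~5100
print(f"smooth onset: {n_smooth:.0f} steps")  # ~4650
```

Under these assumed numbers, the hard schedule realizes roughly 450 more optimization steps from the same budget; the actual gap depends on the real $r_0$, $r_1$, and ramp shape.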
In this submission, enabling the 3-layer recurrence stack at step 3000 produced the reported 3-seed mean sliding val_bpb of 1.08625.
Where the change lives in code
This submission is a direct derivative of #1394 and isolates one scheduling change. The relevant code is in `train_gpt.py`:

- `Hyperparameters` class: `RECUR_HOMOTOPY`, `RECUR_START_STEP`, `RECUR_HOMOTOPY_TMID`, `RECUR_HOMOTOPY_TAU`
- `update_recurrence_onset()` — a single function containing both the hard and smooth paths
- `last_step` check

With `RECUR_HOMOTOPY=0` (default), the function reduces to a one-line step threshold at `RECUR_START_STEP=3000`. Everything else in the script is unchanged from #1394.
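For reference, here is a minimal sketch of the shape this scheduler can take, assuming it returns a recurrence gate in $[0, 1]$ and that the smooth path is a sigmoid ramp; the env-var names come from this PR, but the body below is a reconstruction, not the actual `train_gpt.py` code:

```python
import math

def update_recurrence_onset(step: int, cfg) -> float:
    """Return the recurrence gate for the current step.

    With RECUR_HOMOTOPY=0 (default) this reduces to a one-line step
    threshold at RECUR_START_STEP. With RECUR_HOMOTOPY=1 it becomes a
    sigmoid ramp centered at RECUR_HOMOTOPY_TMID with width
    RECUR_HOMOTOPY_TAU.
    """
    if not cfg.RECUR_HOMOTOPY:
        # Hard path: fully off before RECUR_START_STEP, fully on after.
        return float(step >= cfg.RECUR_START_STEP)
    # Smooth path: gate rises gradually from 0 toward 1.
    return 1.0 / (1.0 + math.exp(-(step - cfg.RECUR_HOMOTOPY_TMID) / cfg.RECUR_HOMOTOPY_TAU))
```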
Notes

`VAL_LOSS_EVERY=99999` removes mid-training validation passes, increasing realized train steps within the same 600s budget.

Note_V2: This submission is part of my application for additional compute. My independent resources are limited, so further progress depends on access to more credits. With that support, I'd like not only to continue refining this line, but also to investigate broader, more ambitious ideas around the same core problem.