feat: recursive weight sharing for 16MB limit by ArthurKaroyan · Pull Request #15 · openai/parameter-golf

ArthurKaroyan · 2026-03-18T20:18:06Z

No description provided.

Add entries openai#15-18 to experiment log covering three worktree experiments:
- GatedCausalConv (ssl): conv replacing first transformer block, best 1.2247 bpb
- NorMuon (normuon): per-row second moment normalization in Muon (code-only)
- SPlus (svdopt): SVD eigenbasis optimizer replacing Muon (code-only)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

0hq · 2026-03-19T16:58:36Z

Not a valid submission, resubmit with training log to prove efficacy.

…prime stride sampling

Inspired by PR openai#1099/openai#1060/openai#1135, which use TOKEN-level coprime stride. Token-level needs a 60+ LOC rewrite of TokenStream (no random access). Shipping the SHARD-LEVEL variant: modify _advance_file() to use a coprime stride instead of +1, so nearby training steps see topically different shards rather than adjacent similar ones.

Implementation: 13 LOC, two anchors in the TokenStream class (none of the existing 24 patches touch TokenStream; verified via grep). Gated by USE_COPRIME_STRIDE=1, falls back to the stride=1 default. Idempotent via COPRIME_STRIDE_MARKER.

Effect: with N shards and gcd(s,N)=1, iterates 0->s->2s->... covering all shards before repeating. Max spacing diversity = better gradient noise reduction. Smaller benefit than full token-level (~25% per PR openai#1099 logic), but ships TODAY at near-zero risk vs. a 60+ LOC structural rewrite.

4 CS experiments queued: CS0_alone, CS1_seed42, CS2_L4weights, CS3_with_engram.

This is the FIRST data-side patch in our 24-patch stack. It tests a completely new vector after the "neutrality plateau" of architectural/optimizer/training-time patches.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
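The shard-level traversal described in this commit message can be sketched in a few lines of Python. This is an illustration only, not the patch itself: `pick_coprime_stride` and `shard_order` are hypothetical helper names, the way the stride is chosen is an assumption, and the real change lives in `_advance_file()` of `TokenStream`, which is not shown in this thread. Only the `USE_COPRIME_STRIDE` gate and the `gcd(s, N) = 1` condition come from the message.

```python
import math
import os


def pick_coprime_stride(num_shards: int, target: int) -> int:
    """Hypothetical helper: first stride s at or after `target` with gcd(s, num_shards) == 1.

    The search wraps around to 1, which is coprime to every N, so it always terminates.
    """
    if num_shards <= 1:
        return 1
    s = max(1, target % num_shards)
    while math.gcd(s, num_shards) != 1:
        s += 1
        if s >= num_shards:
            s = 1
    return s


def shard_order(num_shards: int, stride: int, start: int = 0) -> list[int]:
    """Visit shards as start, start+s, start+2s, ... (mod N).

    When gcd(stride, num_shards) == 1, every shard appears exactly once per pass.
    """
    return [(start + i * stride) % num_shards for i in range(num_shards)]


# Gate mirrors the USE_COPRIME_STRIDE=1 flag named in the commit message;
# the default is the plain +1 advance. The target of 4 is an arbitrary example.
use_coprime = os.environ.get("USE_COPRIME_STRIDE", "0") == "1"
stride = pick_coprime_stride(10, target=4) if use_coprime else 1
order = shard_order(10, stride)
```

With 10 shards and a target of 4, the helper skips 4, 5, and 6 (each shares a factor with 10) and settles on 7, giving the visiting order 0, 7, 4, 1, 8, 5, 2, 9, 6, 3. Because the stride is coprime to the shard count, one pass still touches each shard exactly once.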

feat: recursive weight sharing for 16MB limit

28d41a4

0hq added the not ready for review label Mar 19, 2026

0hq closed this Mar 19, 2026

gb250e referenced this pull request in gb250e/parameter-golf Mar 21, 2026

docs: add PR #15 update summary

559e576

mrdavtan mentioned this pull request Mar 22, 2026

Non-record: Negative findings on codebook quantization, magnitude pruning, multi-token prediction, embedding factorization #212

Closed

MarioPaerle mentioned this pull request Apr 16, 2026

RECORD: SmearGate + Attention Output Gate + Legal TTT | val_bpb=1.07139 #1667

Merged

jamesEmerson112 mentioned this pull request Apr 30, 2026

Record: SP8192 Full Stack + Headwise Gated Attention + PreQuantTTT (1.0511 BPB, 3-seed) #1992

Closed

ArthurKaroyan wants to merge 1 commit into openai:main from ArthurKaroyan:feat/recursive-transformer
