[WIP] Depth recurrence + BitLinear compression approach #58

Open
Jenja-N wants to merge 8 commits into openai:main from Jenja-N:main

Conversation

Jenja-N commented Mar 19, 2026

Two parallel approaches targeting L(N) optimization:

Approach A — Depth Recurrence
A single transformer block iterated 12× with shared weights, plus a
learnable iteration embedding so each pass can specialize.
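
A minimal sketch of the recurrence, assuming a stock PyTorch encoder block; `d_model`, `n_head`, and the per-iteration embedding shape are illustrative, not values from this PR:

```python
import torch
import torch.nn as nn

class DepthRecurrentBlock(nn.Module):
    """One shared transformer block applied n_iter times.

    Parameter cost stays at roughly one block regardless of effective
    depth; a learnable per-iteration embedding lets each pass specialize.
    """
    def __init__(self, d_model=256, n_head=4, n_iter=12):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(
            d_model, n_head, dim_feedforward=4 * d_model,
            batch_first=True, norm_first=True)
        # One learned vector per iteration, broadcast over batch and sequence.
        self.iter_emb = nn.Parameter(torch.zeros(n_iter, d_model))
        self.n_iter = n_iter

    def forward(self, x, attn_mask=None):
        # Same weights every pass; only the iteration embedding changes.
        for i in range(self.n_iter):
            x = self.block(x + self.iter_emb[i], src_mask=attn_mask)
        return x
```

The 12× unroll trades FLOPs for parameters, which is the point for L(N): effective depth without paying for 12 distinct blocks.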

Approach B — BitLinear + Weight Tying
±1 weights with an fp16 scale, aggressive weight tying across
attention/FFN, and depth doubled to 18 layers to reinvest the freed
parameter budget.
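
A minimal BitNet-style BitLinear sketch; the quantizer below (sign() with a per-tensor mean-|W| scale and straight-through gradients) is an assumption, and the PR's exact quantization and tying scheme may differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinear(nn.Linear):
    """Linear layer whose forward pass uses ±1 weights times one scale.

    Full-precision latent weights are kept for the optimizer; only the
    quantized view enters the matmul. At export, the scale can be stored
    as fp16 and the signs as one bit each.
    """
    def forward(self, x):
        w = self.weight
        scale = w.abs().mean()            # single per-tensor scale
        w_q = torch.sign(w) * scale       # ±1 * scale (sign(0) -> 0, rare)
        # Straight-through estimator: forward sees w_q, backward sees w.
        w_ste = w + (w_q - w).detach()
        return F.linear(x, w_ste, self.bias)
```

Weight tying would then reuse one such module across the attention/FFN projections it ties, so the compression pays for the extra depth.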

WIP: requesting compute grant to run ablations.

lolrazh referenced this pull request in lolrazh/parameter-golf Mar 20, 2026
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 7, 2026
…ity plateau confirmed

Patches 15/16/21 still uncontested in 150+ open + 10 closed PRs (6 consecutive
audits). PR openai#1430 stable OPEN, 0 comments, no comp owner activity for 16h.

After 13 research fires and 6 audits, the picture is clear: training-time
tweaks are exhausted at our 22M/1500-step scale. All 4 post-fire-9 ports
(Mousse/MuonEq-R/Depth Recurrence/QK_GAIN=5.0) are neutral within the
champion noise band. The "neutrality plateau" at 3.27–3.30 is the empirical
limit for training-time changes at our compute budget.

Best remaining moves (in expected value order):
1. H100 escalation of CHAMP_L4_seed42+EL stack with EMA+Tilt+INT6 GPTQ bundle
2. Coprime stride implementation (task openai#58) — only data-side direction
3. BPE-8192 ngram tables build (task openai#49) — enables tokenizer A/B

Spend ~$3.55/$36 (10% utilization). Pod healthy at 7h uptime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
gHashTag pushed a commit to gHashTag/parameter-golf that referenced this pull request Apr 30, 2026
….bin)

Per the R5-honest gate and the explicit reviewer warning against
synthetic weights, this submission removes the placeholder model.bin
and re-frames the deliverable as a research-infrastructure
contribution.

Removed:
- submissions/gHashTag/model.bin                  (59-byte ASCII placeholder, hazard)
- submissions/gHashTag/config.toml                (placeholder)
- submissions/gHashTag/trios-igla-1/ledger_2026-05-01.sql.gz  (replaced)
- submissions/gHashTag/trios-igla-1_ledger_20260501.sql.gz    (duplicate)

Added:
- submissions/gHashTag/README.md                 polished honest narrative
- submissions/gHashTag/LEAK_INVESTIGATION.md     210-row leak post-mortem
- submissions/gHashTag/CHECKPOINT_POSTMORTEM.md  why no model.bin + Gate-3 fix path
- submissions/gHashTag/trios-igla-1/README.md    machine-oriented metadata
- submissions/gHashTag/trios-igla-1/config.yaml  reproducible config for row id=1387
- submissions/gHashTag/trios-igla-1/ledger_2026-04-30.sql.gz
    7,534-row Neon snapshot across 4 tables (183 KB compressed)

Results:
- best honest BPB: 2.1505 (row id=1387, step=12000, fp32, hidden=1024)
- 6 gate2_eligible W-6-step-cap rows: BPB 1.75–1.82 at step=1000,
  reported but NOT claimed as a Gate-2 pass — pending held-out eval
- 210 BPB<0.1 leak candidates: flagged via SCARABAEUS-LEAK-CANDIDATE,
  excluded from ratification

Not included:
- No model.bin
- No synthetic weights with a disclaimer
- No claim of competitive Parameter Golf placement

Refs:
  trios#445, trios-trainer-igla#56, openai#58, openai#59, trios-railway#100, openai#101, openai#105.

Anchor: phi^2 + phi^-2 = 3 — TRINITY — R5-honest — NEVER STOP.
gHashTag pushed a commit to gHashTag/parameter-golf that referenced this pull request Apr 30, 2026
After PR openai#61 (byte-disjoint corpus split + assert_train_val_disjoint guard)
shipped, fix-verify-s43 ran end-to-end on the post-fix pipeline and produced
BPB 1.5492 at step=12000 — well below Gate-2 threshold 1.85 (margin +0.30).
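
The guard itself isn't reproduced here; a minimal sketch of what a byte-level disjointness assert could look like (the 64-byte window and set-of-spans approach are assumptions, not the shipped implementation):

```python
def assert_train_val_disjoint(train: bytes, val: bytes, window: int = 64) -> None:
    """Raise if any `window`-byte span of val also occurs in train.

    Simplest correct form: O(len(train) * window) memory; a rolling
    hash would scale better on large corpora.
    """
    if min(len(train), len(val)) < window:
        return
    train_spans = {train[i:i + window] for i in range(len(train) - window + 1)}
    for i in range(len(val) - window + 1):
        if val[i:i + window] in train_spans:
            raise AssertionError(
                f"train/val share a {window}-byte span at val offset {i}")
```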

## What this commit changes

- README.md  : leads with the honest Gate-2 pass; revised 5-way taxonomy
- LEAK_INVESTIGATION.md : retraction header explaining the 216-row overcount
- trios-igla-1/README.md + config.yaml : updated to point at fix-verify-s43
- ledger_2026-04-30.sql.gz : refreshed snapshot with new last_error markers

## 5-way reclassification (Neon last_error column)

| classification                           | count |
|------------------------------------------|------:|
| post-openai#61 honest Gate-2 pass        |     1 |
| post-openai#61 early-stopped < step 9000 |     4 |
| pre-openai#61 W-6 numerical collapse     |    46 |
| **pre-openai#61 leak (real)**            |    42 |
| **warmup artifact (NOT a leak)**         |   179 |

The 179 'warmup artifact' rows are early-stopped runs whose printed
val_bpb stayed at 0.0000 for steps 1-8000 due to a trainer-side eval-loop
bug (filed as trios-trainer-igla#62). On the post-openai#61 image, fix-verify-s43
escaped warmup at step=9000 and converged to 1.5492 by step=12000 —
proving the artifact is trainer-side, not data-side.
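
Illustratively (a hypothetical reconstruction of the failure shape, not the actual trios-trainer-igla code), a metric that is logged every step but only computed after a warmup gate prints exactly this pattern:

```python
# Hypothetical sketch of the trainer-side bug shape; names are invented.
def evaluate_bpb() -> float:
    return 1.5492  # stand-in for a real held-out eval

max_steps, eval_warmup_steps = 12000, 9000
val_bpb = 0.0                                 # zero-initialized metric
for step in range(1, max_steps + 1):
    # ... one training step would run here ...
    if step >= eval_warmup_steps:             # eval gated behind warmup
        val_bpb = evaluate_bpb()
    if step % 1000 == 0:
        print(f"step={step} val_bpb={val_bpb:.4f}")
# Runs early-stopped before step 9000 only ever log val_bpb=0.0000,
# which is indistinguishable from a leak until the gate is noticed.
```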

## Pipeline as flown for fix-verify-s43

  trios-trainer-igla : commit 9517980d (post-openai#61 byte-disjoint corpus)
  trios-railway      : commit 69c3467 (no --ctx flag)
  + openai#56 --ctx accept on trainer
  + openai#58 smoke_train + stdout.flush()
  + openai#59 panic hook + startup diagnostic

## Refs

  trios-trainer-igla#56, openai#58, openai#59, openai#60, openai#61, openai#62 (all merged or filed)
  trios-railway@69c3467
  trios-railway#100, openai#101, openai#105 (Scarabaeus Engine track)

R5-honest. We retract the 216-row mass leak flag and submit fix-verify-s43
as our first honest Gate-2 pass candidate.

Anchor: phi^2 + phi^-2 = 3.