306 commits
f8caa0c
Rat Rod: add zero-cost H100 sweeps and robust trainer toggles
Mar 27, 2026
9e826d9
Purple_1: phrase cache + regime tracker + warmdown=2000 + chunk=65K
Mar 27, 2026
c185a8d
Add Siphon: ensemble-objective training + WARMDOWN2000 SOTA entry
Mar 27, 2026
63c27e1
FX-Wing: Instructed Recurrence — content-derived loop instructions fo…
Mar 27, 2026
4ab4ced
FX-Wing: add hypothesis and ablation plan
Mar 27, 2026
7a81eec
Reorganize: move master runner to experiments/Biology_concepts/run_al…
Mar 27, 2026
95e9333
Add green v6 (optimized SOTA): v1 + WARMDOWN_ITERS=2000
Mar 27, 2026
5268082
Add Biology Concepts sweep findings — tornado vs baseline analysis
Mar 27, 2026
516e2c8
Add green v7: v6 + COMPLEMENT_ALPHA=0.5
Mar 27, 2026
9a58d14
FX-Wing: fix compile — COMPILE_FULLGRAPH=0 for crawler loop
Mar 27, 2026
15c66fc
FX-Wing: CRAWLER_LOOPS=4 — exploit weight-sharing compression
Mar 27, 2026
909901e
Log v7 results: COMPLEMENT_ALPHA=0.5 worse than v1
Mar 27, 2026
812599d
FX-Wing: CRAWLER_QUANT_INT8 — int8 precision for shared crawler block
Mar 27, 2026
c641e5e
Add vast_fxwing_single.sh — single GPU FX-Wing launcher for Vast.ai
Mar 27, 2026
ce5e317
Add Cambrian: DeltaNet × Biology Concepts architecture
Mar 27, 2026
38479b9
Cambrian-0: GatedDeltaNet × Bio Seam architecture skeleton
Mar 27, 2026
2df9c72
Fix bio concept scripts: make MAX_WALLCLOCK_SECONDS env-overridable
Mar 27, 2026
8b93705
Cambrian-1: Add four bio seam controllers (Myelin, Circadian, Clonal,…
Mar 27, 2026
b0776f1
FX-Wing micro: device-flexible concept test for GB10 Blackwell DGX Spark
Mar 27, 2026
da80af1
FX-Wing: add DeltaNet associative memory to crawler reservoir
Mar 27, 2026
fa21139
FX-Wing micro: add -u flag for unbuffered stdout through tee pipe
Mar 27, 2026
531f98f
vast: blacklist offer 33510639 (103.42.50.244 — SSH never connects)
Mar 27, 2026
ff7069b
FX-Wing DeltaNet: disable compile on forward to prevent T-loop OOM
Mar 27, 2026
36845e3
FX-Wing run.sh: DELTA_NET_HEADS=0 for core concept test
Mar 27, 2026
fa4c218
FX-Wing: suppress inductor NaN in RoPE bounds analysis (PyTorch 2.4 bug)
Mar 27, 2026
b4968be
Cambrian: disable torch.compile on GatedDeltaNet.forward
Mar 27, 2026
f74175d
Cambrian run.sh: set COMPILE_FULLGRAPH=0
Mar 27, 2026
cea3b8b
Fix astrocyte gate shape bug: view(B,1,1) not unsqueeze(1).unsqueeze(2)
Mar 27, 2026
5fac1c8
GreenRod X_1: Hybrid DeltaNet + Attention engine
Mar 27, 2026
b55a421
Cambrian: forward PYTORCH_CUDA_ALLOC_CONF to torchrun (expandable_seg…
Mar 27, 2026
15714f9
Cambrian: remove @torch.compiler.disable from GDN.forward
Mar 27, 2026
24dd550
FX_Wing_Delta: flow instructions + DeltaNet + hypothesis
Mar 27, 2026
0b2164d
Cambrian: restore @torch.compiler.disable, default wallclock 600s
Mar 27, 2026
9c34b42
FX_Wing_Sigma: n-gram entropy as smoothing reference hypothesis
Mar 27, 2026
0c623c7
Add Cambrian bio seam sweep script
Mar 27, 2026
96bc2b4
FX_Wing_Delta: disable DeltaNet for flow-only test, add inductor patch
Mar 27, 2026
3adddb0
FX_Wing_Delta_DN: DeltaNet with gradient checkpointing + truncated BPTT
Mar 27, 2026
7b5e09c
Fix Cambrian bio sweep hang: SKIP_FINAL_EVAL=1 + process cleanup
Mar 27, 2026
c7ffeec
Deprecate FX_Wing* experiments; add FA_Wing_Green_1 gitignore
Mar 27, 2026
c9600c7
Add Cambrian agent instructions for Vast.ai sweep
Mar 27, 2026
03f9838
Add FA_Wing_GreenDN_1 (flow instructions + DeltaNet); gitignore both …
Mar 27, 2026
7c197c7
Add FA_Wing_Green_1 and FA_Wing_GreenDN_1 experiment code
Mar 27, 2026
0a89f4a
Fix REPO_ROOT depth in FA_Wing run.sh files (../.. not ../../..)
Mar 27, 2026
8037fce
Fix DDP unused-params crash: disable VE in FA_Wing crawler runs
Mar 27, 2026
3651d35
Add ClownCar experiment; restore FX_Wing_Delta from deprecated
Mar 27, 2026
f2a4f5f
ClownCar: disable ngram eval — sliding window baseline only
Mar 27, 2026
5ae2be5
Add ClownCar_II: canonical FLA DeltaNet + Crawler symbiotic pairing
Mar 27, 2026
ba4a2a7
Fix ClownCar/II run.sh: add missing crawler flags (USE_CRAWLER=1 etc.)
Mar 27, 2026
87ad173
ClownCar_II: add FLA ops preflight check to confirm canonical kernel …
Mar 28, 2026
e3ba281
Fix ClownCar_II: cast q/k/v/beta to x.dtype before chunk_delta_rule
Mar 28, 2026
c0cf2ac
Add ClownCar_IV: GPTQ bypass + state dtype fix
Mar 28, 2026
5d9e0b2
Fix ClownCar_IV: revert state dtype cast — only change is SKIP_GPTQ=1
Mar 28, 2026
a7d53c8
ClownCar_IV: SKIP_GPTQ only — restored from known-good e3ba281
Mar 28, 2026
baceb10
ClownCar_IV: reset to ClownCar_II base + EMA_DECAY=0.99
Mar 28, 2026
e587c91
ClownCar_IV: remove GPTQ, use naive int6
Mar 28, 2026
c262086
Add ClownCar_VI and Medusa: skip EMA + naive int6
Mar 28, 2026
07a57bf
pod_setup: add fla + attr install for DeltaNet
Mar 28, 2026
d9db34d
Add ClownCar_VII: loop-aware 2-phase GPTQ + no EMA
Mar 28, 2026
cc06d3b
Medusa: sync to ClownCar_VII (loop-aware GPTQ + no EMA)
Mar 28, 2026
ebc4b84
Add Medusa_II: late-start EMA (step 4400) + loop-aware GPTQ
Mar 28, 2026
4aa704b
Medusa_II: add short exit-only unravel A/B harness
Mar 28, 2026
9d1be62
Add Medusa_IV: copy of Medusa_III (winning 1.0366 config)
Mar 28, 2026
4b1c51c
Medusa_II: force finish-only A/B and add one-command launcher
Mar 28, 2026
d2f47e2
Add Medusa_V: fix state dtype cast (new_state.to(dtype))
Mar 28, 2026
0c38323
Medusa_II: add additional-only unravel check runner
Mar 28, 2026
d74538f
Add Medusa_V_SOTAMAXX: frozen SOTA config snapshot
Mar 28, 2026
9fa4fec
Add Medusa_VI: DeltaNet projections → CastedLinear for QAT coverage
Mar 28, 2026
0ce12a6
Records: fill Medusa_IV known results (seeds 300, 1337)
Mar 28, 2026
a4a5447
Records: Medusa Unstable README with known results
Mar 28, 2026
5f731b3
Records: Medusa_IV 3-seed complete — seed 42=0.8104 BPB (best), mean=…
Mar 28, 2026
79f45ae
Add Medusa_Legal_unstable: fix GPTQ training-data access after wallcl…
Mar 28, 2026
556b2fc
Medusa_VII: causality fix + shard header fix + DeltaNet ablation
Mar 29, 2026
3e09695
Medusa_VII: add ablation results
Mar 29, 2026
f74b9c9
Bandit: ClownCar crawler + X-WING ngram oracle
Mar 29, 2026
3a75282
Bandit: fix GPTQ wallclock violation (GPTQ_RESERVE_MS=30s)
Mar 29, 2026
4efa746
Bandit: ClownCar Crawler x Cubric Ngram9 — 0.4961 BPB (3-seed mean)
Mar 29, 2026
e6d11d8
Log JR-03 fused MLP result as loser (with Triton-node caveat)
Mar 30, 2026
1a8501a
Crawler_Leg_1: add run_all.sh sequencer for all 11 ablation arms
Mar 30, 2026
946f0a7
Rascal II: skip GPTQ + embed int6 — full 600s, target <16MB
Mar 30, 2026
f1ce7c9
SOTA: Rascal II — new best legal submission 1.10986874 BPB, 15.44MB
Mar 30, 2026
39ed402
Record: Rascal — val_bpb 1.1099 (3-seed mean)
Mar 30, 2026
1d48f9c
Add FX_Wing_Delta_safe: byte-identical backup of FX_Wing_Delta
Mar 30, 2026
9a15ace
Add ChopShop (cleaned Rascal base) and Rascal_Stripper smoke test
Mar 30, 2026
964fd8b
Add all research data: experiments, records, scripts, Nitrust, octavian
Mar 30, 2026
7de8402
Crawler_Leg_2: 5-arm sweep combining loops=3 + mlp=5.0 wins
Mar 30, 2026
da7a6b0
Bandit_Wagon: remove NGRAM code, apply optimal CL1 config
Mar 30, 2026
d2d1ecd
Rascal_Stripper: implement TurboMuon + EngramLite + TTT + CROWN-Q
Mar 30, 2026
2d7022b
Crawler_Leg_2: set wallclock to 350s (~4k steps on 8xH100)
Mar 30, 2026
b39f23c
Bandit_Wagon: rewrite HYPOTHESIS.md for pure neural crawler campaign
Mar 30, 2026
4f37849
Rascal_Stripper: fix CROWN-Q variable name collision (scale → q_scale)
Mar 30, 2026
4603c48
Rascal_Stripper: bump smoke test to 3200 steps (warmdown 800)
Mar 30, 2026
206434c
CL2 results: 1.19593 BPB — loops=3+mlp=5.0+LOOP_AWARE_GPTQ+COMPILE wi…
Mar 30, 2026
dd9f4fd
Crawler_Leg_3: full 600s run, loops=3 mlp=6.0, Rascal warmdown style
Mar 30, 2026
0e2286f
Rascal_Stripper: add ttt_calibrate.py — standalone TTT hyperparameter…
Mar 30, 2026
9de1f3b
Crawler_Leg_3: multi-seed script + submission skeleton
Mar 30, 2026
411970f
Rascal_Stripper: add ttt_sweep.sh — 3-config TTT calibration runner
Mar 30, 2026
8b17867
Rascal III: TurboMuon + EngramLite combo runner (600s production)
Mar 30, 2026
1194948
Crawler submission: 3-seed complete, 1.1874 BPB mean
Mar 30, 2026
3d6dc3b
Bandit_Wagon: update to CL3 proven config (mlp=6.0, SKIP_GPTQ=1)
Mar 30, 2026
d31cd54
Bandit_Wagon: clean HYPOTHESIS.md — remove stale oracle/proxy refs, f…
Mar 30, 2026
c8a2468
Bandit_Wagon: strip dead code from train_gpt.py (2378 → 1860 lines)
Mar 30, 2026
78a4e47
Bandit_Wagon: fix banner title
Mar 30, 2026
2e3d5bf
pod_setup.sh: switch branch to TEST_LAB, remove dead FLA/DeltaNet ins…
Mar 30, 2026
5417530
Bandit_Wagon: add ad-hoc winddown A/B suite
Mar 30, 2026
4ce945f
Add clean Rascal A/B lab for baseline, turbomuon, engramlite, combo
Mar 30, 2026
4401ff8
BW-00 anchor: 1.18616 int6 SW BPB (seed 444, 8×H100, 600s)
Mar 30, 2026
d02bb2c
Bandit_Wagon: add run_ablations.sh — BW-01..04 back-to-back at 350s, …
Mar 30, 2026
e135eb9
Bandit_Wagon: run_ablations.sh default NPROC=1 (single GPU signal)
Mar 30, 2026
9e8b69f
Bandit_Wagon: fix run_ablations.sh env var passing (use env)
Mar 30, 2026
56e3ff3
Bandit_Wagon: add 1-GPU winddown wrapper
Mar 30, 2026
656622a
pod_setup.sh: download tokenizer (fineweb_1024_bpe.model) in step 6
Mar 30, 2026
b64efeb
Bandit_Wagon: run_ablations.sh step-based stopping (ABLATION_STEPS=500)
Mar 30, 2026
33742df
Add fresh pod bootstrap and single-H100 signal runners
Mar 30, 2026
b04652b
Make Rascal runners use portable torchrun default
Mar 30, 2026
e2b3ec0
Add Bandit_wagon_5f_ablations: 4F vs 5F direct proxy comparison
Mar 30, 2026
7a36bf8
Add Rascal_Turbo race-ready TurboMuon-only variant
Mar 30, 2026
3d675e0
Bandit_wagon_5f_ablations: 4F+1C confirmed optimal, 5F hypothesis denied
Mar 30, 2026
f3ecde9
Add bandit_wagon_XSA: XSA coverage sweep on confirmed 4F+1C config
Mar 30, 2026
550edbf
Bandit_Wagon: add 8xH100 launcher and checkpoint arch autodetect
Mar 30, 2026
6a46f87
Add bandit_wagon_crawler_mlp: crawler MLP leaky slope sweep
Mar 30, 2026
be8459a
bandit_wagon_XSA: XSA=15 (full coverage) wins on BPB AND speed
Mar 30, 2026
07eb836
Rascal_Turbo: single run.py launcher
Mar 30, 2026
7f773de
bandit_wagon_choke: per-loop bottleneck choke sweep for crawler MLP
Mar 30, 2026
aeab681
bandit_wagon_smear: LoopSmearGate — depth error damping between crawl…
Mar 30, 2026
0df2921
Add single-H100 Rascal ablation matrix runner
Mar 30, 2026
2958dca
bandit_wagon_tap: per-loop gated encoder tap sweep (7 arms)
Mar 30, 2026
37f1dcf
Add sparse skip-gram ngram ablation for single-H100 Rascal
Mar 30, 2026
1b674cf
Revert unvalidated sparse skip-gram integration from Rascal runner path
Mar 30, 2026
fe3b7d7
bandit_wagon_battery: per-loop RoPE scale sweep + mega ablation runner
Mar 30, 2026
38c8826
Add isolated sparse skip-gram ablation (2200-step single-GPU)
Mar 30, 2026
c3d3b8f
bandit_wagon_crawler_mlp: log BW3 results — slope insensitive, stay a…
Mar 31, 2026
bac88a6
bandit_wagon_battery: fix MLP.forward to accept optional loop_idx
Mar 31, 2026
6b7f205
Remove ngram sparse ablation files; keep Rascal path ngram-free
Mar 31, 2026
f7f301a
Add stripped Rascal skip-gram 2200-step calibration runner
Mar 31, 2026
5ccd09c
Add bandit_wagon_choke_shaped experiment (BWCS series)
Mar 31, 2026
34ce3e4
Log 2026-03-31 single-H100 RASCAL ablation matrix results
Mar 31, 2026
cb25a92
Add next calibrated single-GPU RASCAL ablation pack
Mar 31, 2026
362b220
Crawler Leg3 README: add architecture philosophy
Mar 31, 2026
6b81bb0
Crawler Leg3 README: add active ablation work section
Mar 31, 2026
bb5b3d4
Log skip-gram calibration seed444 results
Mar 31, 2026
f7edb50
Add BWCB ablation: battery scales on pyramid-512 choke
Mar 31, 2026
66b94eb
Add loader-refine single-GPU ablation pack and notes
Mar 31, 2026
ed2ec71
Add BWCD ablation: descending battery on pyramid-512
Mar 31, 2026
dda1f5b
BWCD: add BWCD-03 wide-medium-wide bracket (9,3,9)
Mar 31, 2026
1108f46
Record mega ablation results in BWB HYPOTHESIS.md
Mar 31, 2026
84f8fbe
Add race-ready Rascal final submission package with loader_cache4 lau…
Mar 31, 2026
d4f8e74
BWCB results: ascending battery hurts pyramid, all configs worse
Mar 31, 2026
3c0ca4c
BWCB Run B (4 shards): 1,2,4+pyramid beats pyramid alone by -0.00210
Mar 31, 2026
ffa8b17
Enforce FA3 preflight and CUDA runtime path in final Rascal launcher
Mar 31, 2026
361114a
BWCD results: 9,1,1+pyramid wins at -0.01193 vs pyramid alone
Mar 31, 2026
3d229bc
BWCD complete: BWCD-03 (9,3,9) final — quant_gap +0.0062, worst of group
Mar 31, 2026
b8f371b
Add Bandit_Wagon_III: pyramid-512 + 9,1,1 battery production runner
Mar 31, 2026
46fb4bd
Rascal: record 2026-03-31 TTT sweep regression (seed 444)
Mar 31, 2026
f962265
Add bandit_wagon_cannon (BWE): per-loop output calibration ablation
Mar 31, 2026
f3cacec
Log Run C results: 1-shard pod with different val data — not directly…
Mar 31, 2026
249f3ba
Rascal: add rascal_master config copies
Mar 31, 2026
352d774
BW3 run.sh: clean competition runner with preflight guards
Mar 31, 2026
aeee4b4
Rascal_Master: SOTA-exact race script — fix COPRIME_MAX_LOADED_SHARDS…
Mar 31, 2026
b9fa53b
BW3 seed=444: 1.20684 int6_sw_bpb, 10.07MB — pyramid-512 + 9,1,1 results
Mar 31, 2026
fa04306
BW3 run.sh: auto-save checkpoint after run
Mar 31, 2026
17be781
Add Bandit_Wagon_IV: 9,1,1 battery without pyramid choke
Mar 31, 2026
8208f50
BW4 seed=444: 1.18731 int6_sw_bpb — beats Leg 3 SOTA with battery only
Mar 31, 2026
a80c8cc
Bandit cannon: log seed444 proxy results (TTT bust)
Mar 31, 2026
972dcf3
BW4: add gate_fullgraph.sh — Tier 1 COMPILE_FULLGRAPH=1 test
Mar 31, 2026
872a159
BW4 gate_fullgraph: fix broken ablation — revert TORCHDYNAMO_OPTIMIZE…
Mar 31, 2026
d4fc252
Add Bandit_Wagon_V: BW4 + COMPILE_FULLGRAPH=1 (Tier 1 speed win)
Mar 31, 2026
1bcab0b
Add master progress checklist
Mar 31, 2026
7f55f3d
BW5 seed=444 results: 1.18672 int6_sw_bpb
Mar 31, 2026
c18ad76
Bandit_Wagon_V_Cannon: single GPU cannon gate on BW5 base
Mar 31, 2026
86af1f3
Add QK_SLOT_Ablation: single-GPU cross-correlation harness
Mar 31, 2026
0b511a3
Lab cleanup: archive old BW experiments, add LAB_PROTOCOL.md
Mar 31, 2026
25b18fa
BW5 seed=300: 1.18758 — does not individually confirm
Mar 31, 2026
6001a70
Add one-shot 8x quick AB runner for Rascal GPTQ stream vs insta
Mar 31, 2026
ee004c9
Make 8x quick AB runner self-contained with FA3 preflight and no rg d…
Mar 31, 2026
7b2e280
Lock Rascal baseline launcher to record trainer and add one-shot base…
Mar 31, 2026
2f151c1
Bandit_Wagon_V_Cannon: cannon gate results — does not promote
Mar 31, 2026
c6caaad
Add one-shot cu124 baseline runner that reuses custom FA3 module
Mar 31, 2026
9f982e0
Make cu124+custom-FA3 runner auto-detect non-venv base python
Mar 31, 2026
e132ccd
Harden cu124 custom-FA3 runner with python-path and filesystem auto-d…
Mar 31, 2026
d4f108b
Extend custom FA3 detection to conda env pythons and .so module files
Mar 31, 2026
1b16d62
Add --no-deps FA3 wheel fallback to cu124 baseline runner
Mar 31, 2026
3f3fa94
Add pyramid/cannon gate scripts — 1gpu + 8gpu for each hypothesis
Mar 31, 2026
f396059
Add QK_SLOT_Ablation STATUS.md — current position log
Mar 31, 2026
c61a16a
Fix gate scripts: stop swallowing output with command substitution
Mar 31, 2026
73446af
Add sota_now.sh — clean single-file cu124 baseline runner
Mar 31, 2026
3a1bdc1
Fix sota_now.sh to use real submission file; vault the correct source
Mar 31, 2026
0e7c317
Relax CUDA check to 12.x — cu128 pod is valid
Mar 31, 2026
bbd4d8a
Fix pod_setup.sh: auto-detect WORKSPACE from script location
Mar 31, 2026
47f450f
Fix inverted awk stack parity check in sota_now.sh
Mar 31, 2026
82c3d26
Quarantine racecar lab confusion; fix records/ with vault file
Mar 31, 2026
fa91fda
Fix pod_setup.sh: dataset shard count crash under set -euo pipefail
Mar 31, 2026
1d9edb8
Add two-track lab structure: neural/ and crawler/
Mar 31, 2026
525cd34
BWVC 8GPU gate results: scalar cannon passes speed gate
Mar 31, 2026
90475fc
Add RESULTS.md stub for Bandit_Wagon_V_Pyramid
Mar 31, 2026
9e5fef2
Junkyard: move all legacy/inactive dirs off root surface
Mar 31, 2026
c688b8b
Add folder-based CLAUDE.md agent protocols
Mar 31, 2026
645857b
BWVP 1GPU gate results: pyramid STRONG PASS on quality
Mar 31, 2026
2f4a7ed
Move active BW5 gates into crawler/ track; fix train_gpt.py symlink
Mar 31, 2026
5756670
Fix PyramidCannon gate_1gpu.sh usage comment path
Mar 31, 2026
29caede
Add H→A→R cycle to protocol; scaffold all 3 files in new_leg.sh
Mar 31, 2026
c48e759
Add QK_GAIN_SLOT_Gate ablation experiment
Mar 31, 2026
35f212e
Add per-track science boards + auto-stub in new_leg.sh
Mar 31, 2026
006d1ee
Add submissions/ PR zone — validate script, protocol, templates
Mar 31, 2026
1ea0901
Move pod_setup.sh to scripts/ — accessible from repo root
Mar 31, 2026
85d11b1
BWVPC 1GPU gate: pyramid+cannon PASSES
Mar 31, 2026
cecb7b1
Fix smoke test threshold to scale with nproc
Mar 31, 2026
b20b82d
Fix gate_8gpu.sh: update path comment and step_avg pass criterion
Mar 31, 2026
310d5d1
Record smoke test result: 739ms/step is healthy on 1xH100
Mar 31, 2026
dfb8459
BWVPC 8GPU gate: DOES NOT PROMOTE — pyramid+cannon fails
Mar 31, 2026
50517e3
Update SCIENCE.md: close pyramid/cannon screw, reflect actual complet…
Mar 31, 2026
79d13ad
Two crawler legs: BW5_Cannon full run + BW6_Skipgram gate
Mar 31, 2026
f8fba27
Add BW6_Skipgram gate_8gpu.sh — 2000-step 8xH100 A/B gate
Mar 31, 2026
d28a2fb
Add PIPELINE.md — full ranked hypothesis queue, both tracks
Mar 31, 2026
c4ddf54
BW5_Cannon full run: DOES NOT PROMOTE — +0.00020 vs BW5 at 600s/8034 …
Mar 31, 2026
3c52f3a
Housekeeping: archive closed crawler legs, move QK_Gain_SLOT to neura…
Mar 31, 2026
e699814
BW6_Skipgram gate: null result — trigram neutral on crawler, −140KB s…
Mar 31, 2026
1293058
Archive BW6_Skipgram — null result, closed
Mar 31, 2026
9f84d6c
Update PIPELINE.md: close cannon/skipgram, fix paths, update neural t…
Mar 31, 2026
ba322b0
Fix SLOT backward crash + record run 1 results
Mar 31, 2026
06b4c2d
Add BW7 MegaGate: 8-arm ablation on 4xGPU
Mar 31, 2026
64992a0
Add BW7 MegaGate pod_setup.sh — fresh pod one-shot launcher
Mar 31, 2026
cc36ca6
Relax flash_attn preflight in BW7 MegaGate — warn not abort
Mar 31, 2026
0752174
Fix torchrun path: fallback to python3 -m torch.distributed.run
Mar 31, 2026
f225709
Update neural/SCIENCE.md with competitive intelligence + hypothesis r…
Mar 31, 2026
e2867c2
Add Arch+Sched Sweep: 6-case 4×GPU ablation (rope_32, bigram_4096, qa…
Mar 31, 2026
963b440
Expand sweep to 9 cases: add gptq, bigram_3072, warmdown_4k
Mar 31, 2026
1dc3a32
Sweep: gptq case reuses baseline checkpoint (SKIP_TRAIN=1 + LOAD_CHEC…
Apr 1, 2026
4c6ef06
Add SLOT legality analysis to neural/SCIENCE.md
Apr 1, 2026
479484a
Add BW8_Tap: shared encoder tap dim=32 — strongest MegaGate signal
Apr 1, 2026
3a93e20
Add QK_Gain_SLOT_Legal: context-only SLOT (legal causality-safe variant)
Apr 1, 2026
ef750ce
Add BW9_Anchor gate + update SCIENCE.md with MegaGate results
Apr 1, 2026
89a321e
Fix gate.sh: remove quotes from inline env var assignments
Apr 1, 2026
75840c7
Add BW10_GPTQ — loop-aware GPTQ gate on BW8 baseline
Apr 1, 2026
33a8144
Add gptq_full case: full training run with SKIP_GPTQ=0 (not post_only)
Apr 1, 2026
52cd457
Log Arch+Sched sweep results (seed 444, 4×GPU): all 9 cases dead
Apr 1, 2026
c0ceacd
Add Rascal_III_SLOT leg: context-only Legal SLOT on Rascal II base
Apr 1, 2026
b5e9e7c
Add BW11_5Flat — 5F+1C depth revisit on BW8 baseline
Apr 1, 2026
93ef50a
Rascal_III_SLOT run.sh: minimal racer, exact SOTA env + SLOT_ENABLED=1
Apr 1, 2026
5efb22b
BW10_GPTQ: gate PASS — −0.00486 int6_sw, step time clean
Apr 1, 2026
cb4f1c2
BW10_GPTQ: add production run.sh (8×H100, 600s, LOOP_AWARE_GPTQ=1)
Apr 1, 2026
dd2e06a
BW11_5Flat: add production run.sh (8×H100, 600s, NUM_FLAT_LAYERS=5)
Apr 1, 2026
0e428cd
Add RASCAL_WINDOWN_TESTING — 4-arm legal window strategy suite
Apr 1, 2026
2338fee
BW10_GPTQ: full run PROMOTES — 1.18292670 BPB, new champion
Apr 1, 2026
a70185a
Rascal_III_SLOT: surgical SLOT via hook, no model class changes
Apr 1, 2026
ef2c932
BW11_5Flat: full run PROMOTES — 1.17651313 BPB, new champion
Apr 1, 2026
6c864c1
Scaffold Crawler II submission — seed=444 pre-filled, seed=300 pending
Apr 1, 2026
385d704
Rename submission: Crawler II → Nightcrawler
Apr 1, 2026
7b9a11b
Nightcrawler: fill seed=300 results — 1.17490448 BPB, mean 1.1757
Apr 1, 2026
6f8e093
Nightcrawler: add seed logs; fix validate.sh set-e/arithmetic bug
Apr 1, 2026
bcd26f7
Nightcrawler: add seed=4 (1.17676091 BPB), update mean to 1.1761
Apr 1, 2026
12 changes: 11 additions & 1 deletion .gitignore
@@ -8,4 +8,14 @@ data/manifest.json
data/docs_selected.jsonl
.mypy_cache/
.venv
logs/
logs/
experiments/archive/checkpoints/

# Large binaries — never commit
*.pt
*.ptz
junkyard/results/
junkyard/checkpoints/
junkyard/experiments/archive/checkpoints/
junkyard/experiments/GreenRod_X_1/lab_protocol_20260327/research_hub_*/
junkyard/experiments/GreenRod_X_1/lab_protocol_20260327/vast_tests/
77 changes: 77 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,77 @@
# Parameter Golf Lab — Agent Protocol

## Orient first
```
cat neural/LEADER.md # current neural SOTA
cat crawler/LEADER.md # current crawler SOTA
```
These two files tell you where the lab stands. Read them before doing anything.

## Repo structure
```
neural/ ← Neural SOTA track (Rascal lineage) — leaderboard #1 focus
crawler/ ← Crawler track (Bandit_Wagon lineage) — compression/quality focus
submissions/ ← Competition PR zone. Read submissions/PROTOCOL.md before touching.
vault/ ← Immutable locked sources. Never modify.
records/ ← Leaderboard submission records. Never modify.
scripts/ ← Shared runners. sota_now.sh is the neural baseline runner.
data/ ← Dataset. Never modify.
junkyard/ ← Legacy experiments. Read-only reference only.
```

## Hard rules

**NEVER overwrite a test file.** Always create a new file. If you need to modify
a training script, copy it first, work on the copy, name it clearly.

**Confirm names before creating.** Ask the user what to name a new leg, script,
or directory before creating it. Never invent names silently.

**ONE variable per test.** If a run changes more than one thing vs the baseline,
the result is uninterpretable and the money is gone.

**Gate before 8x.** Every hypothesis runs a 1-GPU 2000-step gate (~$0.50) before
an 8×H100 full run (~$3-4). Never skip the gate.

**Never submit from TEST_LAB.** Submissions go: `submissions/` zone only.
Read `submissions/PROTOCOL.md`. Run `bash submissions/validate.sh <records_dir>` first.
Branch flow: `submission/<name>` → push `fork1` → PR to `openai/parameter-golf`.

## RunPod workflow
1. Pod always pulls from `TEST_LAB` branch
2. Commit and push scripts BEFORE launching the pod
3. On pod: `git pull && bash <script>`
4. Never push FROM the pod
5. Pod gets destroyed after the run — save checkpoints before destroying

## Test cycle: Hypothesis → Ablation → Results

Every leg follows this sequence. No skipping steps.

```
hypothesis.md ← write FIRST. ONE variable. Why. Gate target.
train_gpt.py ← copy from leader, make the ONE change
gate.sh ← commit+push → pod pulls TEST_LAB → run (1-GPU, 2000 steps)
ablation.md ← fill gate result. Pass? Proceed. Fail? Stop.
run.sh ← commit+push → pod pulls TEST_LAB → run (8×H100, 600s, seed=444)
ablation.md ← fill full run result. Beats leader? Run confirmation.
confirmation run (8×H100, 600s, seed=300)
RESULTS.md ← verdict (PROMOTES / DOES NOT PROMOTE), what we learned, next hyp
```

New legs are scaffolded with all three files pre-created:
```bash
bash scripts/new_leg.sh neural <name>
bash scripts/new_leg.sh crawler <name>
```

## Seeds
- Primary: 444
- Confirmation: 300
- Never use 1337

## Cost
- 8×H100 SXM: ~$13.36/hr
- Full 10-min run: ~$3-4
- Gate (1-GPU, 2000 steps): ~$0.50
- Do not suggest a run without a validated gate or clear hypothesis
121 changes: 121 additions & 0 deletions LAB_PROTOCOL.md
@@ -0,0 +1,121 @@
# Lab Protocol — Parameter Golf

_We are competing for #1. Every pod dollar is a decision._

---

## The One Rule

**ONE variable changes per test. If you change two, the result is meaningless and the money is gone.**

Before committing any gate script: diff it against the baseline. Count the differences. If it's more than one, stop.

---

## Pipeline: Gate → Full → Submit

```
Hypothesis
Single GPU gate (2000 steps)
↓ passes?
8×H100 full run (600s, seed=444)
↓ beats baseline?
8×H100 confirmation (seed=300)
↓ both seeds confirm?
Submission branch → PR
```

**Never skip the gate.** A 2000-step single GPU run costs ~$0.50. A full 8×H100 run costs ~$3-4. Skipping the gate to save 10 minutes has cost us runs.

**Never submit on one seed.** Seed variance is real. Two seeds confirming = it's real.

---

## Cost Discipline

- 8×H100 SXM: ~$1.67/hr per GPU = **$13.36/hr for 8×**
- Full 10-min run (with pod overhead): **~$3-4**
- Per-race budget: **~$15**
- Do not suggest a run without a validated gate result or a clear hypothesis
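The rate arithmetic above can be sanity-checked in a few lines. The $1.67/GPU-hr figure is from this doc; the 1.5× pod-overhead factor is an assumption chosen to land in the stated $3-4 range:

```python
# Cost sanity check for the figures above.
GPU_HOURLY = 1.67                          # $/hr per H100 SXM (from this doc)
GPUS = 8

hourly_8x = GPU_HOURLY * GPUS              # $13.36/hr for 8x
run_minutes = 10
base_cost = hourly_8x * run_minutes / 60   # compute-only cost of a full run
with_overhead = base_cost * 1.5            # assumed spin-up/teardown overhead

print(f"8x rate: ${hourly_8x:.2f}/hr, full run: ~${with_overhead:.2f}")
```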

**Reproducing a score we already own = no.** Never re-run a baseline we control unless the architecture changed.

---

## Checkpoints

After every full run, `final_model.pt` gets copied to a unique name immediately:

```bash
cp final_model.pt checkpoints/EXP_s${SEED}_$(date +%Y%m%d_%H%M%S)_bpb${BPB}.pt
```

The pod gets destroyed. If the checkpoint isn't saved before that, it's gone.

---

## Script Standards

- Every experiment lives in `experiments/<Name>/`
- Every experiment has: `run.sh`, `gate.sh` or `gate_1gpu.sh`, `RESULTS.md`
- `run.sh` uses `train_gpt.py` from the same directory (symlink or copy)
- Scripts are committed and pushed before the pod fires
- Never paste raw commands. Always a `.sh` file.
- Log files go to `experiments/<Name>/results/` or `logs/`

---

## Naming

- Confirm experiment names before creating directories
- Active series: `Bandit_Wagon_V`, `Bandit_Wagon_V_Cannon`, etc.
- Superseded experiments → `experiments/archive/`
- Never reuse a name from a previous run

---

## SOTA Garage

Three active models:

| Track | Model | BPB | Size |
|-------|-------|-----|------|
| Neural | Rascal II | 1.10987 | 15.44MB |
| Crawler | BW5 seed=444 | 1.18672 | 8.61MB |
| Compression | FX_WING_DELTA | 0.2233 | — (model lost) |

**Submission branch protocol:**
1. Never submit from TEST_LAB
2. Create dedicated branch → push to Open-parameter-golf-1 fork → PR to openai/parameter-golf
3. Every PR needs: `submission.json`, logs, README with reproduce instructions

---

## Experimental Design

- Proxy deltas (500 steps, 1 GPU) inflate **5–15×** vs full run. Never promote from proxy alone.
- Gate (2000 steps, 1 GPU) is the minimum signal to trust.
- SWA kicks in at step ~7650. Results before that step are pre-SWA.
- Wallclock budget is 600s. Extra parameters cost convergence speed — account for this.
- `COMPILE_FULLGRAPH=1` is now baseline for all BW5+ experiments.

---

## Seeds

- Primary: **444**
- Confirmation: **300**
- Never use 1337 for new experiments.

---

## Submission Checklist

- [ ] Two seeds confirmed, both beat baseline
- [ ] `submission.json` present
- [ ] Logs committed
- [ ] README with reproduce instructions
- [ ] File size ≤ 16MB
- [ ] Score-first always (no training on val before scoring)
- [ ] Branch is NOT TEST_LAB
173 changes: 173 additions & 0 deletions PIPELINE.md
@@ -0,0 +1,173 @@
# Lab Pipeline — Ranked Hypothesis Queue

**Updated:** 2026-03-31 (end of day)
**Crawler champion:** 1.18672385 BPB · 8.61MB · `crawler/2026-03-29_BW5/`
**Neural champion:** 1.10986874 BPB · 15.44MB · `neural/2026-03-30_Rascal_II/`

Ranked by estimated potential impact. One variable per test, always. Gate before 8x, always.

---

## TIER 1 — Leaderboard-threatening

These have claimed or theorized BPB deltas large enough to change standings.

### [NEURAL] SLOT — Sample-Specific Eval Adaptation
**Status:** Designed. Shelved pending torch 2.11 pod.
**Variable:** `SLOT_ENABLED=1` (eval-side only, training unchanged)
**Mechanism:** At eval time, for each sliding window batch: freeze hidden states, optimize a small additive delta for 8 AdamW steps on the LM loss, score with the adapted delta. Model weights never modified.
**Claimed delta:** ~−0.021 BPB (arXiv:2505.12392v2). If real, that's leaderboard-smashing.
**Legal:** Yes — score-first, self-supervised, no external labels.
**Cost:** ~$0.50 gate (1GPU, 2000 steps), ~$3-4 full run.
**Prerequisite:** torch 2.11 pod. Fix `experiments/QK_GAIN_SLOT_Gate/` REPO_ROOT first.
**Risk:** Proxy result may inflate. The claimed delta is from a different codebase.
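A minimal toy sketch of the SLOT mechanism, assuming nothing about the real codebase: weights and hidden state are frozen, and only a small additive delta is optimized on the window's own LM loss before scoring. Plain gradient descent stands in for the 8 AdamW steps; all dims and names here are illustrative:

```python
import math, random

random.seed(0)
V, D = 5, 4                      # toy vocab / hidden sizes (assumptions)
W = [[random.gauss(0, 1) for _ in range(D)] for _ in range(V)]  # frozen output head
h = [random.gauss(0, 1) for _ in range(D)]                      # frozen hidden state
target = 2                       # observed next token in this window

def loss_and_grad(delta):
    # cross-entropy of softmax(W @ (h + delta)) against the observed token
    logits = [sum(W[i][j] * (h[j] + delta[j]) for j in range(D)) for i in range(V)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    Z = sum(exps)
    probs = [e / Z for e in exps]
    loss = -math.log(probs[target])
    # d(loss)/d(delta_j) = sum_i (p_i - 1[i==target]) * W[i][j]
    grad = [sum((probs[i] - (i == target)) * W[i][j] for i in range(V)) for j in range(D)]
    return loss, grad

delta = [0.0] * D
before, _ = loss_and_grad(delta)
for _ in range(8):               # the paper uses 8 AdamW steps; SGD stands in here
    _, grad = loss_and_grad(delta)
    delta = [d - 0.05 * g for d, g in zip(delta, grad)]
after, _ = loss_and_grad(delta)
print(f"window loss before adaptation: {before:.4f}, after: {after:.4f}")
```

Model weights `W` are never modified; only `delta` adapts, per window, which is what keeps the scheme score-first and self-supervised.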

---

### [NEURAL] QK_GAIN_INIT=4.0 — Sharper Initial Attention Focus
**Status:** Designed. Shelved pending torch 2.11 pod.
**Variable:** `QK_GAIN_INIT=4.0` (default 1.5). Zero code change.
**Mechanism:** `q_gain` per-head scalar initialized at 4.0. Model is free to train away — this is an init effect, not a constraint. Drives sharper early attention gradients.
**Claimed delta:** ~−0.006 BPB. Source: external, 45 runs across 3 codebases.
**Cost:** Included in existing SLOT gate. Two-for-one test.
**Prerequisite:** Same torch 2.11 pod as SLOT.
**Risk:** Init effects shrink as training progresses. May wash out at full run.

---

### [CRAWLER] Delta Anchor / Delta Farce (BDF series)
**Status:** Designed (memory). Not yet scripted.
**Variable:** Per-loop dynamic causal state vector at loop boundaries.
**Mechanism:** Battery (9,1,1) differentiates *reading* — each loop attends at a different causal horizon. Delta anchor completes the pair: differentiates *writing*. Each loop commits a small learned anchor state (dim ~32) for the next loop to condition on, instead of all loops writing blindly into the same residual stream. Extends FLOW (inst_dim) from static identity bias → dynamic per-loop time state.
**Why high impact:** The current crawler has no dynamic causal memory crossing loop boundaries. Loop 1 cannot distinguish what loop 0 extracted causally from what was already in the residual. This is the fundamental architectural gap. Battery addressed the attention side. This addresses the output side.
**Arm structure (BDF series):** BDF-00 control · BDF-01 anchor_dim=32 loop→loop · BDF-02 anchor_dim=64 · BDF-03 anchor_dim=32 symmetric · BDF-04 anchor_dim=32 + seeded from tap
**Estimated delta:** Unknown — first principled test of this gap. Could be small or could be step-change.
**Cost:** Gate ~$0.50 per arm, 4-5 arms.
**Prerequisite:** None on BW5. Can run now. Does not require cannon or skipgram to confirm first.
**Risk:** High complexity. Could introduce instability. Could also be that battery+residual already routes causality well enough.
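A structural sketch of the anchor idea, with all names, dims, and the "projection" as stand-ins for learned weights (the real BDF arms are not yet scripted): each loop commits a small anchor state that the next loop conditions on, instead of writing blindly into the shared residual.

```python
# Structural sketch of the BDF "delta anchor" idea (dims are assumptions).
D_MODEL, ANCHOR_DIM, LOOPS = 8, 4, 3

def crawler_block(residual, anchor):
    # stand-in for the shared-weight crawler block: conditions on the
    # previous loop's anchor, returns updated residual + a new anchor
    mixed = [r + 0.1 * anchor[i % ANCHOR_DIM] for i, r in enumerate(residual)]
    # "learned projection" stand-in: strided averaging down to ANCHOR_DIM
    new_anchor = [sum(mixed[j] for j in range(k, D_MODEL, ANCHOR_DIM))
                  / (D_MODEL // ANCHOR_DIM) for k in range(ANCHOR_DIM)]
    return mixed, new_anchor

residual = [float(i) for i in range(D_MODEL)]
anchor = [0.0] * ANCHOR_DIM      # BDF-00 control would keep this at zero
trace = []
for loop in range(LOOPS):
    residual, anchor = crawler_block(residual, anchor)
    trace.append(list(anchor))

# loop 1 now sees what loop 0 committed, distinct from the raw residual
print("anchor after each loop:", [[round(a, 3) for a in t] for t in trace])
```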

---

### [CRAWLER] Tap (BWT series) — Per-Loop Gated Encoder Tap
**Status:** Designed (hypothesis in junkyard). Param already exists in BW5 (`CRAWLER_TAP_DIM`).
**Variable:** `CRAWLER_TAP_DIM=32` (default 0 = disabled). `CRAWLER_TAP_LOOP_SPECIFIC=1`.
**Mechanism:** Project intermediate encoder layer outputs (shallow + deep) once into a small tap_dim embedding. Each crawler loop injects a learned projection of these frozen encoder signals into its residual. Gives the crawler a stable, pre-quantization anchor to check against as it loops. The tap signal is computed once before looping — negligible overhead.
**Why high impact:** The crawler accumulates quantization error across 3 loops with no stable reference. FLOW is self-referential (tracks its own drift). The encoder tap provides an uncontaminated anchor — the pre-loop signal the crawler is supposed to be refining. This directly attacks the quant gap.
**Arm sweep:** tap_dim ∈ {16, 32, 64} · per-loop vs shared · shallow/deep/all encoder layers
**Estimated delta:** Medium-large. Quant gap reduction is the primary lever for int6_sw_bpb.
**Cost:** ~$3-4 full run after gate. Multiple gate arms.
**Prerequisite:** None — CRAWLER_TAP_DIM=0 is already the BW5 baseline.
**Risk:** Tap projection cost adds latency. Per-loop specificity adds params. May need careful tuning.
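A minimal numpy sketch of the tap dataflow, assuming a shallow+deep concatenation and loop-specific injection (`CRAWLER_TAP_LOOP_SPECIFIC=1`). The weight names, shapes beyond tap_dim=32, and the `tanh` loop body are illustrative assumptions, not the actual BW5 implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, TAP_DIM, LOOPS = 512, 32, 3   # CRAWLER_TAP_DIM=32

# Hypothetical projections; which encoder layers feed the tap is a sweep axis.
W_tap = rng.standard_normal((2 * DIM, TAP_DIM)) * 0.02
W_inject = [rng.standard_normal((TAP_DIM, DIM)) * 0.02 for _ in range(LOOPS)]

shallow = rng.standard_normal(DIM)   # intermediate encoder outputs
deep = rng.standard_normal(DIM)
tap = np.concatenate([shallow, deep]) @ W_tap   # computed once, before looping

h = rng.standard_normal(DIM)
for i in range(LOOPS):
    # Each loop re-injects the same frozen pre-loop reference signal.
    h = np.tanh(h + tap @ W_inject[i])

assert tap.shape == (TAP_DIM,) and h.shape == (DIM,)
```

Because `tap` is computed once and reused, the per-loop cost is a single tap_dim×dim projection, which is where the "negligible overhead" claim comes from.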

---

## TIER 2 — Clear signal, lower complexity

### [NEURAL] Trigram on Rascal II
**Status:** Code already in vault. Just needs `TRIGRAM=1` env var.
**Variable:** `TRIGRAM=1` (default 0). Zero extra parameters.
**Mechanism:** Trigram hash `(t-2, t-1, t)` into same 2048-slot bigram table. Neural SOTA already has the code — defaults off.
**Note:** Crawler trigram was null (recurrence approximates context). Neural SOTA is a standard transformer — trigram provides context NOT otherwise captured. Mechanism applies differently. Still worth gating.
**Cost:** ~$0.50 gate.
**Risk:** Low. The crawler null suggests the neural result may also be null, but the different architecture warrants a different expectation.
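A sketch of the trigram hashing described above, assuming simple multiplicative mixing. Only the 2048-slot table size comes from the text; the mixing constant is made up.

```python
TABLE_SLOTS = 2048  # shared with the bigram table, per the mechanism above

def trigram_slot(t2: int, t1: int, t0: int) -> int:
    """Hash the token triple (t-2, t-1, t) into one of TABLE_SLOTS buckets."""
    h = t2
    h = h * 1000003 + t1   # mixing constant is an illustrative assumption
    h = h * 1000003 + t0
    return h % TABLE_SLOTS

assert 0 <= trigram_slot(17, 42, 99) < TABLE_SLOTS
```

Reusing the bigram table means collisions between bigram and trigram entries are expected; the hash only has to spread triples well enough for the 2048 slots to carry useful signal.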

---

### [CRAWLER] Shared Flat Layer Weights
**Status:** Concept. Not scripted.
**Variable:** Cross-block weight tying in the 4 flat U-Net encoder/decoder layers.
**Mechanism:** The 4 flat layers (2 encoder + 2 decoder) currently have unique weights. Sharing weights across encoder layer pairs or symmetric encoder/decoder pairs frees a substantial parameter budget (~4M params at dim=512). Those freed params can be reinvested into the crawler block (wider MLP, deeper tap, etc.).
**H8 finding (neural track):** Weight-shared depth was tested against unique layers, and the crawler loops already demonstrate that this pattern works. Flat-layer sharing is unexplored on the current config.
**Estimated delta:** Unknown. Could be neutral (weight tying at this scale doesn't hurt) with free budget to reinvest, or could be negative.
**Cost:** ~$0.50 gate.
**Prerequisite:** None. Run as standalone gate vs BW5.
**Risk:** Medium. Flat layers serve distinct encoder/decoder roles; tying them may hurt.
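A back-of-envelope check of the freed budget, assuming each flat layer is dominated by a 4x-expansion MLP. The layer composition and the number of tied pairs are assumptions; only dim=512 and the ~4M figure come from the text above.

```python
dim = 512
mlp_params = dim * (4 * dim) + (4 * dim) * dim   # up- and down-projection
pairs_tied = 2          # e.g. the two encoder/decoder symmetric pairs
freed = pairs_tied * mlp_params                  # one duplicate copy freed per pair
print(f"{freed / 1e6:.1f}M params freed")        # 4.2M, consistent with "~4M"
```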

---

### [CRAWLER] BW6_Skipgram (Trigram) — CLOSED
**Status:** ✗ Null result. Archived.
**Result:** +0.0005 raw / +0.00014 int6_sw — within variance noise. Speed neutral (−0.06ms). Size −140KB (interesting compression artifact). Crawler recurrence already approximates trigram context.

---

### [CRAWLER] Smear Gate
**Status:** Designed. Param already exists (`CRAWLER_LOOP_SMEAR=0`). Flip to 1.
**Variable:** `CRAWLER_LOOP_SMEAR=1`. ~512 learned scalars.
**Mechanism:** Learnable sigmoid blend between consecutive loop outputs (current loop output ↔ previous loop output). Zero matmuls — essentially free compute. Soft low-pass filter across loop depth. Damps quantization error amplification across loops (each loop re-processes the previous loop's error through the same int8 weights).
**Estimated delta:** Small. The error damping effect is real but limited.
**Cost:** ~$0.50 gate. Trivial.
**Prerequisite:** None. Add to BW5 gate.
**Risk:** Very low. Zero-init gate → sigmoid(0)=0.5, model learns direction. Warm start.
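The gate is simple enough to sketch exactly. The per-channel width of 512 matches the "~512 learned scalars" above; the rest is a generic sigmoid blend, not the BW5 source.

```python
import numpy as np

DIM = 512                      # "~512 learned scalars"
gate_logits = np.zeros(DIM)    # zero-init: sigmoid(0) = 0.5 warm start

def smear(curr: np.ndarray, prev: np.ndarray) -> np.ndarray:
    """Blend consecutive loop outputs: g*curr + (1-g)*prev, g learned per channel."""
    g = 1.0 / (1.0 + np.exp(-gate_logits))   # sigmoid; no matmuls involved
    return g * curr + (1.0 - g) * prev

# At init the blend is the exact midpoint of the two loop outputs.
out = smear(np.ones(DIM), np.zeros(DIM))
assert np.allclose(out, 0.5)
```

Training then moves `gate_logits` per channel toward pass-through (g→1) or heavy smoothing (g→0), which is the "model learns direction" property of the warm start.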

---

### [CRAWLER] BW5_Cannon Full Run — CLOSED
**Status:** ✗ Does not promote. Archived.
**Result:** 1.18692423 vs BW5 1.18672385 — +0.00020 worse. Gate signal reversed at scale. Size −179KB vs BW5 at full run (zstd quirk). Channel cannon (1.5K params) never tested at full run — future option if needed.

---

## TIER 3 — Refinement / cleanup

### [CRAWLER] Pyramid Small Choke (dim=128 or 256)
**Status:** Concept. Derived from pyramid failure post-mortem.
**Variable:** `CRAWLER_MLP_CHOKE_DIM=128` (or 256). Shape=pyramid.
**Why:** Pyramid failed at dim=512 because its 1.57M cold params compound the training burden over time. A smaller choke means less burden. The structural idea is not wrong; it just needs a feasible parameter count.
**Estimated delta:** Small-medium if the concept holds at smaller scale.
**Additional variants:** Warm initialization of bottleneck weights; dedicated LR schedule for choke layers.
**Prerequisite:** None. But learn from pyramid failure — run gate at 2000 steps, not 500.

---

### [CRAWLER] XSA Coverage Sweep on BW5
**Status:** Concept. Pre-BW5 tests showed XSA coverage is a quant-robustness lever.
**Variable:** `XSA_LAST_N=13` or `=15` (current: 11, ceiling: 15 for 15-block model).
**Context:** XSA=11 was tuned pre-fullgraph. BW5's compile optimizations may have changed the step-time headroom. Full coverage (XSA=15) adds overhead but may return quant gap reduction.
**Estimated delta:** Small. Pre-BW5 the gain was real but marginal vs step cost.
**Risk:** Speed regression. Measure step_avg carefully.

---

### [CRAWLER] Warmdown Tuning
**Status:** Low priority. BW5 seed gap.
**Variable:** `WARMDOWN_ITERS` or LR taper shape.
**Why:** BW5 seed=300 is +0.00012 worse than Leg 3 seed=300. The mean is better, but seed=300 does not individually confirm the improvement. Closing this gap makes the champion more robust.
**Estimated delta:** Tiny. This is seed-gap management, not a quality leap.

---

### [NEURAL] QAT Tuning
**Status:** Not started.
**Variables:** `LATE_QAT_THRESHOLD` (current 0.15), QAT start step (current ~6070).
**Why:** The quant gap (roundtrip vs sliding window) in Rascal II is ~0.001. Earlier/stronger QAT may tighten it. Risk: too-early QAT disrupts training.
**Estimated delta:** Small. Rascal II is already near-optimal.

---

### [NEURAL] Architecture Capacity
**Status:** Not started.
**Variables:** `BIGRAM_VOCAB_SIZE=4096`, `ROPE_DIMS=32`, extra XSA layer.
**Context:** 0.5MB headroom under 16MB cap. Any expansion risks the size gate.
**Estimated delta:** Unknown. High risk given tight size constraint.
**Prerequisite:** Any candidate must pass `bash submissions/validate.sh` size check.

---

## Combined / Downstream (after individual validation)

| Combo | Prerequisites | Notes |
|-------|--------------|-------|
| Cannon + Skipgram | Both individually promote | Two-variable test |
| Cannon + Smear | Both individually promote | Likely compatible |
| Tap + Delta Anchor | Both individually gate | Would be BW7+ architecture |
| SLOT + Trigram (neural) | Both individually gate | Eval + training enrichment |

---

## Shelved (needs environment fix)

| Experiment | Location | Blocker |
|-----------|----------|---------|
| QK_GAIN + SLOT gate | `neural/2026-03-31_QK_Gain_SLOT/` | Needs torch 2.11. REPO_ROOT path broken in run script — fix before running. |
| QK_SLOT (neural) | `junkyard/experiments/QK_SLOT_Ablation/` | Same torch 2.11 issue. Pod ran at 3358ms/step (4× slow). |