141 commits
4ce0d59
X-WING 3D Cubric: 0.4820 BPB (3-seed mean, std 0.0002)
Mar 26, 2026
6c49da3
B-wing lab: port PR #809 n-gram techniques onto X-WING base
Mar 26, 2026
bee0716
B-wing II: cubric ON + entropy shift + fast TTT
Mar 26, 2026
d6d281a
B-wing III: LoRA TTT from #809 + cubric ON + all n-gram fixes
Mar 26, 2026
137432f
Record bwing_full_port seed 1337: 0.4512 BPB
Mar 26, 2026
94bb107
Replace bwing_III with copy of SOTA bwing_full_port (0.4512 BPB)
Mar 26, 2026
2c0c0ee
B-wing IV + V: fix 7→9 hash primes (order 8-9 collision bug)
Mar 26, 2026
3ebaf38
Add B-wing pod setup script (FA3 + zstandard + sp1024)
Mar 26, 2026
5a21365
Add n-gram parameter grid sweep for bwing_V
Mar 26, 2026
75dbe40
A-Wing Green: INT5 GPTQ (clip_range=15) + 9-prime hash fix
Mar 26, 2026
22eae2a
A-Wing Green: strip TTT, cubric, F1 correction, distillation
Mar 26, 2026
d6cb709
Record results: A-Wing Green 0.4576, bwing_V 0.4601
Mar 26, 2026
c37a8ab
A-Wing Green_1: Oracle Alpha — use model_p vs ngram_p directly
Mar 26, 2026
08d6b7c
Green_1: cap training at 570s to fit GPTQ in 600s budget
Mar 26, 2026
d8b6022
Green_1: add preflight checks (zstd, FA3) + zstd import warning
Mar 26, 2026
b1d45b8
A-Wing Green_2: Oracle Alpha + LoRA TTT + 9-Prime
Mar 26, 2026
88ec4ca
Fix pod setup: use system Python, no conda/PYTHONPATH hacks
Mar 26, 2026
5876cf5
NEW SOTA 0.3200 BPB: A-Wing Green_1 Oracle Alpha + 9-Prime
Mar 26, 2026
da832ba
A-Wing Purple: Learned Mixer Head for legal n-gram ceiling
Mar 26, 2026
2b38218
Add pod_launch.sh: one command for clone + setup + run
Mar 26, 2026
a37d7c3
Fix pod_launch.sh: pull from private repo (fork1), not public
Mar 26, 2026
6004ac7
Purple: reduce prefill to 20 shards (~2B tokens), restore 570s cap
Mar 26, 2026
230dfc6
Clean up repo: single pod_setup.sh, archive stale dirs
Mar 26, 2026
db300a0
Fix pod_setup.sh: workspace path is /workspace/parameter-golf
Mar 26, 2026
2a92a77
F-Wing: Frugendorff + X-WING N-gram combined concept
Mar 26, 2026
473a4b7
Fix REPO_DIR depth in F_Wing run scripts (3 levels up, not 2)
Mar 26, 2026
5e8ec28
Add A-wing RED mixer variant with bounded distributed prefill
Mar 26, 2026
4a06a37
Add A-wing RED_G GPU monster mixer path and tune RED
Mar 26, 2026
3cedb3f
Fix DDP warmup by including mixer supervision in RED variants
Mar 26, 2026
005cdc5
records: add A-WING RED_G seed1337 run summary
Mar 26, 2026
4a4be33
F-Wing: rebase train_gpt.py onto A_wing/RED (add CrawlerGPT + mixer s…
Mar 26, 2026
f09a6e5
RED_G: fix ngram blend-mode conflicts and wire order-aware eval controls
Mar 26, 2026
abe72f0
F-Wing: fix CrawlerGPT torch.compile compatibility
Mar 26, 2026
a76dda4
Add A-Wing green_3: width bump to model_dim=640
Mar 26, 2026
5e27afc
Add A-Wing green_1A: legal alpha + PR#609 improvements
Mar 27, 2026
aa0a156
Optimize green_1A selective pruning: fast zstd-1 for binary search
Mar 27, 2026
411dea1
Add Cobra base-quality 10min harness plan and tooling
Mar 27, 2026
3b4b821
Add pod_setup_cobra bootstrap script
Mar 27, 2026
90741b4
Rat Rod Green: Parallel Muon base + GPTQ stripped for pure base model…
Mar 27, 2026
e32f32b
Rat Rod Green v2: kill late QAT + enable trigram
Mar 27, 2026
ec7ab9f
Rat Rod v3: MTP_NUM_HEADS=2, revert trigram (v2 was a wash)
Mar 27, 2026
05d3990
Rat Rod v3: MTP_NUM_HEADS=2 experiment (separate dir, green untouched)
Mar 27, 2026
fd4fb31
A/B test: ROPE_DIMS 16 vs 24 (200s quick runs)
Mar 27, 2026
4d58515
Log A/B test result: ROPE_DIMS=16 control @ 200s
Mar 27, 2026
2479ced
Rat Rod v4: HS-MTP — hash-space multi-token prediction
Mar 27, 2026
aefe581
Rat Rod v4: add CPU n-gram bridge for HS-MTP weighting
Mar 27, 2026
fc73010
Fix: set _hsmtp_w during warmup phase (torch.compile NoneType crash)
Mar 27, 2026
5a6a771
Document Synapse system (HS-MTP + CPU N-gram Bridge) in PROGRESS.md
Mar 27, 2026
b14ef45
A/B test: VALUE_RESIDUAL 0 vs 1 (200s quick runs)
Mar 27, 2026
b7e9a07
Rat Rod v5: Synapse v2 — GPU-native hash bridge (<1ms/step)
Mar 27, 2026
e11298f
Freeze SOTA: A_WING_GREEN base 1.1129 ngram 0.4489 (2026-03-27)
Mar 27, 2026
7d2b520
Add SOTA folder README: only add, never delete or modify
Mar 27, 2026
8ec6a61
Fix: remove step counter that breaks torch.compile fullgraph
Mar 27, 2026
e3ae59a
Log v4/v5 Synapse + VALUE_RESIDUAL results — all dead
Mar 27, 2026
6ae55ed
Rat_Rod_Purple_1: training oracle + Dirichlet mixing + matrix_lr=0.03
Mar 27, 2026
c48c060
Purple_1: disable training oracle by default (legally gray)
Mar 27, 2026
f8caa0c
Rat Rod: add zero-cost H100 sweeps and robust trainer toggles
Mar 27, 2026
9e826d9
Purple_1: phrase cache + regime tracker + warmdown=2000 + chunk=65K
Mar 27, 2026
c185a8d
Add Siphon: ensemble-objective training + WARMDOWN2000 SOTA entry
Mar 27, 2026
63c27e1
FX-Wing: Instructed Recurrence — content-derived loop instructions fo…
Mar 27, 2026
4ab4ced
FX-Wing: add hypothesis and ablation plan
Mar 27, 2026
7a81eec
Reorganize: move master runner to experiments/Biology_concepts/run_al…
Mar 27, 2026
95e9333
Add green v6 (optimized SOTA): v1 + WARMDOWN_ITERS=2000
Mar 27, 2026
5268082
Add Biology Concepts sweep findings — tornado vs baseline analysis
Mar 27, 2026
516e2c8
Add green v7: v6 + COMPLEMENT_ALPHA=0.5
Mar 27, 2026
9a58d14
FX-Wing: fix compile — COMPILE_FULLGRAPH=0 for crawler loop
Mar 27, 2026
15c66fc
FX-Wing: CRAWLER_LOOPS=4 — exploit weight-sharing compression
Mar 27, 2026
909901e
Log v7 results: COMPLEMENT_ALPHA=0.5 worse than v1
Mar 27, 2026
812599d
FX-Wing: CRAWLER_QUANT_INT8 — int8 precision for shared crawler block
Mar 27, 2026
c641e5e
Add vast_fxwing_single.sh — single GPU FX-Wing launcher for Vast.ai
Mar 27, 2026
ce5e317
Add Cambrian: DeltaNet × Biology Concepts architecture
Mar 27, 2026
38479b9
Cambrian-0: GatedDeltaNet × Bio Seam architecture skeleton
Mar 27, 2026
2df9c72
Fix bio concept scripts: make MAX_WALLCLOCK_SECONDS env-overridable
Mar 27, 2026
8b93705
Cambrian-1: Add four bio seam controllers (Myelin, Circadian, Clonal,…
Mar 27, 2026
b0776f1
FX-Wing micro: device-flexible concept test for GB10 Blackwell DGX Spark
Mar 27, 2026
da80af1
FX-Wing: add DeltaNet associative memory to crawler reservoir
Mar 27, 2026
fa21139
FX-Wing micro: add -u flag for unbuffered stdout through tee pipe
Mar 27, 2026
531f98f
vast: blacklist offer 33510639 (103.42.50.244 — SSH never connects)
Mar 27, 2026
ff7069b
FX-Wing DeltaNet: disable compile on forward to prevent T-loop OOM
Mar 27, 2026
36845e3
FX-Wing run.sh: DELTA_NET_HEADS=0 for core concept test
Mar 27, 2026
fa4c218
FX-Wing: suppress inductor NaN in RoPE bounds analysis (PyTorch 2.4 bug)
Mar 27, 2026
b4968be
Cambrian: disable torch.compile on GatedDeltaNet.forward
Mar 27, 2026
f74175d
Cambrian run.sh: set COMPILE_FULLGRAPH=0
Mar 27, 2026
cea3b8b
Fix astrocyte gate shape bug: view(B,1,1) not unsqueeze(1).unsqueeze(2)
Mar 27, 2026
5fac1c8
GreenRod X_1: Hybrid DeltaNet + Attention engine
Mar 27, 2026
b55a421
Cambrian: forward PYTORCH_CUDA_ALLOC_CONF to torchrun (expandable_seg…
Mar 27, 2026
15714f9
Cambrian: remove @torch.compiler.disable from GDN.forward
Mar 27, 2026
24dd550
FX_Wing_Delta: flow instructions + DeltaNet + hypothesis
Mar 27, 2026
0b2164d
Cambrian: restore @torch.compiler.disable, default wallclock 600s
Mar 27, 2026
9c34b42
FX_Wing_Sigma: n-gram entropy as smoothing reference hypothesis
Mar 27, 2026
0c623c7
Add Cambrian bio seam sweep script
Mar 27, 2026
96bc2b4
FX_Wing_Delta: disable DeltaNet for flow-only test, add inductor patch
Mar 27, 2026
3adddb0
FX_Wing_Delta_DN: DeltaNet with gradient checkpointing + truncated BPTT
Mar 27, 2026
7b5e09c
Fix Cambrian bio sweep hang: SKIP_FINAL_EVAL=1 + process cleanup
Mar 27, 2026
c7ffeec
Deprecate FX_Wing* experiments; add FA_Wing_Green_1 gitignore
Mar 27, 2026
c9600c7
Add Cambrian agent instructions for Vast.ai sweep
Mar 27, 2026
03f9838
Add FA_Wing_GreenDN_1 (flow instructions + DeltaNet); gitignore both …
Mar 27, 2026
7c197c7
Add FA_Wing_Green_1 and FA_Wing_GreenDN_1 experiment code
Mar 27, 2026
0a89f4a
Fix REPO_ROOT depth in FA_Wing run.sh files (../.. not ../../..)
Mar 27, 2026
8037fce
Fix DDP unused-params crash: disable VE in FA_Wing crawler runs
Mar 27, 2026
3651d35
Add ClownCar experiment; restore FX_Wing_Delta from deprecated
Mar 27, 2026
f2a4f5f
ClownCar: disable ngram eval — sliding window baseline only
Mar 27, 2026
5ae2be5
Add ClownCar_II: canonical FLA DeltaNet + Crawler symbiotic pairing
Mar 27, 2026
ba4a2a7
Fix ClownCar/II run.sh: add missing crawler flags (USE_CRAWLER=1 etc.)
Mar 27, 2026
87ad173
ClownCar_II: add FLA ops preflight check to confirm canonical kernel …
Mar 28, 2026
e3ba281
Fix ClownCar_II: cast q/k/v/beta to x.dtype before chunk_delta_rule
Mar 28, 2026
c0cf2ac
Add ClownCar_IV: GPTQ bypass + state dtype fix
Mar 28, 2026
5d9e0b2
Fix ClownCar_IV: revert state dtype cast — only change is SKIP_GPTQ=1
Mar 28, 2026
a7d53c8
ClownCar_IV: SKIP_GPTQ only — restored from known-good e3ba281
Mar 28, 2026
baceb10
ClownCar_IV: reset to ClownCar_II base + EMA_DECAY=0.99
Mar 28, 2026
e587c91
ClownCar_IV: remove GPTQ, use naive int6
Mar 28, 2026
c262086
Add ClownCar_VI and Medusa: skip EMA + naive int6
Mar 28, 2026
07a57bf
pod_setup: add fla + attr install for DeltaNet
Mar 28, 2026
d9db34d
Add ClownCar_VII: loop-aware 2-phase GPTQ + no EMA
Mar 28, 2026
cc06d3b
Medusa: sync to ClownCar_VII (loop-aware GPTQ + no EMA)
Mar 28, 2026
ebc4b84
Add Medusa_II: late-start EMA (step 4400) + loop-aware GPTQ
Mar 28, 2026
4aa704b
Medusa_II: add short exit-only unravel A/B harness
Mar 28, 2026
9d1be62
Add Medusa_IV: copy of Medusa_III (winning 1.0366 config)
Mar 28, 2026
4b1c51c
Medusa_II: force finish-only A/B and add one-command launcher
Mar 28, 2026
d2f47e2
Add Medusa_V: fix state dtype cast (new_state.to(dtype))
Mar 28, 2026
0c38323
Medusa_II: add additional-only unravel check runner
Mar 28, 2026
d74538f
Add Medusa_V_SOTAMAXX: frozen SOTA config snapshot
Mar 28, 2026
9fa4fec
Add Medusa_VI: DeltaNet projections → CastedLinear for QAT coverage
Mar 28, 2026
0ce12a6
Records: fill Medusa_IV known results (seeds 300, 1337)
Mar 28, 2026
a4a5447
Records: Medusa Unstable README with known results
Mar 28, 2026
5f731b3
Records: Medusa_IV 3-seed complete — seed 42=0.8104 BPB (best), mean=…
Mar 28, 2026
79f45ae
Add Medusa_Legal_unstable: fix GPTQ training-data access after wallcl…
Mar 28, 2026
556b2fc
Medusa_VII: causality fix + shard header fix + DeltaNet ablation
Mar 29, 2026
3e09695
Medusa_VII: add ablation results
Mar 29, 2026
f74b9c9
Bandit: ClownCar crawler + X-WING ngram oracle
Mar 29, 2026
3a75282
Bandit: fix GPTQ wallclock violation (GPTQ_RESERVE_MS=30s)
Mar 29, 2026
4efa746
Bandit: ClownCar Crawler x Cubric Ngram9 — 0.4961 BPB (3-seed mean)
Mar 29, 2026
e6d11d8
Log JR-03 fused MLP result as loser (with Triton-node caveat)
Mar 30, 2026
1a8501a
Crawler_Leg_1: add run_all.sh sequencer for all 11 ablation arms
Mar 30, 2026
946f0a7
Rascal II: skip GPTQ + embed int6 — full 600s, target <16MB
Mar 30, 2026
f1ce7c9
SOTA: Rascal II — new best legal submission 1.10986874 BPB, 15.44MB
Mar 30, 2026
39ed402
Record: Rascal — val_bpb 1.1099 (3-seed mean)
Mar 30, 2026
cacef5f
Add FX_Wing_Delta_safe: byte-identical backup of FX_Wing_Delta
Mar 30, 2026
99b790d
Fix records/ train_gpt.py: replace placeholder with actual submission…
Mar 31, 2026
e5c909f
Fix submission train_gpt.py to vault-verified file (0ec1f462, 118521 …
Mar 31, 2026
d6631cc
Fix train_gpt.py to actual runfile
Apr 2, 2026
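Several commits above (e.g. 2df9c72, "make MAX_WALLCLOCK_SECONDS env-overridable") rely on the POSIX default-assignment idiom the run scripts use for every tunable. A minimal standalone sketch — the variable name is taken from the scripts, the default value here is illustrative:

```shell
#!/bin/sh
# `: "${VAR:=default}"` assigns the default only when VAR is unset or empty;
# the leading `:` is a no-op command that exists just to expand its arguments.
: "${MAX_WALLCLOCK_SECONDS:=600}"
echo "cap=${MAX_WALLCLOCK_SECONDS}s"
```

Run bare this prints `cap=600s`; run as `MAX_WALLCLOCK_SECONDS=570 ./script.sh` it prints `cap=570s`, which is how the pod scripts override caps without editing the file.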
3 changes: 2 additions & 1 deletion .gitignore
@@ -8,4 +8,5 @@ data/manifest.json
 data/docs_selected.jsonl
 .mypy_cache/
 .venv
-logs/
+logs/
+experiments/archive/checkpoints/
112 changes: 112 additions & 0 deletions experiments/A_wing/RED/run.sh
@@ -0,0 +1,112 @@
#!/bin/bash
set -euo pipefail
# A-WING RED_G: Mixer-first, startup-bounded variant.
# Keeps learned mixer head, but bounds prefill and uses distributed sync
# so setup doesn't dominate runtime.

SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd -- "${SCRIPT_DIR}/../../.." && pwd)"
cd "${REPO_ROOT}"
export PYTHONPATH="${REPO_ROOT}/flash-attention/hopper:${PYTHONPATH:-}"

SEED="${SEED:-1337}"
NPROC_PER_NODE="${NPROC_PER_NODE:-8}"
: "${MAX_WALLCLOCK_SECONDS:=570}"

# 10-minute eval budgeting (training and eval are separate challenge caps).
: "${EVAL_BUDGET_SECONDS:=600}"
: "${EVAL_FIXED_OVERHEAD_SECONDS:=150}"
: "${EVAL_SAFETY_MARGIN_SECONDS:=45}"
DEFAULT_NGRAM_MAX_SECONDS=$((EVAL_BUDGET_SECONDS - EVAL_FIXED_OVERHEAD_SECONDS - EVAL_SAFETY_MARGIN_SECONDS))
if (( DEFAULT_NGRAM_MAX_SECONDS < 60 )); then
    DEFAULT_NGRAM_MAX_SECONDS=60
fi
: "${NGRAM_EVAL_MAX_SECONDS:=${DEFAULT_NGRAM_MAX_SECONDS}}"
: "${NGRAM_EVAL_BUCKETS:=16777216}"
: "${NGRAM_CHUNK_TOKENS:=1048576}"

# Mixer prefill controls (training-oracle build time).
: "${MIXER_BUCKETS:=2097152}"
: "${MIXER_N_ORDERS:=8}" # orders 2..9
: "${MIXER_PREFILL_MAX_SHARDS:=80}"
: "${MIXER_PREFILL_MAX_SECONDS:=90}"
: "${MIXER_PREFILL_MIN_SHARDS:=4}"
: "${MIXER_PREFILL_TOKENS_PER_SHARD:=50000000}"
: "${MIXER_GPU_MODE:=1}"
: "${MIXER_PREFILL_POS_CHUNK:=1000000}"

: "${COMPILE_FULLGRAPH:=0}"

# --- Pre-flight checks ---
echo "[preflight] checking zstandard..."
python3 -c "import zstandard; print(f' zstandard {zstandard.__version__} OK')" 2>/dev/null \
    || { echo " FATAL: zstandard not found. pip install zstandard"; exit 1; }

echo "[preflight] checking flash_attn..."
python3 -c "
try:
    import flash_attn_interface; print(' FA3 (hopper) OK')
except ImportError:
    import flash_attn; v=flash_attn.__version__
    if v.startswith('3'): print(f' FA3 v{v} OK')
    else: print(f' WARNING: FA{v[0]} detected — want FA3')
" 2>/dev/null || echo " WARNING: no flash_attn found"

echo "============================================"
echo " A-WING RED_G — GPU Monster Mixer"
echo " Seed: ${SEED}"
echo " Mixer: Linear(512→$((MIXER_N_ORDERS + 1))) orders 2..$((MIXER_N_ORDERS + 1))"
echo " Mixer prefill: <=${MIXER_PREFILL_MAX_SECONDS}s, min_shards=${MIXER_PREFILL_MIN_SHARDS}, max_shards=${MIXER_PREFILL_MAX_SHARDS}"
echo " Mixer buckets: ${MIXER_BUCKETS}, tokens/shard cap: ${MIXER_PREFILL_TOKENS_PER_SHARD}, gpu_mode=${MIXER_GPU_MODE}"
echo " Eval buckets: ${NGRAM_EVAL_BUCKETS}, ngram eval cap: ${NGRAM_EVAL_MAX_SECONDS}s"
echo " Training cap: ${MAX_WALLCLOCK_SECONDS}s"
echo "============================================"

SEED="$SEED" \
F1_CORR_RANK=0 \
DISTILL_ENABLED=0 \
MLP_ACT=leaky_relu_sq \
MLP_LEAKY_SLOPE=0.5 \
XSA_LAST_N=4 \
BIGRAM_VOCAB_SIZE=1536 \
TTT_EVAL_ENABLED=0 \
ROPE_DIMS=24 \
VAL_LOSS_EVERY=20000 \
TRAIN_LOG_EVERY=1000 \
SWA_EVERY=100 \
COMPLEMENT_ALPHA=0.5 \
MIXER_ENABLED=1 \
MIXER_N_ORDERS="${MIXER_N_ORDERS}" \
MIXER_LOSS_WEIGHT=0.1 \
MIXER_NEURAL_FLOOR=0.05 \
MIXER_BUCKETS="${MIXER_BUCKETS}" \
MIXER_PREFILL_MAX_SHARDS="${MIXER_PREFILL_MAX_SHARDS}" \
MIXER_PREFILL_MAX_SECONDS="${MIXER_PREFILL_MAX_SECONDS}" \
MIXER_PREFILL_MIN_SHARDS="${MIXER_PREFILL_MIN_SHARDS}" \
MIXER_PREFILL_TOKENS_PER_SHARD="${MIXER_PREFILL_TOKENS_PER_SHARD}" \
MIXER_GPU_MODE="${MIXER_GPU_MODE}" \
MIXER_PREFILL_POS_CHUNK="${MIXER_PREFILL_POS_CHUNK}" \
NGRAM_EVAL_ORDER=9 \
NGRAM_EVAL_MIN_ORDER=2 \
NGRAM_EVAL_ADAPTIVE=1 \
NGRAM_EVAL_ALPHA=0.30 \
NGRAM_EVAL_ALPHA_MIN=0.05 \
NGRAM_EVAL_ALPHA_MAX=0.60 \
NGRAM_EVAL_ENTROPY_CENTER=3.0 \
NGRAM_EVAL_ENTROPY_SCALE=2.0 \
NGRAM_EVAL_MIN_COUNT=2 \
NGRAM_EVAL_BUCKETS="${NGRAM_EVAL_BUCKETS}" \
NGRAM_EVAL_MAX_SECONDS="${NGRAM_EVAL_MAX_SECONDS}" \
CUBRIC_CADENCE=0 \
NGRAM_ENTROPY_SHIFT=1 \
NGRAM_ORDER_MULTS="" \
NGRAM_CHUNK_TOKENS="${NGRAM_CHUNK_TOKENS}" \
MAX_WALLCLOCK_SECONDS="${MAX_WALLCLOCK_SECONDS}" \
COMPILE_FULLGRAPH="${COMPILE_FULLGRAPH}" \
torchrun --standalone --nproc_per_node="${NPROC_PER_NODE}" \
"${SCRIPT_DIR}/train_gpt.py" \
2>&1 | tee "logs/awing_redg_gpu_mixer_s${SEED}_$(date +%Y%m%d_%H%M%S).log"

echo "============================================"
echo " DONE"
echo "============================================"
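With the script's defaults, the eval-budget arithmetic works out to a 405 s n-gram eval cap. A standalone restatement of the same computation, for checking the numbers:

```shell
#!/bin/bash
# Defaults from the script: 600s eval budget, 150s fixed overhead, 45s safety margin.
EVAL_BUDGET_SECONDS=600
EVAL_FIXED_OVERHEAD_SECONDS=150
EVAL_SAFETY_MARGIN_SECONDS=45
DEFAULT_NGRAM_MAX_SECONDS=$((EVAL_BUDGET_SECONDS - EVAL_FIXED_OVERHEAD_SECONDS - EVAL_SAFETY_MARGIN_SECONDS))
# Floor at 60s so a tight or mis-set budget never starves the eval pass entirely.
if (( DEFAULT_NGRAM_MAX_SECONDS < 60 )); then
    DEFAULT_NGRAM_MAX_SECONDS=60
fi
echo "${DEFAULT_NGRAM_MAX_SECONDS}"   # prints 405
```

Overriding EVAL_BUDGET_SECONDS downward (say, to 200) would trip the 60 s floor rather than producing a negative cap.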