val_bpb 1.1099 (3-seed mean) Rascal by newjordan · Pull Request #1120 · openai/parameter-golf

newjordan · 2026-03-30T04:57:49Z

Rascal — Junkyard Rat Rascal II

11L XSA-all + Parallel Muon + Coprime loader + Bigram2048 + RoPE16 + SWA + Late QAT. No GPTQ — naive int6 embed + 5 layers, zstd-compressed to ~15.5MB.

val_bpb: 1.1099 (3-seed mean)

Seed	val_bpb
42	1.11018163
300	1.10979099
444	1.10986874
mean	1.1099

Hardware: 8×H100 SXM
Size: 15,554,053 bytes (~15.5MB)
26.99M parameters, 600s wallclock

A representation of the neural model:

3D cubric pattern recognizer (54 warm-started adaptive multipliers) + complementary training. Seeds: 1337=0.4818, 300=0.4821, 58=0.4821. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Three variants targeting the 0.187 BPB gap to #1: - bwing_alpha: clip 0.95, alpha 0.05-0.60 (isolate alpha curve) - bwing_entropy_shift: per-order entropy center shift (isolate) - bwing_full_port: all openai#809 techniques + fixed order mults (fire first) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Cubric 3D back online (CADENCE=32, warm-start) - Per-order entropy center shift from openai#809 - Alpha 0.05-0.60, clip 0.95 - Our sliding-window TTT spliced in (1 epoch, SGD, freeze 2 blocks) - TTT runs BEFORE n-gram eval → adapted model feeds n-gram Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Port openai#809 LoRA TTT: rank-8 adapters on Q/V/LM head, AdamW, Polyak - Add LoRA injection to CausalSelfAttention, Block, GPT forward paths - 53s vs our old 410s TTT, 6x better BPB gain - Cubric 3D ON + entropy shift + alpha 0.05-0.60 clip 0.95 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Fixed mults + entropy shift + alpha 0.05-0.60 clip 0.95 (no cubric). Base sliding: 1.1194, n-gram9: 0.4512. Delta from X-WING: -0.031. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Deleted LoRA TTT abomination. bwing_III is now a clean copy of our best scoring variant for further iteration. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

bwing_IV: Prime fix only — adds primes 283721, 347237 to eliminate XOR hash collisions for orders 8-9 (the 2.0x multiplier orders). With 7 primes, prime[7] wrapped to prime[0], causing context tokens at positions j-8 and j-1 to cancel when equal. bwing_V: Prime fix + cubric 3D stacked on top of fixed mults. Cubric warm-starts at 1.0 (neutral) and refines per (order × entropy × count) on top of the fixed order multiplier scaling. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adapted from old setup.sh. Fixes FA3 detection (old one skipped FA3 when FA2 was present), uses sp1024 dataset, adds zstandard install. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Standalone eval script loads final_model.int6.ptz once, then sweeps: - alpha_max: [0.50, 0.60, 0.70, 0.80] - entropy_center: [2.0, 2.5, 3.0] - high_order_mult: [1.5, 2.0, 2.5, 3.0] - min_count: [1, 2] - cubric: [on, off] = 192 configs, ~3 min each, sorted by aggressiveness (best-first). Results to sweep_results.csv. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

openai#809 uses INT5 — more aggressive quantization creates more entropy in the post-quant model, letting n-gram eval rescue harder. Their quant loss is 0.019 vs our 0.006 (INT6), but n-gram extracts 0.869 vs 0.668. Changes from bwing_IV: - clip_range: 31 → 15 in gptq_quantize_weight, quantize_int6_per_row, and _find_best_row_scales - No cubric (it hurt in bwing_V) - 9 hash primes (from bwing_IV) - All openai#809 n-gram params (fixed mults, entropy shift, alpha curve) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Clean submission-ready code. 2140 → 1936 lines (-204). Removed all dead code paths that aren't used in our config. INT5 GPTQ + 9-prime hash fix remain as the key changes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

A-Wing Green (INT5 GPTQ + 9-prime): - Post-quant sliding: 1.1410 (vs 1.1194 INT6) - N-gram reduction: 0.683 (vs 0.668 INT6 — +0.015 more) - Final: 0.4576 BPB — worse than SOTA by 0.006 - Conclusion: INT5 quant noise hurts more than n-gram gains bwing_V (9-prime + cubric stacked on fixed mults): - Final: 0.4601 BPB — cubric on top of fixed mults HURTS by 0.009 - Cubric over-corrected (orders 2-3 suppressed to 0.62x on top of 0.3x) SOTA remains bwing_full_port at 0.4512 BPB (INT6, fixed mults, no cubric). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Instead of entropy-adaptive alpha (blind proxy), compare actual model_p vs ngram_p per token. Soft sigmoid on log-ratio: alpha = 0.95 * sigmoid(8 * log(ngram_p / model_p)) When ngram_p > model_p: alpha → 0.95 (trust n-gram) When ngram_p < model_p: alpha → 0.0 (trust model) No wasted mixing on tokens where n-gram is worse. Base: SOTA bwing_full_port + 9-prime hash fix. INT6, no cubric. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

openai#809 trains for 525s, leaving 75s for GPTQ. We were using the full 600s default. 570s leaves 30s for GPTQ calibrate (3.4s) + quantize (~25s) with headroom. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- run.sh now checks zstandard + flash_attn BEFORE training starts - Fails fast if zstandard missing (prevents 17MB zlib artifacts) - Shows FA version for debugging - train_gpt.py warns loudly if falling back to zlib Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Green_1 scored 0.3200 BPB with oracle alpha alone. Green_2 adds LoRA TTT to close the remaining 0.025 gap to openai#809 (0.2952). TTT flow (score-first legal): 1. Sliding window eval scores all val tokens (frozen model) 2. LoRA rank-8 adapters injected on Q, V projections 3. Single pass over val tokens: score then adapt (AdamW, lr=3e-4) 4. Polyak averaging (decay=0.998) for stability 5. N-gram eval with oracle alpha on adapted model Coarse stride (16x) keeps TTT under 60s. Total eval budget: ~290s. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Rewrote setup_runpod.sh to install FA3 + zstandard directly into the default system env instead of creating a separate conda environment that conflicts with torchrun and per-test scripts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

A-Wing Green_1 seed 1337 = 0.3200 BPB (was 0.4512). Oracle alpha = sigmoid(8 * log(ngram_p/model_p)) * 0.95. Copies: red, purple for parallel experimentation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds Linear(512→12) alpha_head trained jointly with model to predict per-token expert weights (neural + 11 n-gram orders 2-12). Training oracle prefilled from training data, eval uses backward-looking val-data cache. Targets sub-0.15 BPB on our 1.1195 neural baseline. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Usage on fresh pod: bash experiments/pod_launch.sh experiments/A_wing/purple/run.sh Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add pod_setup.sh: one file, zero args, sets up pod environment - Move stale root dirs to experiments/archive/ organized by type - Update pod_launch.sh default branch to test - Gitignore checkpoints (too large for GitHub) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

New experiment: test whether weight-shared Frugendorff architecture compresses model artifact while maintaining BPB when paired with the full X-WING N-gram eval stack (3D cubric, shared tables, CT, orders 2-9). - train_gpt.py: adds CrawlerGPT class alongside existing GPT; USE_CRAWLER=1 switches to 4 flat + 1 shared×2 architecture; build_model() factory handles both; all N-gram/GPTQ/CT machinery unchanged and legal - Green/run.sh: 0.25 scale validator (1 GPU, 150s, dim=384) - Red/run.sh: full scale production (8×H100, 600s, USE_CRAWLER=1) - Purple/run.sh: U-Net control (8×H100, 600s, USE_CRAWLER=0) for clean A/B Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…0.9984, std=0.1724 Seeds: 42 (0.8104 SW), 300 (0.9578 SW), 1337 (1.2269 SW). Includes unravel A/B diagnostic scripts from Medusa_II (all variants tied at 1.0047 — checkpoint-level fragility, not GPTQ config). DeltaNet heads introduce significant cross-seed variance vs ClownCar (0.00015). Successor to PR openai#990, catalyzed by PR openai#875. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ock cap PR openai#1028 (Medusa_IV) flagged by judges: GPTQ calibration read training data after stopping_early at 600s, violating eval-phase data access rules. Fix: GPTQ_RESERVE_MS=30000 causes training loop to stop ~30s early so GPTQ calibration (~12s) completes within the 600s budget. Log now prints elapsed time at GPTQ start for reviewer verification. Two-line change to wallclock check (effective_max_wallclock_ms), plus timing log. All hyperparameters identical to Medusa_IV. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Fix DeltaNet cross-loop state carry (causality violation): state from loop N encoded all 0..T-1 tokens, leaking future info into loop N+1. Now each loop calls chunk_delta_rule with initial_state=None (zero). Explains the RT < SW anomaly seen in Medusa_IV results. - Fix prefill_shard header offset in both oracle classes: skipped the 256×int32 shard header, ingesting garbage as tokens into hash tables. Matches load_data_shard. Inactive currently but correct for future use. - DELTA_NET_HEADS overridable for clean ablation: DELTA_NET_HEADS=0 SEED=300 bash experiments/Medusa_VII/run.sh Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

DN=0: SW 1.1823 (honest baseline, SW<RT confirmed) DN=4 fixed: SW 1.1958 (EMA-starved, wash vs DN=0) Causality fix confirmed: SW<RT on both runs. 0.9578 score was entirely from DeltaNet look-ahead violation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Combines Medusa_VII causality-fixed crawler (DN=0, EMA+GPTQ) with X-WING's ngram9 eval stack: shared tables, 3D Cubric 54-cell warm-start, entropy-adaptive alpha 0.20-0.75, COMPLEMENT_ALPHA=0.5. All code already present in Medusa_VII train_gpt.py — purely a run.sh change. Baseline: X-WING flat 0.4818 BPB. Target: beat it with stronger base model. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Training loop now stops 30s early so GPTQ calibration (~12s) completes within the 600s budget. Same fix applied to Medusa_Legal_unstable. Logs gptq:starting elapsed for reviewer verification. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Frugendorff ClownCar crawler (4 flat + 1 crawlerx4 loops, inst_dim=32, DN=0, causality-fixed) + X-WING ngram oracle (shared tables, 3D Cubric 54-cell warm-start, entropy-adaptive alpha 0.20-0.75, COMPLEMENT_ALPHA=0.5). 3-seed results: s4=0.4964, s444=0.4957, s300=0.4961, mean=0.4961 std=0.0003 SW BPB ~1.187, GPTQ-int6+zstd ~9.2MB, 8xH100 SXM. GPTQ_RESERVE_MS=30000 ensures calibration completes within 600s budget. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- SKIP_GPTQ=1: no 30s reserve, full wallclock restored (~1.1091 target) - int6_cats adds "embed": tok_emb quantized int6 not int8, ZSTD saves ~1.5-2MB - Expected artifact: ~14.5-15MB (vs 16.73MB on Rascal I) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

SKIP_GPTQ=1 + embed int6 → full 600s training + legal compression. DO NOT MODIFY this entry. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Safe copy created after the original was overwritten by an agent run. MD5-verified identical to the run that produced 0.2233 BPB ngram9. Use this for re-runs — do not modify. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- XSA on all 11 layers (xsa_last_n: 4 → 11, from Rascal PR openai#1120) - SLOT: per-batch δ∈ℝ⁵¹² at last hidden layer, 5 AdamW steps lr=0.003 - ResidLambdas: learnable per-sublayer scaling, √1.1 init, 5× scalar_lr - Warmdown shortened 3500 → 2000 steps - QAT global flag fix (torch.compile constant-folding bug) - SWA actually applied fix (was silently skipped) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Key innovations over previous submission (1.1195, PR openai#529): 1. **Parallel Muon Optimizer** — Parameter banking with async reduce-scatter/ all-gather overlapping Newton-Schulz orthogonalization. 3-phase training loop: (1) launch async RS for banks, (2) all-reduce + Adam step for replicated params (overlaps with RS), (3) wait RS, NS5, async AG. Eliminates DDP wrapper entirely. From PR openai#1120 (Rascal/Cambrian). 2. **INT5 Quantization (clip_range=15)** — 31 unique integer levels instead of 63 (INT6). Combined with GPTQ Hessian-aware error compensation, achieves ~0.476 bytes/param compression ratio vs ~0.64 for INT6. Enables fitting a larger model (MHA 8/8, MLP 3.5x, BigramHash 6144, ~32M unique params) under the 16MB artifact limit. 3. **Coprime Stride Data Loader** — Deterministic permutation-free sampling using coprime strides over memory-mapped shards. Each shard is traversed via stride coprime to block count, guaranteeing full coverage without storing permutation arrays. Adaptive shard selection with power-law weighting (alpha decays 0.9→0.5 over training). 4. **Wallclock-Adaptive LR Schedule** — LR warmdown triggers based on elapsed wallclock time rather than step count. Automatically adapts to varying step times across hardware, ensuring consistent convergence regardless of system performance. 5. **MHA 8/8 + MLP 3.5x + BigramHash 6144** — Larger architecture than previous submissions (was GQA 8/4, MLP 3.0, BigramHash 2048). Full multi-head attention, wider MLP, richer bigram hash embeddings. Only possible due to INT5 compression. Architecture: 11L, dim=512, MHA 8/8, MLP 3.5x (1792), LeakyReLU²(0.5), XSA all 11 layers, partial RoPE 16/64, LN scale 1/√(L+1), SmearGate, OrthoInit, BigramHash 6144, Shared VE128 (layers 9,10), U-Net skip connections, EMA 0.997, Tight SWA (every 50), Late QAT (threshold 0.15), Muon lr=0.025 WD=0.04 (momentum warmup 0.92→0.99 over 1500 steps) Training: 94ms/step → ~6333 steps in 600s wallclock on 8×H100 SXM Quantization: INT5 GPTQ (clip_range=15, block_size=64, 256-sample calibration) + 2% magnitude pruning + zstd-22 compression Eval: Sliding window (stride=64) + Legal score-first AdamW TTT (5 epochs, lr=0.0001, last 2 blocks + norms + head unfrozen, 262144-token chunks) 3-seed results: Seed 1337: 1.1144 BPB (16.12 MB artifact) Seed 42: 1.1141 BPB (15.12 MB artifact) Seed 7: 1.1150 BPB (15.26 MB artifact) Mean: 1.1145 BPB (std 0.0005)

Ran the submitted train_gpt.py (commit 39ed402) with SKIP_GPTQ=1 on GCP 8xH100. Result: final_sliding_window_exact val_bpb 1.11350 vs published 1.10979 (seed 300). Gap: +0.00371 BPP — 7x larger than typical seed variance (~0.0005). Note: train_gpt.py contains no quantization code; the published int6+zstd metrics appear to come from an external runner.

… script The 2159-line rascal_master (no quantization) was mistakenly committed to records/ instead of the 2468-line script that produced the submission logs. The correct file includes int6+zstd quantization, GPTQ skeleton, and zstandard compression — matching bytes_code=118521 reported in submission.json and logs. Addresses reproducibility concern raised in PR openai#1177. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…bytes) Replaces previously incorrect file. Vault copy confirmed by re-run on cu128 pod: Code size 118521, step_avg 90.62ms, val_bpb 1.10993484. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… default to 1 PR openai#1120 train_gpt.py verbatim except line 135: default baked to 1 (not 4). Matches the env override in the original SOTA run.sh so harness picks up correct loader behavior without a wrapper. run.sh also pins =1 explicitly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…oadmap Full leaderboard analysis (2026-03-31): we hold best legal open PR (openai#1120 at 1.10987). Only PR openai#1089 (1.1091) beats us — by 0.00077 BPB. Stack audit of Rascal II: LeakyReLU²/LN-scale/XSA-all already present. GPTQ code exists but SKIP_GPTQ=1. Warmdown 3500 vs leaders' 4000. BigramHash 2048 vs leaders' 3072. zstd-22 vs Brotli-11. Adds 4 research threads with prioritized hypothesis queue: 1. Rascal_III_GPTQ (biggest gap, code already in script) 2. Rascal_III_ARcal (self-gen calibration after GPTQ confirmed) 3. Rascal_III_Bigram3072 (vocab coverage, +~50KB) 4. Rascal_III_Warmdown4k + Brotli/minify Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

newjordan · 2026-04-02T16:05:45Z

I was working too fast and Sonnet did a lab fire. Took me 30+ hours to fix and 200$. TLDR opus went out, and sonnet (without me knowing there was a downgrade) - went through all of my nueral sota files, marked them like a dog with bad information. and completely polluted my DB, as well as re-uploaded testing and research to old PR. But I have delved the depths of hell, and spent my resources to ensure my work is at least defensible. Installing screenpipe now..

Normally I am pushing three research legs at once, but this took me down to 1.5 mainly fixing the neural sota... and minor ablations on the crawler. worst 48 hours of comp so far.

cocohearts · 2026-04-23T22:03:42Z

Submission-format issue: this PR is not records-folder-only. The diff adds a large experiments/... tree and other files outside records/track_10min_16mb/ (238 outside-record files in the file list). The Rascal record folder has useful README/submission/code/logs, but the PR should be trimmed to only the intended records/track_10min_16mb/2026-03-30_Rascal_8xH100/ submission folder for review.

Octavian and others added 30 commits March 26, 2026 00:23

X-WING 3D Cubric: 0.4820 BPB (3-seed mean, std 0.0002)

4ce0d59

3D cubric pattern recognizer (54 warm-started adaptive multipliers) + complementary training. Seeds: 1337=0.4818, 300=0.4821, 58=0.4821. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Record bwing_full_port seed 1337: 0.4512 BPB

137432f

Fixed mults + entropy shift + alpha 0.05-0.60 clip 0.95 (no cubric). Base sliding: 1.1194, n-gram9: 0.4512. Delta from X-WING: -0.031. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace bwing_III with copy of SOTA bwing_full_port (0.4512 BPB)

94bb107

Deleted LoRA TTT abomination. bwing_III is now a clean copy of our best scoring variant for further iteration. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add B-wing pod setup script (FA3 + zstandard + sp1024)

3ebaf38

Adapted from old setup.sh. Fixes FA3 detection (old one skipped FA3 when FA2 was present), uses sp1024 dataset, adds zstandard install. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Green_1: cap training at 570s to fit GPTQ in 600s budget

08d6b7c

openai#809 trains for 525s, leaving 75s for GPTQ. We were using the full 600s default. 570s leaves 30s for GPTQ calibrate (3.4s) + quantize (~25s) with headroom. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

NEW SOTA 0.3200 BPB: A-Wing Green_1 Oracle Alpha + 9-Prime

5876cf5

A-Wing Green_1 seed 1337 = 0.3200 BPB (was 0.4512). Oracle alpha = sigmoid(8 * log(ngram_p/model_p)) * 0.95. Copies: red, purple for parallel experimentation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add pod_launch.sh: one command for clone + setup + run

2b38218

Usage on fresh pod: bash experiments/pod_launch.sh experiments/A_wing/purple/run.sh Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Fix pod_launch.sh: pull from private repo (fork1), not public

a37d7c3

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Purple: reduce prefill to 20 shards (~2B tokens), restore 570s cap

6004ac7

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Fix pod_setup.sh: workspace path is /workspace/parameter-golf

db300a0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Fix REPO_DIR depth in F_Wing run scripts (3 levels up, not 2)

473a4b7

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add A-wing RED mixer variant with bounded distributed prefill

5e8ec28

Add A-wing RED_G GPU monster mixer path and tune RED

4a06a37

Fix DDP warmup by including mixer supervision in RED variants

3cedb3f

records: add A-WING RED_G seed1337 run summary

005cdc5

Octavian and others added 12 commits March 28, 2026 11:03

Log JR-03 fused MLP result as loser (with Triton-node caveat)

e6d11d8

Crawler_Leg_1: add run_all.sh sequencer for all 11 ablation arms

1a8501a

SOTA: Rascal II — new best legal submission 1.10986874 BPB, 15.44MB

f1ce7c9

SKIP_GPTQ=1 + embed int6 → full 600s training + legal compression. DO NOT MODIFY this entry. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Record: Rascal — val_bpb 1.1099 (3-seed mean)

39ed402

newjordan changed the title ~~Record: Rascal — val_bpb 1.1099 (3-seed mean)~~ val_bpb 1.1099 (3-seed mean) Rascal Mar 30, 2026

notapplica mentioned this pull request Mar 30, 2026

Parameter Golf Formerly Live AI Commentary ⛳ + Analysis / Ideas | every 10 minutes. Now disabled #140

Closed

EthanYangTW mentioned this pull request Mar 31, 2026

1.1145 BPB: Parallel Muon + INT5 GPTQ + Legal TTT #1171

Open

dexhunter mentioned this pull request Mar 31, 2026

Non-record: SLOT + Split-LR + Full GPTQ + XSA-all — val_bpb 1.1015 (3-seed mean) #1172

Open

dexhunter mentioned this pull request Mar 31, 2026

review: Rerun of PR #1120 (Rascal) on 8xH100 SXM #1177

Closed

Octavian and others added 2 commits March 31, 2026 11:19

Fix submission train_gpt.py to vault-verified file (0ec1f462, 118521 …

e5c909f

…bytes) Replaces previously incorrect file. Vault copy confirmed by re-run on cu128 pod: Code size 118521, step_avg 90.62ms, val_bpb 1.10993484. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Fix train_gpt.py to actual runfile

d6631cc

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cocohearts added the record submission ready for review label Apr 23, 2026

cocohearts mentioned this pull request Apr 24, 2026

Update README leaderboard with recent record submissions #1806

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

val_bpb 1.1099 (3-seed mean) Rascal#1120

val_bpb 1.1099 (3-seed mean) Rascal#1120
newjordan wants to merge 141 commits intoopenai:mainfrom
newjordan:submission/rascal

newjordan commented Mar 30, 2026 •

edited

Loading

Uh oh!

newjordan commented Apr 2, 2026 •

edited

Loading

Uh oh!

cocohearts commented Apr 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

newjordan commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rascal — Junkyard Rat Rascal II

Uh oh!

newjordan commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cocohearts commented Apr 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

newjordan commented Mar 30, 2026 •

edited

Loading

newjordan commented Apr 2, 2026 •

edited

Loading