
Add lowercase SP10240 QK 5.125 ablation #1814

Open
suryavanshi wants to merge 1 commit into openai:main from suryavanshi:codex-lowercase-qk5125-ablation

Conversation

@suryavanshi

No description provided.

Fija pushed a commit to Fija/parameter-golf that referenced this pull request Apr 28, 2026
Phase J (one-time data prep, done):
- train_sp10240_caseops.py: train SentencePiece BPE at vocab=10240 over
  CaseOps-transformed FineWeb. Reserves U+E001..U+E005 as user-defined
  symbols (matching the PR openai#1729 / SP8192 reservation set); 96 workers,
  ~25 min (see the sketch after this list).
- prepare_caseops_data_parallel.py with --sp pointing at the new model
  produces SP10240 caseops shards (~27 GB). Uploaded to private HF
  dataset hf://FijaEE/parameter-golf-sp10240-caseops (1434 train + 5 val
  + 5 val_bytes shards).
- Tokenizer model + vocab file committed under tokenizers/ for git clone.
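
For reference, a minimal sketch of the Phase J training call, assuming
train_sp10240_caseops.py drives SentencePiece's Python trainer (the input
path and output prefix are hypothetical; the vocab size, model type,
reserved codepoints, and worker count are from the notes above):

```python
import sentencepiece as spm

# Train a BPE model at vocab=10240 over CaseOps-transformed FineWeb text.
# Reserving U+E001..U+E005 as user-defined symbols guarantees each always
# maps to a single piece (matching the PR openai#1729 / SP8192 set).
spm.SentencePieceTrainer.train(
    input="caseops_fineweb_sample.txt",         # hypothetical input path
    model_prefix="tokenizers/sp10240_caseops",  # hypothetical output prefix
    vocab_size=10240,
    model_type="bpe",
    user_defined_symbols=[chr(cp) for cp in range(0xE001, 0xE006)],
    num_threads=96,
)
```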

Phase K (TTT params budget tradeoff, ready to run):
- runpod/phase_k_ttt_tradeoff.sh: train SP8192 V2 baseline once on 8xH100
  (~10 min, saves model.bin), then run TTT_EVAL_ONLY=1 for 4 configs
  reusing the saved artifact:
    K0: grad=1 prefix=2000 phases=3 ctx=2048   (V2 baseline)
    K1: grad=2 prefix=2000 phases=3 ctx=2048   (oracle, expected over-budget)
    K2: grad=2 prefix=1500 phases=1 ctx=2048   (cut prefix+phases)
    K3: grad=2 prefix=2000 phases=3 ctx=1024   (cut ctx)
  Auto-picks the lowest-BPB config that fits the 600 s budget for Phase L
  (sketched below).
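
A minimal sketch of that auto-pick step, assuming each TTT_EVAL_ONLY run
yields a (BPB, wall-seconds) pair; the names and result plumbing are
hypothetical, while the four configs and the 600 s budget come from the
list above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TTTConfig:
    name: str
    grad_steps: int
    prefix: int
    phases: int
    ctx: int

# The four Phase K configs listed above.
CONFIGS = [
    TTTConfig("K0", 1, 2000, 3, 2048),  # V2 baseline
    TTTConfig("K1", 2, 2000, 3, 2048),  # oracle, expected over-budget
    TTTConfig("K2", 2, 1500, 1, 2048),  # cut prefix+phases
    TTTConfig("K3", 2, 2000, 3, 1024),  # cut ctx
]

def pick_phase_l_config(results: dict[str, tuple[float, float]],
                        budget_s: float = 600.0) -> str:
    """results maps config name -> (bpb, wall_seconds).

    Returns the name of the lowest-BPB config whose eval fits the budget.
    """
    feasible = [(bpb, name) for name, (bpb, secs) in results.items()
                if secs <= budget_s]
    if not feasible:
        raise RuntimeError("no TTT config fits the 600 s budget")
    return min(feasible)[1]
```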

Phase L (3-seed combo, parametrized by Phase K winner):
- runpod/phase_l_combo.sh: PR openai#1797 V2 stack + SP10240 + LoRA rank 96 +
  best TTT params from K. Runs 3 seeds (42, 314, 1234), reports Welch
  t-test vs PR openai#1797 (1.06157±0.00066) and the 0.005-nat record bar.
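
A sketch of that significance check, assuming the PR openai#1797 reference
is likewise a 3-seed sample (only its mean±std, 1.06157±0.00066, is stated
here); the helper name and the example BPB values are placeholders:

```python
import numpy as np
from scipy.stats import ttest_ind_from_stats

def welch_vs_1797(combo_bpb, ref_mean=1.06157, ref_std=0.00066, ref_n=3):
    """Welch's t-test of the 3-seed Phase L BPB sample vs PR openai#1797."""
    combo = np.asarray(combo_bpb, dtype=float)
    return ttest_ind_from_stats(
        mean1=combo.mean(), std1=combo.std(ddof=1), nobs1=len(combo),
        mean2=ref_mean, std2=ref_std, nobs2=ref_n,
        equal_var=False,  # unequal variances -> Welch, not Student
    )

# Placeholder BPB values for seeds 42, 314, 1234:
t_stat, p_value = welch_vs_1797([1.0481, 1.0495, 1.0487])
```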

Hypothesis (per user observation): the vocab progression 1024→2048→4096→8192
has been monotonically beneficial, and no one in the queue has tried SP10240
without PPM-D. PR openai#1814's lowercase-SP10240 single seed (1.0742) suggests
a ~-0.0015 BPB delta from vocab alone vs PR openai#1797's V2 SP8192 baseline
(1.05998, seed 42). Combined with the TTT 2-step bump (PR openai#1812's
4-epoch variant delivered -0.008 BPB on a different stack) and LoRA rank 96,
the expected total is ~1.045-1.055 BPB if Phase K finds a feasible budget.
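
Back-of-envelope arithmetic behind that range, combining the deltas cited
above (each is a single-run or cross-stack estimate, so the band is wide):

```python
baseline  = 1.05998   # PR openai#1797 V2 SP8192 baseline, seed 42
vocab_est = -0.0015   # implied by PR openai#1814's lowercase-SP10240 single seed
ttt_est   = -0.008    # PR openai#1812's 4-epoch result, on a different stack
print(baseline + vocab_est + ttt_est)  # ~1.0505, inside the 1.045-1.055 band
```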