
Add lowercase SP10240 QK 5.125 ablation #1814

Open
suryavanshi wants to merge 1 commit into openai:main from suryavanshi:codex-lowercase-qk5125-ablation

Conversation

@suryavanshi

No description provided.

Fija pushed a commit to Fija/parameter-golf that referenced this pull request Apr 28, 2026
Phase J (one-time data prep, done):
- train_sp10240_caseops.py: train SentencePiece BPE at vocab=10240 over
  CaseOps-transformed FineWeb. Reserves U+E001..U+E005 as user-defined
  symbols (matching the PR openai#1729 / SP8192 reservation set); 96 workers,
  ~25 min (see the sketch after this list).
- prepare_caseops_data_parallel.py with --sp pointing at the new model
  produces SP10240 caseops shards (~27 GB). Uploaded to private HF
  dataset hf://FijaEE/parameter-golf-sp10240-caseops (1434 train + 5 val
  + 5 val_bytes shards).
- Tokenizer model + vocab file committed under tokenizers/ for git clone.
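
For reference, a minimal sketch of the Phase J training call, assuming
train_sp10240_caseops.py drives SentencePiece's Python trainer (the input
path and output prefix are hypothetical; the vocab size, model type,
reserved codepoints, and worker count are from the notes above):

```python
import sentencepiece as spm

# Train a BPE model at vocab=10240 over CaseOps-transformed FineWeb text.
# Reserving U+E001..U+E005 as user-defined symbols guarantees each always
# maps to a single piece (matching the PR openai#1729 / SP8192 set).
spm.SentencePieceTrainer.train(
    input="caseops_fineweb_sample.txt",         # hypothetical input path
    model_prefix="tokenizers/sp10240_caseops",  # hypothetical output prefix
    vocab_size=10240,
    model_type="bpe",
    user_defined_symbols=[chr(cp) for cp in range(0xE001, 0xE006)],
    num_threads=96,
)
```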

Phase K (TTT params budget tradeoff, ready to run):
- runpod/phase_k_ttt_tradeoff.sh: train SP8192 V2 baseline once on 8xH100
  (~10 min, saves model.bin), then run TTT_EVAL_ONLY=1 for 4 configs
  reusing the saved artifact:
    K0: grad=1 prefix=2000 phases=3 ctx=2048   (V2 baseline)
    K1: grad=2 prefix=2000 phases=3 ctx=2048   (oracle, expected over-budget)
    K2: grad=2 prefix=1500 phases=1 ctx=2048   (cut prefix+phases)
    K3: grad=2 prefix=2000 phases=3 ctx=1024   (cut ctx)
  Auto-picks the lowest-BPB config that fits the 600 s budget for Phase L
  (sketched below).
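
A minimal sketch of that auto-pick step, assuming each TTT_EVAL_ONLY run
yields a (BPB, wall-seconds) pair; the names and result plumbing are
hypothetical, while the four configs and the 600 s budget come from the
list above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TTTConfig:
    name: str
    grad_steps: int
    prefix: int
    phases: int
    ctx: int

# The four Phase K configs listed above.
CONFIGS = [
    TTTConfig("K0", 1, 2000, 3, 2048),  # V2 baseline
    TTTConfig("K1", 2, 2000, 3, 2048),  # oracle, expected over-budget
    TTTConfig("K2", 2, 1500, 1, 2048),  # cut prefix+phases
    TTTConfig("K3", 2, 2000, 3, 1024),  # cut ctx
]

def pick_phase_l_config(results: dict[str, tuple[float, float]],
                        budget_s: float = 600.0) -> str:
    """results maps config name -> (bpb, wall_seconds).

    Returns the name of the lowest-BPB config whose eval fits the budget.
    """
    feasible = [(bpb, name) for name, (bpb, secs) in results.items()
                if secs <= budget_s]
    if not feasible:
        raise RuntimeError("no TTT config fits the 600 s budget")
    return min(feasible)[1]
```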

Phase L (3-seed combo, parametrized by Phase K winner):
- runpod/phase_l_combo.sh: PR openai#1797 V2 stack + SP10240 + LoRA rank 96 +
  best TTT params from K. Runs 3 seeds (42, 314, 1234), reports Welch
  t-test vs PR openai#1797 (1.06157±0.00066) and the 0.005-nat record bar.
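
A sketch of that significance check, assuming the PR openai#1797 reference
is likewise a 3-seed sample (only its mean±std, 1.06157±0.00066, is stated
here); the helper name and the example BPB values are placeholders:

```python
import numpy as np
from scipy.stats import ttest_ind_from_stats

def welch_vs_1797(combo_bpb, ref_mean=1.06157, ref_std=0.00066, ref_n=3):
    """Welch's t-test of the 3-seed Phase L BPB sample vs PR openai#1797."""
    combo = np.asarray(combo_bpb, dtype=float)
    return ttest_ind_from_stats(
        mean1=combo.mean(), std1=combo.std(ddof=1), nobs1=len(combo),
        mean2=ref_mean, std2=ref_std, nobs2=ref_n,
        equal_var=False,  # unequal variances -> Welch, not Student
    )

# Placeholder BPB values for seeds 42, 314, 1234:
t_stat, p_value = welch_vs_1797([1.0481, 1.0495, 1.0487])
```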

Hypothesis (per user observation): the vocab progression 1024→2048→4096→8192
has been monotonically beneficial, and no one in the queue has tried SP10240
without PPM-D. PR openai#1814's lowercase-SP10240 single seed (1.0742) suggests
a ~-0.0015 BPB delta from vocab alone vs PR openai#1797's V2 SP8192 baseline
(1.05998, seed 42). Combined with the TTT 2-step bump (PR openai#1812's
4-epoch variant delivered -0.008 BPB on a different stack) and LoRA rank 96,
the expected total is ~1.045-1.055 BPB if Phase K finds a feasible budget.
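
Back-of-envelope arithmetic behind that range, combining the deltas cited
above (each is a single-run or cross-stack estimate, so the band is wide):

```python
baseline  = 1.05998   # PR openai#1797 V2 SP8192 baseline, seed 42
vocab_est = -0.0015   # implied by PR openai#1814's lowercase-SP10240 single seed
ttt_est   = -0.008    # PR openai#1812's 4-epoch result, on a different stack
print(baseline + vocab_est + ttt_est)  # ~1.0505, inside the 1.045-1.055 band
```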