
Commit e587d76

joey00072 and claude committed
PR openai#180 SOTA: 10L Int5-MLP + BigramHash(10240) + SWA(0.4) + WD=0.04
Reproduce openai/parameter-golf PR openai#180 (val_bpb 1.14276, 3-seed mean).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 0f51451 commit e587d76

3 files changed

Lines changed: 397 additions & 267 deletions


local.sh

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+#!/bin/bash
+set -e
+
+# PR #180 local 1-GPU test run
+# Reduces batch tokens to fit a single GPU (65536 vs 786432 for 8xH100)
+# GRAD_ACCUM logic: 786432 / 8 GPUs = 98304 tokens/step → approximate with 65536
+
+TRAIN_BATCH_TOKENS=65536 \
+TRAIN_SEQ_LEN=1024 \
+MAX_WALLCLOCK_SECONDS=120 \
+PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
+torchrun --standalone --nproc_per_node=1 train_gpt.py
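The batch-size comments in local.sh can be checked with plain shell arithmetic. The sketch below reuses the numbers from the script. The gradient-accumulation lines assume that train_gpt.py sets grad_accum_steps = 8 / WORLD_SIZE and splits TRAIN_BATCH_TOKENS evenly over ranks and micro-steps; that is an assumption about the training script, not something this commit shows.

```shell
#!/bin/bash
# Sketch: re-derive the batch-size arithmetic quoted in local.sh's comments.
set -euo pipefail

GLOBAL_TOKENS=786432   # global batch tokens on 8xH100 (from the comment)
NUM_GPUS=8
LOCAL_TOKENS=65536     # TRAIN_BATCH_TOKENS in local.sh
SEQ_LEN=1024           # TRAIN_SEQ_LEN in local.sh
WORLD_SIZE=1           # --nproc_per_node=1 in local.sh

# Per-GPU share of the 8xH100 batch: 786432 / 8.
per_gpu=$(( GLOBAL_TOKENS / NUM_GPUS ))

# How many times larger the full batch is than the local one.
shrink=$(( GLOBAL_TOKENS / LOCAL_TOKENS ))

# Sequences of TRAIN_SEQ_LEN tokens per optimizer step.
global_seqs=$(( GLOBAL_TOKENS / SEQ_LEN ))
local_seqs=$(( LOCAL_TOKENS / SEQ_LEN ))

# ASSUMPTION: train_gpt.py uses grad_accum_steps = 8 / WORLD_SIZE and divides
# TRAIN_BATCH_TOKENS evenly across ranks and micro-steps. Verify in the script.
grad_accum=$(( 8 / WORLD_SIZE ))
micro_tokens=$(( LOCAL_TOKENS / (WORLD_SIZE * grad_accum) ))

echo "per-GPU tokens on 8xH100: ${per_gpu}"
echo "full batch / local batch:  ${shrink}x"
echo "sequences per step:        ${global_seqs} (8xH100) vs ${local_seqs} (local)"
echo "local micro-batch tokens:  ${micro_tokens} over ${grad_accum} accumulation steps"
```

So the local run uses 65536 tokens per step, two thirds of one 8xH100 rank's 98304-token share and one twelfth of the full 786432-token batch, which fits its description as a 1-GPU test run rather than a reproduction.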

run.sh

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+#!/bin/bash
+set -e
+
+# PR #180 reproduction — 10L Int5-MLP + BigramHash(10240) + SWA(0.4) + WD=0.04
+# 3-seed mean val_bpb: 1.14276 | Best seed: 1.14260 (seed 2024)
+# All hyperparameters are defaults in train_gpt.py — no env vars needed.
+
+# Run with default seed (42):
+# bash run.sh
+# Run with specific seed:
+# SEED=1337 bash run.sh
+
+.venv/bin/torchrun --standalone --nproc_per_node=8 train_gpt.py
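run.sh quotes a 3-seed mean but runs a single seed per invocation. A minimal sketch for running several seeds and averaging their val_bpb values is shown below. The file name seeds.sh, the functions run_seeds and mean_bpb, and the logs/ layout are hypothetical. Pulling val_bpb out of each log is left manual because the log format is not shown in this commit.

```shell
#!/bin/bash
# seeds.sh (hypothetical helper, not part of this commit).
# Source it, then call run_seeds / mean_bpb; nothing runs on load.

# Run run.sh once per seed, passing SEED through the environment as the
# run.sh comments describe, and keep one log file per seed.
run_seeds() {
  mkdir -p logs
  local seed
  for seed in "$@"; do
    echo "=== seed ${seed} ===" >&2
    SEED="$seed" bash run.sh 2>&1 | tee "logs/seed_${seed}.log"
    # PIPESTATUS[0] is run.sh's exit code; tee alone would hide a failure.
    if [ "${PIPESTATUS[0]}" -ne 0 ]; then
      echo "seed ${seed} failed" >&2
      return 1
    fi
  done
}

# Arithmetic mean of the val_bpb values given as arguments, to 5 decimals.
mean_bpb() {
  if [ "$#" -eq 0 ]; then
    echo "mean_bpb: no values given" >&2
    return 1
  fi
  printf '%s\n' "$@" | awk '{ sum += $1; n++ } END { printf "%.5f\n", sum / n }'
}
```

Usage: `source seeds.sh`, then `run_seeds 42 1337 2024`, then `mean_bpb` with the val_bpb read from each log. The seed list is only an example: apart from the default 42 and the best seed 2024 quoted in run.sh, this commit does not say which seeds make up the 3-seed mean.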
