This folder captures a non-record local MLX submission for adaptive eval-time context.
The idea in this snapshot is simple: do one coarse pass over the validation stream, mark the harder windows from that pass, then rescore only those windows with a finer stride. The training setup stays close to the baseline MLX path; the change is in how the final roundtrip evaluation spends extra context.
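As a rough sketch of that two-pass scheme (not the actual code in `train_gpt.py`): the 1024-token window length, the `loss_fn` interface, and the rule for folding fine scores back into the average are all assumptions here, and plain numpy stands in for MLX.

```python
import numpy as np

def adaptive_eval(loss_fn, tokens, window=1024,
                  coarse_stride=256, fine_stride=64, hard_fraction=0.25):
    """Two-pass eval: cheap coarse scoring everywhere, fine rescoring on hard windows.

    Illustrative sketch only; `loss_fn(window_tokens) -> float` is a hypothetical
    per-window loss, not the interface used in train_gpt.py.
    """
    # Pass 1: score every window at the coarse stride.
    starts = np.arange(0, len(tokens) - window, coarse_stride)
    coarse = np.array([loss_fn(tokens[s:s + window]) for s in starts])

    # Keep the hardest fraction of windows from the coarse pass.
    n_hard = max(1, int(hard_fraction * len(starts)))
    hard = np.argsort(coarse)[-n_hard:]

    # Pass 2: rescore only the hard regions at the finer stride.
    scores = coarse.copy()
    for i in hard:
        fine = [loss_fn(tokens[f:f + window])
                for f in range(starts[i], starts[i] + coarse_stride, fine_stride)
                if f + window <= len(tokens)]
        if fine:
            scores[i] = np.mean(fine)  # assumed rule: fine mean replaces coarse score

    return float(scores.mean())
```

The point of the structure is that the expensive fine-stride scoring only touches the fraction of the stream the cheap coarse pass flags as hard.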
This is not a leaderboard claim. It is a local Apple Silicon result meant to document the idea, the code snapshot, and a same-setup comparison against standard final evaluation.
Configuration:
- Hardware: Apple M4 Pro, 48 GB unified memory
- Track: non-record, local Apple Silicon MLX
- Tokenizer/data: `fineweb10B_sp1024`, first train shard, first `32768` validation tokens
- Model: SP-1024, `9x512`, `KV4`, tied embeddings
- Training length: `200` iterations, `8192` train tokens/step
- Final eval mode: adaptive
- Adaptive eval settings: `coarse_stride=256`, `fine_stride=64`, `hard_fraction=0.25`
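Concretely, `coarse_stride/fine_stride = 256/64 = 4`, so each window flagged as hard is rescored at four times the resolution of the first pass, while `hard_fraction=0.25` caps that extra work at a quarter of the coarse windows.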
Command used for the included adaptive run:
```bash
cd records/track_non_record_16mb/2026-03-19_AdaptiveEvalContext_MLX_M4Pro_sp1024_200it
RUN_ID=cmp200_adapt_c256_f64_h025 \
SEED=1337 \
ITERATIONS=200 \
TRAIN_BATCH_TOKENS=8192 \
GRAD_ACCUM_STEPS=8 \
VAL_LOSS_EVERY=0 \
VAL_BATCH_SIZE=32768 \
VAL_MAX_TOKENS=32768 \
FINAL_ROUNDTRIP_EVAL=1 \
FINAL_EVAL_MODE=adaptive \
FINAL_EVAL_COARSE_STRIDE=256 \
FINAL_EVAL_FINE_STRIDE=64 \
FINAL_EVAL_HARD_FRACTION=0.25 \
FINAL_EVAL_BATCH_SEQS=16 \
DATA_PATH=../../../data/datasets/fineweb10B_sp1024 \
TOKENIZER_PATH=../../../data/tokenizers/fineweb_1024_bpe.model \
../../../.venv/bin/python train_gpt.py > train.log 2>&1
```
Included results (`train.log`):
- Pre-quant eval at stop: `val_loss:4.1575`, `val_bpb:2.4070`
- Post-roundtrip eval: `val_loss:4.15029331`, `val_bpb:2.40284524`
- Eval time for final adaptive roundtrip pass: `2386ms`
- Selected windows: `hard_windows:31/124`, `fine_windows:124`
- Serialized model int8+zlib: `11239210 bytes`
- Code size: `58701 bytes`
- Total submission size int8+zlib: `11297911 bytes`
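As a sanity check, those counts follow from the settings: assuming 1024-token eval windows, a coarse stride of 256 over the 32768 validation tokens gives 124 windows; `hard_fraction=0.25` selects 31 of them; and rescoring each hard window at stride 64 gives 256/64 = 4 fine windows apiece, i.e. 31 × 4 = 124. The total submission size is the sum of the model and code sizes: 11239210 + 58701 = 11297911 bytes.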
Same-setup reference (`compare_standard.log`):
- Standard final eval: `val_loss:4.16789573`, `val_bpb:2.41303630`
- Eval time: `321ms`
So in this local fixed-step proxy, the adaptive pass improves the final roundtrip score by about `0.01019 bpb` over the same setup with standard final evaluation, but it also makes the final eval roughly 7x slower (`2386ms` vs `321ms`). That tradeoff is the main reason this is being submitted as a non-record WIP rather than as a score claim.
Included files:
- `train_gpt.py` - exact MLX code snapshot used for the run
- `train.log` - adaptive local run log
- `compare_standard.log` - same-setup standard-eval comparison log
- `submission.json` - metadata for the run
