Commit 1572115
spec 009: implement spinquant_hotstart.py (baseline + R_a-only modes)
Two new files in the openai#1736 submission dir:
spinquant_hotstart.py (~360 LOC):
- Imports from train_gpt.py for Hyperparameters/GPT/serialize/deserialize/
eval_val/eval_val_ttt_phased/BatchedTTTLoRA/etc.
- Modes: baseline, internal_only (R_a only, per-layer per-KV-group, d_head
rotation on V-output and O-input).
- full and port_1695 are stubs that raise NotImplementedError with an explanation.
- Pipeline: load FP state_dict from HOTSTART_FP_CKPT -> apply rotations
in-place on banked qo_bank/kv_bank -> optional pre-quant diagnostic eval
-> call serialize() (GPTQ+compress) -> deserialize() -> quantized eval
-> phased TTT eval -> write final.json.
- Reproduces the TTT eval block from train_and_eval (lines 2997-3075) in
_run_ttt_eval() rather than refactoring the source file.
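A minimal NumPy sketch of what an internal_only-style R_a rotation does to the V and O weights. It is not the code in spinquant_hotstart.py. It assumes unbanked per-layer tensors in nn.Linear (out, in) layout and the GQA convention that query head h reads KV group h // (n_heads // n_kv). The real script works in place on the banked qo_bank/kv_bank, whose layout is not described here. `rotate_vo` and `random_orthogonal` are hypothetical names.

```python
import numpy as np


def random_orthogonal(n, rng):
    """Haar-random orthogonal n x n matrix via QR with a sign fix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))  # flip columns so that diag(r) > 0


def rotate_vo(w_v, w_o, n_heads, n_kv, d_head, rng):
    """Hypothetical internal_only-style rotation on unbanked weights.

    w_v: (n_kv * d_head, d_model), rows = V outputs grouped by KV head.
    w_o: (d_model, n_heads * d_head), columns = O inputs grouped by query head.
    Query head h is assumed to read KV group h // (n_heads // n_kv).
    Returns rotated copies plus the per-group rotations R_g.
    """
    rep = n_heads // n_kv
    w_v, w_o = w_v.copy(), w_o.copy()
    rots = []
    for g in range(n_kv):
        r = random_orthogonal(d_head, rng)
        rots.append(r)
        vs = slice(g * d_head, (g + 1) * d_head)
        w_v[vs, :] = r @ w_v[vs, :]  # V-output side: v_g -> R_g v_g
        for h in range(g * rep, (g + 1) * rep):
            hs = slice(h * d_head, (h + 1) * d_head)
            w_o[:, hs] = w_o[:, hs] @ r.T  # O-input side: cancel R_g for each head in group g
    return w_v, w_o, rots
```

Q and K are untouched, so the softmax weights are unchanged. Each head's attention output becomes R_g a_h, and W_o[:, h] R_g^T R_g a_h = W_o[:, h] a_h. The layer is therefore float-invariant up to rounding, while the rotated V/O weights present a different distribution to GPTQ.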
test_rotation_invariance.py (~250 LOC):
- CPU-only, standalone (no train_gpt.py import due to flash_attn_3/triton
module-level deps).
- Self-contained minimal attention forward: Q/K/V projection from the
banked tensors, RMSNorm on Q and K (matches real model's bound on
attention logits; without this, trained weights saturate softmax and
float noise in V amplifies catastrophically).
- Tests baseline (bit-exact identity) and internal_only (rel tolerance
1e-4) against either synthetic random weights or spec 008's
final_model.pt. Both pass cleanly (rel_max ~1e-6 on real checkpoint).
- Can load either banked (qo_bank/kv_bank) or unbanked
(blocks.N.attn.*.weight) state_dict format.
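The invariance check can be sketched as a CPU-only NumPy test in the same spirit as test_rotation_invariance.py, though it is not that file. Several details are assumptions:

- RMSNorm on Q and K is parameter-free.
- Weights use (out, in) layout with the GQA head-to-group mapping h // rep.
- `rel_max` is defined as the max absolute error divided by the max reference magnitude.
- Only synthetic random weights are used, with no checkpoint loading.

All function names are hypothetical.

```python
import numpy as np


def rms_norm(x, eps=1e-6):
    # Parameter-free RMSNorm over the last axis (assumed; the real model may add a gain)
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)


def attention(x, w_q, w_k, w_v, w_o, n_heads, n_kv, d_head):
    """Causal GQA attention on x of shape (T, d_model); weights in (out, in) layout."""
    T, rep = x.shape[0], n_heads // n_kv
    q = rms_norm((x @ w_q.T).reshape(T, n_heads, d_head))  # RMSNorm bounds the logits
    k = rms_norm((x @ w_k.T).reshape(T, n_kv, d_head))
    v = (x @ w_v.T).reshape(T, n_kv, d_head)
    k = np.repeat(k, rep, axis=1)  # query head h reads KV group h // rep
    v = np.repeat(v, rep, axis=1)
    logits = np.einsum("thd,shd->hts", q, k) / np.sqrt(d_head)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)  # True above the diagonal = future
    logits = np.where(mask, -np.inf, logits)
    logits -= logits.max(axis=-1, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=-1, keepdims=True)
    out = np.einsum("hts,shd->thd", p, v).reshape(T, n_heads * d_head)
    return out @ w_o.T


def rotate_vo(w_v, w_o, n_heads, n_kv, d_head, rng):
    # One orthogonal R_g per KV group: R_g on V rows, R_g^T on the O columns of its heads
    rep = n_heads // n_kv
    w_v, w_o = w_v.copy(), w_o.copy()
    for g in range(n_kv):
        r, _ = np.linalg.qr(rng.standard_normal((d_head, d_head)))  # orthogonal (not Haar-fixed)
        w_v[g * d_head:(g + 1) * d_head] = r @ w_v[g * d_head:(g + 1) * d_head]
        for h in range(g * rep, (g + 1) * rep):
            w_o[:, h * d_head:(h + 1) * d_head] = w_o[:, h * d_head:(h + 1) * d_head] @ r.T
    return w_v, w_o


def rel_max(a, b):
    # Max elementwise error scaled by the largest reference magnitude (assumed definition)
    return np.abs(a - b).max() / np.abs(a).max()


def check_internal_only(seed=0, T=16, d_model=64, n_heads=4, n_kv=2, d_head=16, tol=1e-4):
    rng = np.random.default_rng(seed)
    w_q = rng.standard_normal((n_heads * d_head, d_model)) / np.sqrt(d_model)
    w_k = rng.standard_normal((n_kv * d_head, d_model)) / np.sqrt(d_model)
    w_v = rng.standard_normal((n_kv * d_head, d_model)) / np.sqrt(d_model)
    w_o = rng.standard_normal((d_model, n_heads * d_head)) / np.sqrt(n_heads * d_head)
    x = rng.standard_normal((T, d_model))
    base = attention(x, w_q, w_k, w_v, w_o, n_heads, n_kv, d_head)
    w_v2, w_o2 = rotate_vo(w_v, w_o, n_heads, n_kv, d_head, rng)
    rotated = attention(x, w_q, w_k, w_v2, w_o2, n_heads, n_kv, d_head)
    err = rel_max(base, rotated)
    return err, err <= tol
```

The sketch runs in float64, so the error sits near machine precision. The 1e-4 tolerance from the commit matters more for float32 runs on trained weights.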
Spec 009 updated: reduced scope to 2 modes (baseline, internal_only) for
this session; full and port_1695 are deferred. Rationale in the spec: the
MLP's LeakyReLU-squared breaks R_m float-invariance, and resid_mix can't be
cleanly folded through RMSNorm; both need design work before implementation.
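The following sketch illustrates why an elementwise activation blocks a hidden-dimension rotation. It assumes R_m is folded as R into the MLP up-projection rows and R^T into the down-projection columns, and it uses a LeakyReLU slope of 0.5. Both choices are illustrative assumptions, not taken from the spec. Invariance would require f(R h) = R f(h). That holds for permutation matrices but not for a generic orthogonal R.

```python
import numpy as np


def leaky_relu_sq(x, slope=0.5):
    # LeakyReLU followed by squaring; the slope is an assumed value
    return np.where(x >= 0, x, slope * x) ** 2


def mlp(x, w_up, w_down):
    return w_down @ leaky_relu_sq(w_up @ x)


def hidden_rotation_error(r, d_model=16, seed=0):
    """Relative output change after folding r into the MLP hidden dimension."""
    d_hidden = r.shape[0]
    rng = np.random.default_rng(seed)
    w_up = rng.standard_normal((d_hidden, d_model))
    w_down = rng.standard_normal((d_model, d_hidden))
    x = rng.standard_normal(d_model)
    base = mlp(x, w_up, w_down)
    rotated = mlp(x, r @ w_up, w_down @ r.T)  # R into up rows, R^T into down columns
    return np.abs(base - rotated).max() / np.abs(base).max()


rng = np.random.default_rng(1)
q, _ = np.linalg.qr(rng.standard_normal((64, 64)))  # generic orthogonal matrix
perm = np.eye(64)[rng.permutation(64)]  # permutation matrix
print(hidden_rotation_error(q))  # order-1 error: f(R h) != R f(h)
print(hidden_rotation_error(perm))  # near machine precision: f commutes with permutations
```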
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent a552fba, commit 1572115
3 files changed, 897 additions, 6 deletions
Changed directories:
- records/track_10min_16mb/2026-04-19_SP8192_CaseOps_GatedAttn_QuantGate_Loop45_PhasedTTT
- research/specs