Non-record: AR Self-Generated GPTQ Calibration (val_bpb=1.1461) #1234
ibarrajo wants to merge 1 commit into openai:main
Conversation
The model generates its own GPTQ calibration data (64 seqs x 2048 tokens, temp=0.8) after training, eliminating the need for training data at eval time. Built on the Approach B base. The 390s training budget (vs 590s), reserved to leave time for AR generation, loses more from fewer training steps than it gains from a better-matched calibration distribution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
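The self-generation step described above (sampling 64 sequences of 2048 tokens at temperature 0.8 from the trained model itself) can be sketched roughly as follows. This is a hypothetical minimal version: `logits_fn` stands in for the real model's next-token call, and the function name, `bos_id`, and shapes are assumptions, not the PR's actual signature.

```python
import numpy as np

def generate_autoregressive_calib(logits_fn, vocab_size, n_seqs=64,
                                  seq_len=2048, temp=0.8, bos_id=0, seed=0):
    """Sample calibration sequences from the model itself (sketch).

    logits_fn(tokens) -> next-token logits; a stand-in for the trained net.
    """
    rng = np.random.default_rng(seed)
    seqs = np.empty((n_seqs, seq_len), dtype=np.int64)
    for i in range(n_seqs):
        toks = [bos_id]
        for _ in range(seq_len - 1):
            logits = np.asarray(logits_fn(np.array(toks)), dtype=np.float64)
            # Temperature-scaled softmax sampling.
            z = logits / temp
            p = np.exp(z - z.max())
            p /= p.sum()
            toks.append(int(rng.choice(vocab_size, p=p)))
        seqs[i] = toks
    return seqs

# Toy stand-in model: uniform logits over a 16-token vocab
# (a real run would call the trained network here).
calib = generate_autoregressive_calib(lambda t: np.zeros(16), 16,
                                      n_seqs=2, seq_len=8)
```

With the real model plugged in for `logits_fn`, the returned token matrix is what gets fed back through the network to collect quantization statistics.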
Community Review — Non-record: AR Self-Generated GPTQ Calibration (val_bpb=1.1461)

Compliance: LOOKS CLEAN — legal score-first-per-chunk TTT (PR #1413 pattern)
PR #1234 — AR Self-Gen GPTQ + Int6 + XSA + TTT
Head SHA: 8c3f820
File audited:
Summary
Results
Delta: +0.028 BPB vs baseline — self-gen GPTQ is a net loss.
Analysis: Why Self-Gen GPTQ Loses
The technique requires reserving ~210s for AR generation + Hessian collection, leaving only 390s for training (vs 590s baseline), i.e. ~34% fewer training steps. While self-generated calibration data better matches the model's inference-time activation distribution, the quantization improvement (~0.002-0.003 BPB) is far smaller than the loss from fewer training steps (~0.03 BPB). The technique would become net positive only if generation got cheap enough, or the calibration gain grew large enough, to outweigh the lost training steps.
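The tradeoff can be checked with back-of-envelope arithmetic using the numbers above (the 600s total budget is an assumption implied by 390s + ~210s):

```python
# Back-of-envelope check of the budget tradeoff.
total_budget = 600.0        # approx. wall-clock budget (s); assumption
baseline_train = 590.0      # baseline training time (s)
selfgen_train = 390.0       # training time after reserving ~210s for AR gen

step_fraction_lost = 1 - selfgen_train / baseline_train  # fraction of steps lost

bpb_loss_from_steps = 0.03   # cost of fewer training steps (from the PR)
bpb_gain_from_calib = 0.003  # best-case gain from matched calibration

net = bpb_gain_from_calib - bpb_loss_from_steps  # negative means net loss
```

The net comes out around -0.027 BPB, consistent with the observed +0.028 BPB delta against the baseline.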
Key Changes from Approach B
- generate_autoregressive_calib() — generates 64 sequences of 2048 tokens at temp=0.8
- collect_hessians_from_tokens() — collects H = X^T X from self-generated sequences

Architecture
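The Hessian-collection step accumulates H = X^T X over each linear layer's inputs while the self-generated tokens are run through the model. A minimal sketch, assuming activations have already been captured per layer (the dict layout, function name, and the 2/n scaling common in GPTQ-style implementations are assumptions, not the PR's actual code):

```python
import numpy as np

def collect_hessians_from_tokens(activations_per_layer):
    """Accumulate the GPTQ Hessian proxy H per layer (sketch).

    activations_per_layer: dict mapping layer name -> iterable of
    (n_tokens, d) input batches captured during the forward pass.
    """
    hessians = {}
    for name, batches in activations_per_layer.items():
        H, n = None, 0
        for X in batches:
            X = np.asarray(X, dtype=np.float64)
            H = X.T @ X if H is None else H + X.T @ X  # accumulate X^T X
            n += X.shape[0]
        hessians[name] = 2.0 * H / n  # 2/n scaling used in GPTQ-style code
    return hessians
```

For example, a single batch `X = [[1, 2], [3, 4]]` yields `X^T X = [[10, 14], [14, 20]]`, and the 2/n factor with n=2 leaves it unchanged.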
Rule Compliance
Test Plan
🤖 Generated with Claude Code