Non-record: Three Approaches + Lessons Learned (best: 1.1188 BPB) #1001
ibarrajo wants to merge 1 commit into openai:main
Conversation
Approach A (openai#569, int5, no TTT): 1.1317 — int5 penalty too high at d=512
Approach B (openai#576, d=576 int5 + legal s_0 TTT): 1.1188 — best legal result
Approach C (GEPA int5 + TTT): artifact over 16MB

Key lesson: TTT re-scoring is illegal (PR openai#991 was closed for this). Only the s_0 cumulative first-pass score is legal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Community Review — Non-record: Three Approaches + Lessons Learned (best: 1.1188 BPB)

Compliance: LOOKS CLEAN — legal score-first-per-chunk TTT (PR #1413 pattern). BigramHash is legal; score-first TTT with an is_last_chunk guard. Clean.

Verdict: LOOKS CLEAN — legal TTT implementation matching the PR #1413 (dexhunter) pattern: each chunk is scored under weights that have not yet been trained on it.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). The TTT implementation follows the legal score-first discipline.

Reviewed by @MatoTeziTanka — The Agora. Compliance audit via an LLM agent (Sonnet) reviewing the full train_gpt.py source, cross-checked against a deterministic AST classifier. If this review misread your code, please call it out so I can re-audit manually.
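To make the "score-first-per-chunk" discipline concrete, here is a minimal sketch of the legal TTT evaluation loop. The callables `score_chunk` and `train_on_chunk` are hypothetical stand-ins for the real model calls in train_gpt.py (whose actual API is not shown in this thread); the point is the ordering: each chunk contributes its loss under the current weights before any update on that chunk, and the `is_last_chunk` guard skips the final (useless) update.

```python
def evaluate_with_ttt(score_chunk, train_on_chunk, chunks):
    """Score-first-per-chunk test-time-training loop (sketch).

    score_chunk(chunk) -> (summed_nll, num_tokens): loss under CURRENT weights.
    train_on_chunk(chunk): one adaptation step on the chunk just scored.
    Returns s_0, the cumulative first-pass score: no token is ever
    re-scored after weights have been updated on it.
    """
    total_nll, total_tokens = 0.0, 0
    for i, chunk in enumerate(chunks):
        is_last_chunk = (i == len(chunks) - 1)
        # 1) Score FIRST: the weights have never seen this chunk.
        nll, n = score_chunk(chunk)
        total_nll += nll
        total_tokens += n
        # 2) Only then adapt. Re-scoring this chunk afterwards (as in
        #    the closed PR #991) would be illegal. The guard skips the
        #    final update, which no later chunk could benefit from.
        if not is_last_chunk:
            train_on_chunk(chunk)
    return total_nll / total_tokens  # mean first-pass NLL (s_0)
```

The ordering constraint is the whole rule: a compliant checker only needs to verify that no chunk's score is read after a training step on that same chunk.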
Summary
Three approaches tested, all rule-compliant. Best legal result: 1.1188 BPB (s_0 TTT only).
Previous PR #991 was closed because TTT re-scored tokens after training on them. This submission reports only the legal s_0 score. All GPTQ calibration runs within the 600s training budget.
Lessons learned
Rule compliance
Based on PRs #569, #576, #505. Submitted as non-record data points.
🤖 Generated with Claude Code