Record: val_bpb: 1.14020 [tested 3x on 8xh100] #267
andrewgcodes wants to merge 17 commits into openai:main from
Conversation
…its) Combines techniques from PR openai#162, openai#180, openai#267, openai#281:
- 11-layer GPT with U-Net skip connections, GQA
- SmearGate + BigramHash(10240)
- Mixed int5/int6 quantization + 3% magnitude pruning
- Causal TTT at eval time
- SWA(frac=0.4), WD=0.042, Z-loss
- Target: sub-1.135 val_bpb

Awaiting RunPod 8xH100 credits for 3-seed validation.
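The "mixed int5/int6 quantization + 3% magnitude pruning" bullet can be illustrated with a minimal, torch-free sketch. Everything here is an assumption for illustration: the function names, the symmetric round-to-grid quantization scheme, and the tie-breaking in the pruning threshold are not taken from the PR's actual train_gpt.py.

```python
# Hypothetical sketch of "mixed int5/int6 quantization + magnitude pruning".
# The symmetric-quant scheme and all names below are assumptions, not the PR's code.

def quantize_symmetric(weights, bits):
    """Round weights onto a symmetric signed integer grid, then dequantize."""
    qmax = 2 ** (bits - 1) - 1                # e.g. 15 for int5, 31 for int6
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return [v * scale for v in q]             # dequantized values

def prune_by_magnitude(weights, frac):
    """Zero out roughly the smallest `frac` fraction of weights by |value|."""
    k = int(len(weights) * frac)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]
```

In a real mixed-precision setup, different parameter groups would get different `bits` (5 vs 6), and pruning with `frac=0.03` would drop the bottom 3% of each tensor by magnitude.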
Community Review — Record: val_bpb: 1.14020 [tested 3x on 8xh100]
Compliance: LOOKS CLEAN — legal score-first-per-chunk TTT (PR #1413 pattern)

PR #267 — "Record: val_bpb: 1.14020 [tested 3x on 8xh100]"

Check 1: N-gram family bug (CLOSE trigger). CLEAN.
out[..., 1:] = torch.bitwise_xor(36313 * t[..., 1:], 27191 * t[..., :-1]) % mod
The hash key for position …

Check 2: Pre-Quant TTT (CLOSE trigger). CLEAN. The TTT optimizer is …

Check 3: Legal TTT (CLEAN). CONFIRMED LEGAL. The causal TTT loop (lines 1446–1529) follows strict score-first-per-chunk ordering.

Note: The sliding-window clamping logic is novel — specifically how …

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks and a quick look at the sliding-window clamping math.

Reviewed by @MatoTeziTanka — The Agora. Compliance audit via LLM agent (Sonnet) reviewing full train_gpt.py source, cross-checked against deterministic AST classifier. If this review misread your code, please call it out so I can re-audit manually.
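The XOR-hash line quoted in Check 1 can be mirrored in plain Python to show what the bigram hash does. The multiplier constants and modulus come from the quoted line; how position 0 is handled is an assumption (the review's sentence about it is truncated), so keeping the raw token id there is illustrative only.

```python
def bigram_hash(tokens, mod=10240, a=36313, b=27191):
    """Mix each token with its predecessor via multiply-and-XOR, mod `mod`.

    Mirrors the reviewed line:
        out[..., 1:] = torch.bitwise_xor(36313 * t[..., 1:], 27191 * t[..., :-1]) % mod
    Position 0 has no predecessor; keeping the raw token id there is an
    assumption about the surrounding code, not something the review shows.
    """
    out = [tokens[0] % mod]
    for prev, cur in zip(tokens, tokens[1:]):
        out.append(((a * cur) ^ (b * prev)) % mod)
    return out
```

Because the key at position i depends only on tokens i and i-1, the hash is causal: changing a later token never changes an earlier key, which is exactly why Check 1 comes back CLEAN.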
Flagging that this PR does TTT during validation, but compliantly. @0hq
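The "score-first-per-chunk" ordering that makes eval-time TTT legal means each validation chunk is scored with the current weights before the model is allowed to adapt on it, so no chunk's loss ever benefits from having trained on that same chunk. A minimal sketch, with `model_loss` and `adapt` as hypothetical stand-ins for the real loss and update steps:

```python
def ttt_eval(model_loss, adapt, chunks, state):
    """Score-first-per-chunk test-time training (TTT) loop.

    For each chunk: (1) score it with the current state, (2) only then
    update the state on that chunk. `model_loss` and `adapt` are
    hypothetical stand-ins, not the PR's actual functions.
    """
    losses = []
    for chunk in chunks:
        losses.append(model_loss(state, chunk))  # score BEFORE adapting
        state = adapt(state, chunk)              # then train on the chunk
    return sum(losses) / len(losses), state
```

Swapping the two lines inside the loop (adapt, then score) is the illegal ordering the compliance check is designed to catch.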
I believe these make it allowed: