Non-record: Three Approaches + Lessons Learned (best: 1.1188 BPB) #1001
ibarrajo wants to merge 1 commit into openai:main
Conversation
Approach A (openai#569, int5, no TTT): 1.1317 — int5 penalty too high at d=512
Approach B (openai#576, d=576 int5 + legal s_0 TTT): 1.1188 — best legal result
Approach C (GEPA int5 + TTT): artifact over 16MB

Key lesson: TTT re-scoring is illegal (PR openai#991 was closed for this). Only the s_0 cumulative first-pass score is legal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Community Review — Non-record: Three Approaches + Lessons Learned (best: 1.1188 BPB)

Compliance: LOOKS CLEAN — legal score-first-per-chunk TTT (PR #1413 pattern). BigramHash is legal; score-first TTT with an is_last_chunk guard. Clean.

Verdict: LOOKS CLEAN — legal TTT implementation matching the PR #1413 (dexhunter) pattern: each chunk is scored under weights that have not yet been trained on it.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). The TTT implementation follows the legal score-first discipline.

Reviewed by @MatoTeziTanka — The Agora. Compliance audit via an LLM agent (Sonnet) reviewing the full train_gpt.py source, cross-checked against a deterministic AST classifier. If this review misread your code, please call it out so I can re-audit manually.
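To make the "score-first-per-chunk" discipline concrete, here is a minimal sketch of the legal TTT evaluation loop. The callables `score_chunk` and `train_on_chunk` are hypothetical stand-ins for the real model calls in train_gpt.py (whose actual API is not shown in this thread); the point is the ordering: each chunk contributes its loss under the current weights before any update on that chunk, and the `is_last_chunk` guard skips the final (useless) update.

```python
def evaluate_with_ttt(score_chunk, train_on_chunk, chunks):
    """Score-first-per-chunk test-time-training loop (sketch).

    score_chunk(chunk) -> (summed_nll, num_tokens): loss under CURRENT weights.
    train_on_chunk(chunk): one adaptation step on the chunk just scored.
    Returns s_0, the cumulative first-pass score: no token is ever
    re-scored after weights have been updated on it.
    """
    total_nll, total_tokens = 0.0, 0
    for i, chunk in enumerate(chunks):
        is_last_chunk = (i == len(chunks) - 1)
        # 1) Score FIRST: the weights have never seen this chunk.
        nll, n = score_chunk(chunk)
        total_nll += nll
        total_tokens += n
        # 2) Only then adapt. Re-scoring this chunk afterwards (as in
        #    the closed PR #991) would be illegal. The guard skips the
        #    final update, which no later chunk could benefit from.
        if not is_last_chunk:
            train_on_chunk(chunk)
    return total_nll / total_tokens  # mean first-pass NLL (s_0)
```

The ordering constraint is the whole rule: a compliant checker only needs to verify that no chunk's score is read after a training step on that same chunk.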
Summary
Three approaches tested, all rule-compliant. Best legal result: 1.1188 BPB (s_0 TTT only).
Previous PR #991 was closed because TTT re-scored tokens after training on them. This submission reports only the legal s_0 score. All GPTQ calibration runs within the 600s training budget.
Lessons learned
Rule compliance
Based on PRs #569, #576, #505. Submitted as non-record data points.
🤖 Generated with Claude Code