Non-record: 11L PR315 Backout + Native FA3 RunPod (val_bpb=1.1247)#394
greqone wants to merge 1 commit into openai:main
Conversation
Pull request overview
Adds a new non-record 10-minute / 16MB artifact-cap submission folder under records/track_non_record_16mb, packaging a self-contained train_gpt.py snapshot plus run artifacts for an 8xH100 SXM (RunPod) run using native FlashAttention (FA3) and torch.compile.
Changes:
- Add a self-contained training script (train_gpt.py) with inlined FlashAttention interface logic, Backout residual, and sliding-window evaluation.
- Include exact run artifacts (train.log) and metadata (submission.json) for the reported val_bpb=1.12467423.
- Add reproducibility notes (README.md) and a minimal dependency list (requirements.txt).
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| records/track_non_record_16mb/2026-03-22_11L_PR315_Backout_FA3_RunPod/train_gpt.py | Self-contained training + export + int6 quant + sliding-window eval script for the submission run. |
| records/track_non_record_16mb/2026-03-22_11L_PR315_Backout_FA3_RunPod/train.log | Captured training/eval log for the submitted run. |
| records/track_non_record_16mb/2026-03-22_11L_PR315_Backout_FA3_RunPod/submission.json | Leaderboard-style metadata for the non-record entry. |
| records/track_non_record_16mb/2026-03-22_11L_PR315_Backout_FA3_RunPod/requirements.txt | Dependencies needed to reproduce locally (per repo guidance). |
| records/track_non_record_16mb/2026-03-22_11L_PR315_Backout_FA3_RunPod/README.md | Run description, artifact accounting, and reproduction command. |
```python
    (int(k.split(".")[1]) for k in state_dict if k.startswith("blocks.")),
    default=0,
) + 1
late_k_layers = set(range(num_layers_total - 2, num_layers_total))
```
late_k_layers is computed but never used, which makes the quantization logic harder to follow and suggests a partially removed feature. Consider deleting it or wiring it into the intended “late-K passthrough” behavior so the code matches the stated design.
```python
late_k_layers = set(range(num_layers_total - 2, num_layers_total))
```
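If the intended design is the "late-K passthrough" named in the docstring, one way to wire the unused set into the quantization loop is sketched below. This is a hypothetical illustration, not the submission's code: `quantize_state_dict`, `quantize_fn`, and the `k_proj` key naming are all assumptions.

```python
def quantize_state_dict(state_dict, quantize_fn, num_layers_total):
    # "Late-K passthrough" (hypothetical wiring): K projections in the last
    # two transformer blocks skip int6 quantization and keep full precision.
    late_k_layers = set(range(num_layers_total - 2, num_layers_total))
    out = {}
    for name, tensor in state_dict.items():
        parts = name.split(".")
        is_late_k = (
            len(parts) > 1
            and parts[0] == "blocks"
            and parts[1].isdigit()
            and int(parts[1]) in late_k_layers
            and "k_proj" in name  # assumed key-projection parameter naming
        )
        # Late-K weights pass through unquantized; everything else is quantized.
        out[name] = tensor if is_late_k else quantize_fn(tensor)
    return out
```

With wiring like this, the computed set has a reader-visible purpose and the code matches the stated design; otherwise deleting the line is the simpler fix.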
```python
train_gpt_submit.py — Submission v2: wider MLP + STE int6 QAT + MTP + seq2048 + NTK RoPE +
fp16 embed + late-K passthrough + sliding window eval.
```
The module docstring refers to train_gpt_submit.py and lists features (e.g., “fp16 embed”, “MTP”) that don’t clearly match the actual defaults/implementation in this file. This can be confusing when someone audits the submission; consider updating the docstring to reflect the actual filename and the concrete features enabled in this snapshot (or remove the feature list).
Suggested change:
```diff
-train_gpt_submit.py — Submission v2: wider MLP + STE int6 QAT + MTP + seq2048 + NTK RoPE +
-fp16 embed + late-K passthrough + sliding window eval.
+Training script for GPT models used in parameter-golf submissions.
+This module's behavior is defined by the hyperparameters and options below; refer to
+the code and configuration flags instead of this docstring for an authoritative list
+of enabled features.
```
```python
def _load_system_flash_attn_interface():
    for entry in sys.path:
        if not entry:
            continue
        try:
            resolved = Path(entry).resolve()
        except OSError:
            continue
        candidate = resolved / "flash_attn_interface.py"
        if not candidate.exists() or candidate.resolve() == here:
            continue
        if repo_root in candidate.resolve().parents:
            continue
        spec = importlib.util.spec_from_file_location("_system_flash_attn_interface", candidate)
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        sys.modules[spec.name] = module
        spec.loader.exec_module(module)
        fn = getattr(module, "flash_attn_func", None)
        if callable(fn):
            return fn
    return None
```
_load_system_flash_attn_interface() dynamically locates and executes an arbitrary flash_attn_interface.py from sys.path. This is a code-execution footgun (and can make runs non-reproducible if sys.path differs). Consider removing this path-walk entirely, or gating it behind an explicit env var that points to a known file and validating it’s in an expected location (e.g., site-packages) before importing.
```python
        except OSError:
            continue
        candidate = resolved / "flash_attn_interface.py"
        if not candidate.exists() or candidate.resolve() == here:
```
In _load_system_flash_attn_interface, the check candidate.resolve() == here will never be true because candidate is flash_attn_interface.py while here is train_gpt.py. If the intent is to avoid importing a repo-local helper, consider removing this condition (the subsequent repo_root parent check already covers it) or comparing against the actual helper path.
Suggested change:
```diff
-        if not candidate.exists() or candidate.resolve() == here:
+        if not candidate.exists():
```
Community Review — Non-record: 11L PR315 Backout + Native FA3 RunPod (val_bpb=1.1247)

BPB: 1.1247 | Compliance: LOOKS CLEAN — pure-neural submission, no TTT/SLOT/n-gram-cache.

What I found in the code (head SHA): Static code review found no TTT adaptation function, no SLOT optimization loop, no n-gram-cache class, and no pre-quant val-token fine-tune. The eval path uses the standard sliding-window stride-64 pattern. The submission is a pure-neural architecture iteration on the standard SP1024/SP4096/SP8192 baseline.

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.04s, dim=512, layers=9, vocab=1024, code=72744 B, SMOKE_TEST_PASS.

Verdict: LOOKS CLEAN. Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). No compliance flags from the classification pass — this looks like a clean pure-neural iteration on the standard baseline.

Auto-classification caveat: this review was drafted by the AST-based classifier. If there's a non-standard eval mechanism (logit postprocessing, hedge mixing, etc.) that I missed because it's factored into a helper file or a non-standard function name, please flag it and I'll re-run the audit manually.

Reviewed by @MatoTeziTanka — The Agora. Classification via deterministic AST-based classifier.
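The "standard sliding-window stride-64 pattern" the review refers to can be sketched as follows. This is an illustrative model-agnostic version, not the submission's eval code: the `logprob_fn` interface, window size, and bits-per-token return are assumptions.

```python
import math


def sliding_window_bits_per_token(logprob_fn, tokens, window=1024, stride=64):
    """Sliding-window eval (stride pattern): after the first window, each
    window re-reads up to `window - stride` tokens of context but scores only
    tokens not yet scored, approximating full-context loss at bounded cost.

    logprob_fn(context, token) -> log p(token | context); a stand-in for a
    real model forward pass (hypothetical interface for this sketch)."""
    total_nll = 0.0
    scored = 0
    n = len(tokens)
    next_to_score = 1  # token 0 has no context to condition on
    start = 0
    while next_to_score < n:
        end = min(start + window, n)
        for i in range(max(next_to_score, start + 1), end):
            total_nll -= logprob_fn(tokens[start:i], tokens[i])
            scored += 1
        next_to_score = max(next_to_score, end)
        if end == n:
            break
        start += stride
    # Bits per token; converting to bits per byte would further divide by the
    # dataset's bytes-per-token ratio (omitted here).
    return total_nll / scored / math.log(2)
```

The cost control comes from the stride: a full-context eval would run one forward pass per token, while this pattern amortizes one window pass over `stride` scored tokens.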
Summary
- 8xH100 SXM PR315-style run plus Backout
- train_gpt.py, requirements.txt, submission.json, and README

Result

- val_bpb = 1.12467423
- 1.89896029
- 15,545,662
- 8xH100 SXM on RunPod with native Hopper FlashAttention and torch.compile

Notes

- flash_attn_interface.py; for this submission folder that helper is inlined into train_gpt.py so the package is self-contained and closer to the repo guidance that counted code should live in train_gpt.py
- records/track_non_record_16mb/...