
Record: XSA + LoRA TTT (val_bpb=1.1070) #1254

Open

Elarwei001 wants to merge 2 commits into openai:main from Elarwei001:elarwei-xsa-lora-ttt

Conversation

@Elarwei001

Summary

Author: Elar Wei (@Elarwei001)

val_bpb: 1.1070

Artifact size: 14.4 MB (compressed with zlib)

Training time: ~9 min on 8×H100


Results

Metric                     Value
Pre-TTT val_bpb            1.519
Post-TTT val_bpb           1.1070
TTT Improvement            -27.1%
Model Size (compressed)    14.4 MB

Approach

  • 11 layers, d_model=416, 8 attention heads, 4 KV heads (GQA)
  • XSA (Exclusive Self Attention) on all layers
  • LoRA TTT (rank=8) on Q, V projections + LM head (see the adapter sketch after this list)
  • QAT Int6 quantization (enabled at 15% of training)
  • BPE-8192 tokenizer
  • ~20.5M parameters
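
For reference, a minimal sketch of the rank-8 LoRA adapter pattern named above, wrapping a frozen nn.Linear; the class name, scaling, and initialization are illustrative assumptions, not the exact code in train_gpt.py.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen base projection plus a trainable rank-8 low-rank update (scale * B @ A)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # base weights stay fixed at eval time
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)
```

In this setup, only lora_a and lora_b on the wrapped Q/V projections and LM head would receive gradients during test-time training; everything else stays frozen.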

Acknowledgments & Attribution

This submission builds upon the excellent work of the Parameter Golf community:

Technique              Credit
BPE-8192 tokenizer     @sproos
LoRA TTT               @LoquiAuris, @MatoTeziTanka (PR #548, #512)
XSA                    @jfprincz, @unnir (PR #198)
LeakyReLU(0.5)²        @abaybektursun (PR #549)
Int6 QAT               @signalrush (PR #414)
Training stack         @raahilshah, @thwu1 (PR #162, #180)

Files

  • records/track_10min_16mb/2026-04-02_XSA_LoRA_TTT/README.md — Detailed documentation
  • records/track_10min_16mb/2026-04-02_XSA_LoRA_TTT/submission.json — Metadata
  • records/track_10min_16mb/2026-04-02_XSA_LoRA_TTT/train_gpt.py — Training script
  • records/track_10min_16mb/2026-04-02_XSA_LoRA_TTT/train_seed42.log — Training log

Special thanks to the entire Parameter Golf community for sharing techniques openly!

Author: Elar Wei (@Elarwei001)
val_bpb: 1.1070
Model size: 14.4 MB
Hardware: 8×H100 SXM

Techniques:
- XSA (Exclusive Self Attention) on all 11 layers
- LoRA TTT (Test-Time Training) with rank=8
- QAT Int6 quantization
- BPE-8192 tokenizer

Attribution:
- @sproos (BPE-8192 tokenizer)
- @LoquiAuris, @MatoTeziTanka (LoRA TTT)
- @jfprincz, @unnir (XSA)
- @abaybektursun (LeakyReLU)
- @signalrush (Int6 QAT)
- @raahilshah, @thwu1 (Training stack)
HateBunnyPlzzz added a commit to Itssshikhar/parameter-golf that referenced this pull request Apr 2, 2026
Approaches revamped (old eval-only approaches removed):
- 01: Low-Rank Factored MLP (18 layers in 16MB via rank-128 MLP factors)
- 02: Reptile Meta-Learning Warmdown (meta-optimize for TTT adaptability)
- 03: SVD + Quantized Factors (13 layers via spectral compression)
- 04: Multi-Token Prediction + BPB-Weighted Loss (training loss innovation)
- 05: Gram-Newton-Schulz + FP8 Training (30% more steps in 10 min)

Unmerged PR research saved to unmerged_runs/:
- PR openai#1263: SLOT (0.9354 BPB, legality contested)
- PR openai#1246: Trinity Ternary (0.9650 BPB)
- PR openai#1241: MDLM Diffusion (0.9901 BPB)
- PR openai#1252: WARP (1.0713 BPB)
- PR openai#1257: Complement Training (1.0855 BPB)
- PR openai#1274: Parallel Residuals + Depth Recurrence (1.0876 BPB)
- PR openai#1260: MuonEq-R + Depth Recurrence (1.0929 BPB)
- PR openai#1254: XSA + LoRA TTT (1.1070 BPB)

Key finding: without eval tricks, frontier is ~1.09 BPB (PR openai#1260)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
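
To make the "rank-128 MLP factors" idea from approach 01 above concrete, here is a small illustrative sketch; the dimensions and module layout are assumptions for illustration, not the referenced repository's code.

```python
import torch.nn as nn

class FactoredMLP(nn.Module):
    """MLP whose up/down projections pass through a rank-r bottleneck, shrinking
    parameters from roughly 2*d*h to roughly 2*r*(d + h), which is what lets more
    layers fit in a fixed artifact budget."""
    def __init__(self, d_model: int = 512, d_hidden: int = 2048, rank: int = 128):
        super().__init__()
        self.up = nn.Sequential(
            nn.Linear(d_model, rank, bias=False),   # d -> r
            nn.Linear(rank, d_hidden, bias=False),  # r -> h
        )
        self.act = nn.GELU()
        self.down = nn.Sequential(
            nn.Linear(d_hidden, rank, bias=False),  # h -> r
            nn.Linear(rank, d_model, bias=False),   # r -> d
        )

    def forward(self, x):
        return self.down(self.act(self.up(x)))
```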
resouer added a commit to resouer/parameter-golf that referenced this pull request Apr 5, 2026
- LoRA rank=4 on Q/K/V/O projections of last 4 layers (blocks 7-10)
- SGD momentum=0.9, lr=0.002 with cosine decay across chunks
- Per-block discriminative LR: block 7 at 0.6x, blocks 8-10 at 1.0x
- Score-first: score chunk under inference_mode before training LoRA
- 2 epochs per chunk, ~57K LoRA params total
- Based on PR openai#1254 LoRA pattern + PR openai#549 score-first loop
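
A rough sketch of the score-first chunk loop described in this commit message; model, lora_params, and the chunk tensors are assumed to exist, the loss and scheduler wiring is illustrative, and the per-block discriminative LR would map to separate SGD parameter groups with different lr values.

```python
import math
import torch
import torch.nn.functional as F

def ttt_score_first(model, lora_params, chunks, lr=0.002, momentum=0.9, epochs=2):
    """Score each chunk under inference_mode BEFORE adapting on it, so the reported
    loss never comes from weights that have already seen that chunk."""
    opt = torch.optim.SGD(lora_params, lr=lr, momentum=momentum)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=len(chunks))
    total_nats, total_tokens = 0.0, 0
    for chunk in chunks:                        # chunk: 1-D LongTensor of token ids
        inputs, targets = chunk[:-1], chunk[1:]
        with torch.inference_mode():            # 1) score first
            logits = model(inputs.unsqueeze(0))
            loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets)
        total_nats += loss.item() * targets.numel()
        total_tokens += targets.numel()
        for _ in range(epochs):                 # 2) then adapt only the LoRA params
            logits = model(inputs.unsqueeze(0))
            F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets).backward()
            opt.step()
            opt.zero_grad(set_to_none=True)
        sched.step()
    return total_nats / total_tokens / math.log(2)  # bits per token (bpb also needs a bytes-per-token factor)
```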
resouer added a commit to resouer/parameter-golf that referenced this pull request Apr 7, 2026
Novel mechanism: zero-initialized nn.Embedding(4096, 512) created at
eval time, trained exclusively through the standard score-first TTT loop.
Learns document-local bigram patterns without modifying any artifact weights.

Hash: h = (prev_token * 2039 + curr_token) % 4096
Injection: tok_emb(x) + eval_hash_emb(h), before RMSNorm
Compliance: same score-first pattern as openai#549/openai#1413 TTT precedent.
Precedent for eval-time params: LoRA-TTT (openai#1254, openai#1354).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
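
A compact sketch of the eval-time hash-bigram embedding described above; the hash constant, table size, and embedding width come from the commit message, while the function and variable names (and the handling of position 0) are illustrative assumptions.

```python
import torch
import torch.nn as nn

def bigram_hash(tokens: torch.Tensor, table_size: int = 4096) -> torch.Tensor:
    """h = (prev_token * 2039 + curr_token) % 4096; position 0 uses 0 as its 'previous' token."""
    prev = torch.cat([torch.zeros_like(tokens[..., :1]), tokens[..., :-1]], dim=-1)
    return (prev * 2039 + tokens) % table_size

# Created at eval time only, zero-initialized so the injection starts as a no-op;
# it is the sole module trained by the score-first TTT loop, leaving artifact weights untouched.
eval_hash_emb = nn.Embedding(4096, 512)
nn.init.zeros_(eval_hash_emb.weight)

def embed_tokens(tok_emb: nn.Embedding, x: torch.Tensor) -> torch.Tensor:
    # Injection point: added to the token embedding before the first RMSNorm.
    return tok_emb(x) + eval_hash_emb(bigram_hash(x))
```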
@MatoTeziTanka

MatoTeziTanka commented Apr 11, 2026

Community Review — Record: XSA + LoRA TTT (val_bpb=1.1070)

Compliance: NEEDS AUTHOR ACTION — train_gpt.py fails to import on CT2038 (Python 3.10 / torch 2.10.0+cpu)

What I found: The CPU smoke test on CT2038 (proteus-engine, 128 GB RAM, Triton 3.6.0, flash_attn stub, cutlass_evt_fusion stub) failed at the import step with:

ModuleNotFoundError: No module named 'modal'

This matches one of the common patterns from the 2026-04-11 sweep: a hard top-level import of an environment-specific dependency (here, modal) that is not installed in the smoke-test image.

Recommendation: Could you run python3 -c "import py_compile; py_compile.compile('train_gpt.py')" on your records-folder train_gpt.py under Python 3.10 specifically? The eval image is Python 3.10 per Issue #17 / the README, so any parse error on 3.10 blocks the submission at import time before any of the scored-eval logic runs.

Once the parse/import issue is fixed, I'll re-run the compliance audit through the normal pipeline. No other flags identified yet because the audit halts at the import step.


Reviewed by @MatoTeziTanka (The Agora). CPU smoke test (CT2038 proteus-engine, 2026-04-11): IMPORT_FAIL — ModuleNotFoundError: No module named 'modal'. Classification via the classify_prs.py AST-based classifier; the full compliance audit is deferred until the import issue is resolved. Auto-drafted from a template and spot-checked before posting.

@Elarwei001
Author

Thanks for the careful review. I've fixed the import-time issue in the latest push.

The root cause was a top-level import of modal, which fails in the CPU smoke-test environment (where modal is not installed) before any training logic runs. I changed it so that modal is imported optionally, and the Modal-specific app/function entrypoints are only defined when modal is available.
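
For reference, a minimal sketch of the guarded-import pattern just described (the App name, entrypoint, and GPU argument are illustrative, not the exact contents of train_gpt.py):

```python
# modal is only needed for remote launches; the CPU smoke test imports the file without it.
try:
    import modal
except ImportError:
    modal = None

if modal is not None:
    # Modal-specific entrypoints are defined only when the package is available,
    # so a plain import (or py_compile) of train_gpt.py succeeds without modal installed.
    app = modal.App("xsa-lora-ttt")

    @app.function(gpu="H100")
    def train_remote():
        ...
```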

I also re-ran a local compile/import smoke check on the updated train_gpt.py, and it now passes import successfully without modal installed.

Could you please re-run the compliance audit when convenient? Thank you again.
