
Notable Non-Record: Learning Adapters on Random Linear Maps — 1.3705 BPB #1113

Open
gowtham0992 wants to merge 2 commits into openai:main from gowtham0992:random-adapters

Conversation

@gowtham0992

Notable Non-Record: Learning Adapters on Random Linear Maps — 1.3705 BPB

Frozen Random Orthogonal Weights (0 bytes) + LoRA rank-32 Adapters + 30M Effective Params + 5.19 MB Artifact

val_bpb: 1.3705 (seed=42) | 5.19 MB artifact | 8×H100 SXM, 555s training + 105s eval

Results (seed=42, 8×H100 SXM)

| Metric | Value |
| --- | --- |
| val_bpb (post-quant, sliding window) | 1.3705 |
| val_bpb (pre-quant) | 1.3959 |
| val_loss | 2.3140 |
| Steps | 4,307 |
| ms/step | 128.88 |
| Training time | 555 s |
| GPTQ time | 29 s |
| Eval time | 105 s |
| Peak memory | 24,407 MiB |
| Artifact | 5,191,021 bytes (5.19 MB) |
| Model bytes | 5,115,117 |
| Code bytes | 75,904 |
| Trainable params | 3,744,892 |
| Frozen random params | 25,952,256 (not stored) |
| Effective total params | 29,697,148 |
| Artifact usage | 32% of 16 MB limit |

Method

Standard 11-layer Transformer, but every attention Q/K/V/output projection and MLP fc/proj weight is a FrozenRandomLinearWithLoRA (a minimal sketch follows the bullets below):

y = x @ W_frozen^T + LoRA(x)
  = x @ W_frozen^T + x @ B^T @ A^T
  • W_frozen: random orthogonal matrix via QR decomposition, generated from a deterministic seed. Registered as persistent=False buffer — NOT saved in state_dict. At eval time, regenerated from the same seed. Cost: 0 bytes.
  • LoRA A: (out, rank=32), initialized to zeros
  • LoRA B: (rank=32, in), initialized to N(0, 1/√in)
  • alpha/rank = 1.0 (standard LoRA scaling)
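
The module itself is not inlined in the PR body, but a minimal sketch consistent with the description above might look like this; the seed plumbing, dtype handling, and init constants are assumptions, not the submission's actual code:

```python
import torch
import torch.nn as nn


class FrozenRandomLinearWithLoRA(nn.Module):
    """Frozen random orthogonal linear map plus a trainable rank-r LoRA adapter.

    Sketch only: the real train_gpt.py may differ in seed handling, dtype, and init.
    """

    def __init__(self, in_features: int, out_features: int, rank: int = 32, seed: int = 0):
        super().__init__()
        self.in_features, self.out_features, self.seed = in_features, out_features, seed
        # Frozen weight lives in a non-persistent buffer: it moves with the module
        # (device/dtype) but is excluded from state_dict, so it costs 0 bytes on disk.
        self.register_buffer("weight", self._make_frozen(), persistent=False)
        # LoRA factors, alpha/rank = 1. A starts at zero so LoRA(x) = 0 at init.
        self.lora_A = nn.Parameter(torch.zeros(out_features, rank))
        self.lora_B = nn.Parameter(torch.randn(rank, in_features) / in_features ** 0.5)

    def _make_frozen(self) -> torch.Tensor:
        # Deterministic random orthogonal map: QR of a seeded Gaussian matrix.
        g = torch.Generator().manual_seed(self.seed)
        rows = max(self.out_features, self.in_features)
        cols = min(self.out_features, self.in_features)
        q, _ = torch.linalg.qr(torch.randn(rows, cols, generator=g))  # orthonormal columns
        w = q if self.out_features >= self.in_features else q.T       # (out_features, in_features)
        return w.contiguous()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x @ W_frozen^T + x @ B^T @ A^T
        return x @ self.weight.T + (x @ self.lora_B.T) @ self.lora_A.T
```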

Why This Works

Random orthogonal projections provide a rich, well-conditioned feature space (reservoir computing principle). The LoRA adapters learn to select and combine features from this random basis. The orthogonal initialization ensures no information is lost in the projection.
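
As a quick sanity check of the "no information lost" claim for the square projections (the rectangular K/V maps are only row-orthonormal), a QR-derived random matrix is orthogonal and therefore norm-preserving; assuming PyTorch:

```python
import torch

torch.manual_seed(0)
q, _ = torch.linalg.qr(torch.randn(512, 512))   # random orthogonal 512x512 matrix
x = torch.randn(8, 512)

print(torch.allclose(q.T @ q, torch.eye(512), atol=1e-4))                  # True: Q^T Q = I
print(torch.allclose((x @ q.T).norm(dim=-1), x.norm(dim=-1), atol=1e-3))   # True: norms preserved
```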

Size Impact

| Component | Params | Stored |
| --- | --- | --- |
| Frozen random weights | 26M | 0 bytes (regenerated from seed) |
| LoRA adapters | 3.7M | ~5 MB compressed |
| Embeddings, norms, etc. | ~0.5M | included above |
| Total effective | 30M | 5.19 MB |
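
As a cross-check, the 26M frozen figure is exactly what the shapes in the Architecture section below imply; this arithmetic is mine, not taken from the training log:

```python
# Frozen per-layer parameter count for the shapes used here:
# d_model=512, 4 KV heads of dim 64 (so K/V project 512 -> 256), 3x MLP expansion, 11 layers.
d, d_kv, d_mlp, n_layers = 512, 4 * 64, 3 * 512, 11
per_layer = d * d + 2 * d_kv * d + d * d + 2 * d_mlp * d   # Q, K, V, attn out, MLP fc, MLP proj
print(n_layers * per_layer)   # 25952256, matching the reported frozen random params
```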

Implementation Details

  • FrozenRandomLinearWithLoRA overrides _save_to_state_dict to exclude frozen weights
  • _load_from_state_dict regenerates frozen weights from seed on load (a save/load roundtrip sketch follows this list)
  • Save/load roundtrip verified: 0.0 logit difference
  • Each block gets unique seeds (layer_idx × 100 + offset) for independent random projections
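
A hedged roundtrip check mirroring the "0.0 logit difference" claim, using the FrozenRandomLinearWithLoRA sketch from the Method section (not the repo's test). Note the sketch gets the same behavior from the non-persistent buffer plus seeded reconstruction alone, whereas the submission additionally overrides the _save_to_state_dict / _load_from_state_dict hooks:

```python
import torch

layer = FrozenRandomLinearWithLoRA(512, 512, rank=32, seed=12345)
x = torch.randn(4, 512)

sd = layer.state_dict()                      # contains only lora_A / lora_B
assert "weight" not in sd                    # frozen weight is never serialized

fresh = FrozenRandomLinearWithLoRA(512, 512, rank=32, seed=12345)
fresh.load_state_dict(sd)                    # frozen weight regenerated from the seed
print((layer(x) - fresh(x)).abs().max())     # tensor(0.): identical outputs after reload
```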

Architecture

  • 11 layers, d_model=512, 8 heads, 4 KV heads (GQA); the shape hyperparameters are collected in the config sketch after this list
  • All attention and MLP projections: FrozenRandomLinearWithLoRA (rank 32)
  • XSA on all 11 layers, Partial RoPE (16/64), LN Scale
  • LeakyReLU(0.5)² MLP (3x expansion)
  • BigramHash(2048), SmearGate, VRL
  • int6 GPTQ (only 2 layers have quantizable weights — the LoRA params are small)
  • EMA(0.997), SWA
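
For quick reference, the shape hyperparameters above collected into one place; the field names and the RoPE interpretation (16 of 64 head dims) are illustrative, not the script's actual config object:

```python
from dataclasses import dataclass


@dataclass
class RandomAdapterConfig:
    # Shapes from the Architecture list above; names are illustrative only.
    n_layers: int = 11
    d_model: int = 512
    n_heads: int = 8
    n_kv_heads: int = 4        # GQA
    head_dim: int = 64
    mlp_expansion: int = 3     # LeakyReLU(0.5)^2 MLP
    lora_rank: int = 32
    vocab_size: int = 1024     # per the smoke-test report below
    rope_dims: int = 16        # Partial RoPE (16/64), assuming "16 of 64 head dims"
```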

Command

USE_RANDOM_ADAPTERS=1 \
RANDOM_ADAPTER_RANK=32 \
RANDOM_ADAPTER_SEED=12345 \
NGRAM_EVAL=0 \
KNN_LAMBDA=0 \
SEED=42 \
OMP_NUM_THREADS=1 \
python3 -m torch.distributed.run --nproc_per_node=8 train_gpt.py

Compliance

  • Artifact ≤16,000,000 bytes (5,191,021 — 32% of limit)
  • Training ≤600s on 8×H100 SXM (555s)
  • Eval ≤600s (105s)
  • GPTQ calibration inside training budget (29s, on training data)
  • No validation data during training
  • No network calls during evaluation
  • No external compute
  • No n-gram cache or kNN (clean sliding window eval only)
  • Reproducible from train_gpt.py


Included Files

  • train_gpt.py — full training script
  • train_seed42.txt — training log
  • submission.json — metadata
  • run.sh — reproduction script
  • requirements.txt — dependencies

@MatoTeziTanka

Community Review — Notable Non-Record: Learning Adapters on Random Linear Maps — 1.3705 BPB

BPB: 1.3705 | Compliance: LOOKS CLEAN — pure-neural submission, no TTT/SLOT/n-gram-cache

What I found in the code (head SHA e9c48639b306, file records/track_10min_16mb/2026-03-29_Random_Adapters_LoRA/train_gpt.py):

Static code review found no TTT adaptation function, no SLOT optimization loop, no n-gram-cache class, and no pre-quant val-token fine-tune. The eval path uses the standard sliding-window stride-64 pattern. The submission is a pure-neural architecture iteration on the standard SP1024/SP4096/SP8192 baseline.
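
For readers unfamiliar with the eval pattern referenced here, a rough sketch of sliding-window bits-per-byte scoring with stride 64 follows; the window size, the model's call signature, and the loss accounting in train_gpt.py are assumptions:

```python
import math
import torch
import torch.nn.functional as F


@torch.no_grad()
def sliding_window_bpb(model, tokens, n_bytes, window=1024, stride=64, device="cuda"):
    """Score every token once with up to `window` tokens of left context,
    sliding the window forward `stride` tokens at a time (sketch, not the repo's code)."""
    total_nll, scored = 0.0, 0
    for begin in range(0, tokens.numel() - 1, stride):
        end = min(begin + window, tokens.numel())
        ids = tokens[begin:end].unsqueeze(0).to(device)
        logits = model(ids[:, :-1])                 # assumed to predict ids[:, 1:]
        targets = ids[:, 1:]
        # Score only target positions not already covered by a previous window.
        first_new = max(scored - begin - 1, 0)
        nll = F.cross_entropy(logits[0, first_new:], targets[0, first_new:], reduction="sum")
        total_nll += nll.item()
        scored = end
        if end == tokens.numel():
            break
    # bits per byte = total nats / ln(2), normalized by raw byte count of the val set
    return total_nll / math.log(2) / n_bytes
```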

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.03s, dim=512, layers=11, vocab=1024, code=75904 B, SMOKE_TEST_PASS

Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). No compliance flags from the classification pass — this looks like a clean pure-neural iteration on the standard baseline.

Auto-classification caveat: this review was drafted by the AST-based classifier. If there's a non-standard eval mechanism (logit postprocessing, hedge mixing, etc.) that I missed because it's factored into a helper file or a non-standard function name, please flag it and I'll re-run the audit manually.


Reviewed by @MatoTeziTanka (The Agora). Classification via deterministic AST-based classify_prs.py (pattern bank derived from ~65 manually-reviewed PRs earlier in the 2026-04-11 sweep). This review was auto-drafted from a template and spot-checked before posting; if the template misread your code, please call it out so I can iterate the classifier.

@valerio-oai
Contributor

Thanks for this submission; I'd like to merge it into the non-record leaderboard.
Before merging, could you change the files' location? The PR is titled as notable non-record, but the files are under:

records/track_10min_16mb/2026-03-29_Random_Adapters_LoRA/

Given this should be non-record, please move it under records/track_non_record_16mb/....

@gowtham0992
Author

> Thanks for this submission; I'd like to merge it into the non-record leaderboard. Before merging, could you change the files' location? The PR is titled as notable non-record, but the files are under:
>
> records/track_10min_16mb/2026-03-29_Random_Adapters_LoRA/
>
> Given this should be non-record, please move it under records/track_non_record_16mb/....

@valerio-oai

Thanks, moved the Random Adapters submission from records/track_10min_16mb/2026-03-29_Random_Adapters_LoRA/ to records/track_non_record_16mb/2026-03-29_Random_Adapters_LoRA/ as requested.

The latest push is rename-only for those six files, with no content changes.
