Notable Non-Record: Learning Adapters on Random Linear Maps — 1.3705 BPB #1113
gowtham0992 wants to merge 2 commits into openai:main
Conversation
Community Review — Notable Non-Record: Learning Adapters on Random Linear Maps — 1.3705 BPB

BPB: 1.3705 | Compliance: LOOKS CLEAN — pure-neural submission, no TTT/SLOT/n-gram-cache

What I found in the code: static review found no TTT adaptation function, no SLOT optimization loop, no n-gram-cache class, and no pre-quant val-token fine-tune. The eval path uses the standard sliding-window stride-64 pattern. The submission is a pure-neural architecture iteration on the standard SP1024/SP4096/SP8192 baseline.

CPU smoke test (CT2038 proteus-engine, 2026-04-11): import OK in 0.03s, dim=512, layers=11, vocab=1024, code=75904 B, SMOKE_TEST_PASS.

Verdict: LOOKS CLEAN. Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). No compliance flags from the classification pass — this looks like a clean pure-neural iteration on the standard baseline.

Auto-classification caveat: this review was drafted by the deterministic AST-based classifier. If there's a non-standard eval mechanism (logit postprocessing, hedge mixing, etc.) that I missed because it's factored into a helper file or a non-standard function name, please flag it and I'll re-run the audit manually.

Reviewed by @MatoTeziTanka — The Agora.
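For reference, this is roughly what the "standard sliding-window stride-64 pattern" mentioned above looks like; a minimal sketch, assuming a causal LM scored over a fixed context window where each step contributes only its previously unscored targets. The function name and harness details here are illustrative, not the repo's actual eval code.

```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def sliding_window_bpb(model, tokens, n_bytes, window=1024, stride=64, device="cuda"):
    """Sliding-window eval sketch: advance a fixed context window in steps
    of `stride`; each window scores only its new (previously unscored)
    targets, so most tokens are predicted with near-full context.

    `tokens` is the 1-D token-id sequence for the val split and `n_bytes`
    is the byte length of the raw val text (BPB = total bits / bytes).
    """
    model.eval()
    total_nll, prev_end, seq_len = 0.0, 0, len(tokens)
    for start in range(0, seq_len, stride):
        end = min(start + window, seq_len)
        new = end - prev_end                       # targets not yet scored
        ids = torch.tensor(tokens[start:end], device=device)[None]
        logits = model(ids[:, :-1])                # (1, T-1, vocab)
        nll = F.cross_entropy(logits[0], ids[0, 1:], reduction="none")
        total_nll += nll[-new:].sum().item()       # score only the fresh tail
        prev_end = end
        if end == seq_len:
            break
    return total_nll / math.log(2) / n_bytes       # nats -> bits, per byte
```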
Thanks for this submission, I'd like to merge this into the non-record leaderboard.
Given this should be non-record, please move it under records/track_non_record_16mb/.
gowtham0992 force-pushed from e9c4863 to d2f0224
Thanks, moved the Random Adapters submission from records/track_10min_16mb/2026-03-29_Random_Adapters_LoRA/ to records/track_non_record_16mb/2026-03-29_Random_Adapters_LoRA/ as requested. The latest push is rename-only for those six files, with no content changes.
Notable Non-Record: Learning Adapters on Random Linear Maps — 1.3705 BPB
Frozen Random Orthogonal Weights (0 bytes) + LoRA rank-32 Adapters + 30M Effective Params + 5.19 MB Artifact
val_bpb: 1.3705 (seed=42) | 5.19 MB artifact | 8×H100 SXM, 555s training + 105s eval
Results (seed=42, 8×H100 SXM)
Method
Standard 11L Transformer, but every attention Q/K/V/proj and MLP fc/proj weight is a `FrozenRandomLinearWithLoRA`: the frozen random weight is a `persistent=False` buffer — NOT saved in state_dict. At eval time it is regenerated from the same seed. Cost: 0 bytes. (A sketch of the layer follows below.)
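A minimal sketch of what such a layer could look like; the class name is from the submission, but the internals are a reconstruction from the description, with the frozen weight rebuilt from a seeded QR decomposition (that construction is an assumption):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrozenRandomLinearWithLoRA(nn.Module):
    """Frozen seeded random (semi-)orthogonal weight + trainable LoRA.

    The frozen weight is a persistent=False buffer: it never enters the
    state_dict (0 bytes in the artifact) and is rebuilt from `seed` at
    construction time, so save/load only touches the LoRA factors.
    """

    def __init__(self, in_features, out_features, rank=32, alpha=32, seed=0):
        super().__init__()
        self.seed = seed
        self.scale = alpha / rank
        self.register_buffer(
            "weight",
            self._frozen_weight(out_features, in_features, seed),
            persistent=False,                     # excluded from state_dict
        )
        # Trainable low-rank factors -- the only weights that get saved.
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))

    @staticmethod
    def _frozen_weight(out_f, in_f, seed):
        # Deterministic semi-orthogonal matrix: QR of a seeded Gaussian.
        g = torch.Generator().manual_seed(seed)
        a = torch.randn(max(out_f, in_f), min(out_f, in_f), generator=g)
        q, _ = torch.linalg.qr(a)                 # orthonormal columns
        return q if out_f >= in_f else q.T        # shape (out_f, in_f)

    def forward(self, x):
        y = F.linear(x, self.weight)              # frozen random projection
        return y + self.scale * F.linear(F.linear(x, self.lora_A), self.lora_B)
```

Because the buffer is rebuilt in `__init__`, a checkpoint carries only `lora_A`/`lora_B` for each layer.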
Why This Works

Random orthogonal projections provide a rich, well-conditioned feature space (the reservoir-computing principle). The LoRA adapters learn to select and combine features from this random basis, and the orthogonal initialization ensures no information is lost in the projection.
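A quick illustrative check of that claim (not from the submission): a square orthogonal map preserves norms exactly and has condition number 1, so the frozen projection is lossless.

```python
import torch

# A square orthogonal matrix preserves norms, so no information is lost
# in the projection; the adapters only re-weight a full-rank basis.
g = torch.Generator().manual_seed(0)
q, _ = torch.linalg.qr(torch.randn(512, 512, generator=g))  # orthogonal

x = torch.randn(16, 512, generator=g)
print(torch.allclose((x @ q.T).norm(dim=-1), x.norm(dim=-1), atol=1e-4))  # True
print(torch.linalg.cond(q))  # ~1.0 -> perfectly conditioned
```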
Size Impact
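As rough back-of-envelope accounting (assuming dim=512 and 11 layers from the smoke test, rank-32 adapters, and a 4x MLP width, which is an assumption), freezing the dense weights removes the bulk of the parameters from the artifact:

```python
# Illustrative only -- the submission's exact breakdown may differ.
d, r, L = 512, 32, 11
full = lambda out, inp: out * inp              # dense weight params
lora = lambda out, inp: r * inp + out * r      # A (r x in) + B (out x r)

per_block_full = 4 * full(d, d) + full(4 * d, d) + full(d, 4 * d)
per_block_lora = 4 * lora(d, d) + lora(4 * d, d) + lora(d, 4 * d)

print(L * per_block_full)   # ~34.6M frozen params -> 0 bytes (regenerated)
print(L * per_block_lora)   # ~3.2M trainable LoRA params -> saved in artifact
```

At fp16 the ~3.2M LoRA parameters are the right order of magnitude for the 5.19 MB artifact; embeddings, norms, and any quantization shift the exact number.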
Implementation Details
- `FrozenRandomLinearWithLoRA` overrides `_save_to_state_dict` to exclude frozen weights
- `_load_from_state_dict` regenerates frozen weights from seed on load (see the sketch below)
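A hedged sketch of those two hooks, continuing the layer sketch from the Method section. Note that with a `persistent=False` buffer the save-side exclusion already happens automatically, so the explicit override mainly makes the contract robust; `_frozen_weight` is the hypothetical helper from the earlier sketch.

```python
    # Continuing FrozenRandomLinearWithLoRA from the Method sketch above.

    def _save_to_state_dict(self, destination, prefix, keep_vars):
        # persistent=False buffers are already skipped by default; the
        # explicit pop keeps the frozen weight out even if it were
        # registered some other way.
        super()._save_to_state_dict(destination, prefix, keep_vars)
        destination.pop(prefix + "weight", None)   # never ship frozen W

    def _load_from_state_dict(self, state_dict, prefix, local_metadata,
                              strict, missing_keys, unexpected_keys,
                              error_msgs):
        # Regenerate the frozen weight from the stored seed instead of
        # reading it from the checkpoint, and drop any stale copy an
        # older checkpoint might still carry.
        self.weight.copy_(self._frozen_weight(*self.weight.shape, self.seed))
        state_dict.pop(prefix + "weight", None)
        super()._load_from_state_dict(state_dict, prefix, local_metadata,
                                      strict, missing_keys, unexpected_keys,
                                      error_msgs)
```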
Architecture

Command
Compliance
`train_gpt.py`

References
Included Files
- `train_gpt.py` — full training script
- `train_seed42.txt` — training log
- `submission.json` — metadata
- `run.sh` — reproduction script
- `requirements.txt` — dependencies