
[Non-record] SP8192 + MuonEq-R + Loop@0.42 + RECUR_AB + QAT-lite + Compact Artifact#1

Draft
ChideraIbe123 wants to merge 6 commits into main from submission/recurab-042-nonrecord
Conversation

@ChideraIbe123
Owner

Summary

This PR submits a non-record branch from an SP8192 recurrence-focused research cycle. It is fully rule-compliant and comes in under both the artifact-size cap and the time budget.

Final single-seed result:

  • val_bpb = 1.09960971
  • total artifact size: 15,974,435 bytes
  • train time: 599.092s
  • TTT eval time: 544.199s

Main ideas

  • MuonEq-R
  • wallclock-aware depth recurrence activated at ENABLE_LOOPING_AT=0.42
  • learned recurrent alpha/beta blending (RECUR_AB)
  • targeted late QAT-lite on sensitive q/k projections
  • compact artifact engineering, including compressed control tensors / GPTQ scale storage and an LZMA code wrapper
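The RECUR_AB idea above can be sketched as a pair of learned scalars that blend the re-applied block output with the loop input. This is a minimal illustrative sketch, not the repo's implementation; the class and argument names are hypothetical, and the inits mirror `RECUR_A_INIT=1.0` / `RECUR_B_INIT=0.0` from the reproduction command.

```python
import torch
import torch.nn as nn

class RecurABBlock(nn.Module):
    """Hypothetical sketch of RECUR_AB: learned alpha/beta blending
    of a depth-recurrent block's output with its input."""

    def __init__(self, block: nn.Module, a_init: float = 1.0, b_init: float = 0.0):
        super().__init__()
        self.block = block
        # a=1, b=0 starts as a pure pass-through of the recurrent output,
        # so training can learn how much of the loop input to retain.
        self.a = nn.Parameter(torch.tensor(a_init))
        self.b = nn.Parameter(torch.tensor(b_init))

    def forward(self, x: torch.Tensor, n_loops: int) -> torch.Tensor:
        h = x
        for _ in range(n_loops):
            # blend the looped block output with the current state
            h = self.a * self.block(h) + self.b * h
        return h
```

With this initialization the module behaves exactly like the plain recurrence stack at step zero, which is presumably why it can only improve on it during training.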

Research context

This branch came out of a broader legal-only search over recurrence-native and compression-aware techniques. The main findings that survived into the final submission were:

  • Loop@0.42 beat earlier recurrence schedules like 0.35 and 0.40
  • RECUR_AB beat both the plain recurrence stack and the earlier RecurAlpha variant
  • broad HQClip improved quality but blew up artifact size too much to submit
  • RECUR_LORA, AWQ-lite, and compressor-only swaps did not survive the quality/size tradeoff
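The Loop@0.42 schedule above can be read as wallclock-gated depth recurrence: looping switches on once a fixed fraction of the time budget has elapsed. The sketch below is a guess at that gating, with hypothetical function and parameter names (`ENABLE_LOOPING_AT=0.42` of `MAX_WALLCLOCK_SECONDS` here).

```python
import time

def n_loops(start_time: float, budget_s: float,
            enable_at: float = 0.42, loops_when_on: int = 2) -> int:
    """Return how many times to apply the recurrent block this step.

    Before `enable_at` of the wallclock budget has elapsed, run the
    block once (no recurrence); afterwards, loop it `loops_when_on` times.
    """
    elapsed_frac = (time.time() - start_time) / budget_s
    return loops_when_on if elapsed_frac >= enable_at else 1
```

Gating on wallclock rather than step count keeps the schedule robust to throughput variation, which matters when the whole run must finish under 600 s.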

Final metrics

Stage          BPB
Raw pre-quant  1.1046
Quantized      1.1336
Final TTT      1.09960971

Artifact item             Bytes
Quantized model + Brotli  15,949,492
Code                      24,943
Total                     15,974,435
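The artifact accounting above sums two compressed payloads: the quantized model (Brotli) and the code (LZMA wrapper). As a self-contained sketch of that accounting, the snippet below uses stdlib `lzma` for both streams (Brotli is a third-party package); the function name and structure are illustrative, not the submission's packaging code.

```python
import lzma

def artifact_size(model_bytes: bytes, code_text: str) -> int:
    """Total submitted bytes: compressed model + compressed code.

    Stand-in sketch: the actual submission Brotli-compresses the model;
    here both streams use stdlib LZMA so the example is dependency-free.
    """
    model_blob = lzma.compress(model_bytes, preset=9)
    code_blob = lzma.compress(code_text.encode("utf-8"), preset=9)
    return len(model_blob) + len(code_blob)
```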

Compliance checklist

  • Causal left-to-right dependence
  • Full normalized softmax distribution
  • Score-before-update TTT ordering
  • Single left-to-right pass with no rescoring
  • Train under 600s
  • Eval under 600s
  • Artifact under 16,000,000 bytes

Why non-record

  • single-seed result
  • does not beat the current record stack

Reproduction

SEED=1337 \
MUON_EQR=1 \
EMA_DECAY=0 \
ENABLE_LOOPING_AT=0.42 \
MAX_WALLCLOCK_SECONDS=599.0 \
RECUR_ALPHA_ENABLED=0 \
RECUR_AB_ENABLED=1 \
RECUR_A_INIT=1.0 \
RECUR_B_INIT=0.0 \
QAT_LITE_ENABLED=1 \
QAT_LITE_START_FRAC=0.55 \
QAT_LITE_EVERY=4 \
QAT_LITE_LAMBDA=0.02 \
QAT_LITE_BITS=6 \
QAT_LITE_CLIP_SIGMAS=12.85 \
QAT_LITE_LAYER_START=7 \
QAT_LITE_TARGETS=qk \
QAT_LITE_PENALTY=mse \
QAT_LITE_DEPTH_POWER=0.0 \
COMPRESSOR=brotli \
DATA_PATH=./data/datasets/fineweb10B_sp8192 \
TOKENIZER_PATH=./data/tokenizers/fineweb_8192_bpe.model \
VOCAB_SIZE=8192 \
torchrun --standalone --nproc_per_node=8 \
records/track_non_record_16mb/2026-04-27_SP8192_MuonEqR_Loop042_RecurAB_QATLite/train_gpt.py
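One way to read the `QAT_LITE_*` flags above: after `QAT_LITE_START_FRAC` of training, every `QAT_LITE_EVERY` steps, add an MSE penalty (`QAT_LITE_PENALTY=mse`, weight `QAT_LITE_LAMBDA`) pulling the targeted q/k projection weights toward their 6-bit fake-quantized values, with the clip range set at `QAT_LITE_CLIP_SIGMAS` standard deviations. This is a hedged sketch of that penalty, not the repo's code; the function name is hypothetical.

```python
import torch

def qat_lite_penalty(w: torch.Tensor, bits: int = 6,
                     clip_sigmas: float = 12.85) -> torch.Tensor:
    """Sketch of a QAT-lite auxiliary loss: MSE between a weight tensor
    and its symmetric per-tensor fake-quantized version."""
    clip = clip_sigmas * w.std()          # sigma-based clip range
    levels = 2 ** (bits - 1) - 1          # e.g. 31 for 6-bit symmetric
    scale = clip / levels
    w_q = torch.clamp(torch.round(w / scale), -levels, levels) * scale
    return torch.mean((w - w_q) ** 2)

# In the training loop (illustrative, step gating per the flags above):
# if step >= QAT_LITE_START_FRAC * total_steps and step % QAT_LITE_EVERY == 0:
#     loss = loss + QAT_LITE_LAMBDA * sum(qat_lite_penalty(p) for p in qk_weights)
```

Applying this only late in training and only to the sensitive q/k projections keeps the regularizer cheap while shrinking the quantization gap visible in the Raw pre-quant vs. Quantized BPB rows.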
