
[Record Candidate] SP8192 · GatedAttn + Phased TTT + LQER · 10 min / 16 MB #2065

Open · IanniMuliterno wants to merge 50 commits into openai:main from IanniMuliterno:apr29-phasedttt-runpod

IanniMuliterno commented May 1, 2026

This PR combines three components from previously accepted PRs and adds a new integration and a QuantGate export path:

  • Gated attention + QuantGate (lineage: PR #1769; Qwen, arXiv:2505.06708)
  • Phased TTT + global SGD (lineage: PR #1727)
  • Mixed GPTQ + LQER (lineage: PR #1855)
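
For readers unfamiliar with LQER, the idea of correcting a quantized weight matrix with a low-rank term can be sketched as below. This is only an illustration under stated assumptions: it uses a plain round-to-nearest quantizer as a stand-in for GPTQ, and the function names, bit width, and rank are hypothetical choices, not taken from this PR's code.

```python
import numpy as np

def quantize_rtn(w, bits=4):
    """Symmetric per-row round-to-nearest quantizer (assumed stand-in for GPTQ)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero on all-zero rows
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale  # dequantized weights

def lqer_factors(w, w_q, rank):
    """Rank-`rank` factors (A, B) approximating the quantization error w - w_q."""
    u, s, vt = np.linalg.svd(w - w_q, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # fold singular values into the left factor
    b = vt[:rank]
    return a, b

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32))          # hypothetical weight matrix
w_q = quantize_rtn(w, bits=4)
a, b = lqer_factors(w, w_q, rank=8)

err_plain = np.linalg.norm(w - w_q)            # error of quantized weights alone
err_lqer = np.linalg.norm(w - (w_q + a @ b))   # error after low-rank correction
```

Because the truncated SVD is the best rank-k approximation of the error matrix in Frobenius norm, `err_lqer` cannot exceed `err_plain`; how much it helps in practice depends on the rank budget and on how the quantizer shapes the error.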

Score-first compliance is preserved throughout: every token is scored before any update that could have seen it, both for the local LoRA adapter updates (chunk level) and for the global SGD updates (phase level).
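
The score-before-update ordering can be illustrated with a toy loop. This is a minimal sketch, not the PR's implementation: the "model" is a single scalar parameter with a squared-error loss, and `score_first_eval`, `theta`, and `lr` are hypothetical names. The same ordering would apply one level up, with phases in place of chunks.

```python
import numpy as np

def score_first_eval(chunks, theta=0.0, lr=0.1):
    """Score each chunk with the current parameter, then adapt on that chunk."""
    losses, updates_seen_at_score = [], []
    n_updates = 0
    for chunk in chunks:
        # 1) Score: the parameter reflects only strictly earlier chunks.
        losses.append(float(np.mean((chunk - theta) ** 2)))
        updates_seen_at_score.append(n_updates)
        # 2) Update: one gradient step on the chunk that was just scored.
        grad = float(np.mean(2.0 * (theta - chunk)))
        theta -= lr * grad
        n_updates += 1
    return losses, updates_seen_at_score, theta

chunks = [np.ones(4) for _ in range(3)]  # toy data: three identical chunks
losses, updates_seen, theta = score_first_eval(chunks)
```

The first chunk is scored with zero updates applied, the second with one, and so on, so no reported loss is computed with a parameter that has already been trained on the tokens being scored.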

Architecture

  • Tokenizer: SP8192 sentencepiece, vocab 8192
  • Layers: 11, dim 512, 8 heads / 4 KV heads
  • Depth recurrence: loop over layers 3–5, NUM_LOOPS=2
  • Parallel residual from layer 7
  • GatedAttn on all layers (GATED_ATTN_ENABLED=1, N(0, 0.01) init)
  • QK gain init: 5.25, EMA decay: 0.9965
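
As a reading aid, the settings above can be collected in a small config sketch. Several details are assumptions, not taken from the PR: layer indices are treated as zero-based, the loop range 3–5 as inclusive, and `NUM_LOOPS=2` as the total number of passes through the looped block. The class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    vocab_size: int = 8192
    num_layers: int = 11
    dim: int = 512
    num_heads: int = 8
    num_kv_heads: int = 4
    loop_start: int = 3              # assumed zero-based
    loop_end: int = 5                # assumed inclusive
    num_loops: int = 2               # assumed: total passes through the block
    parallel_residual_from: int = 7
    ema_decay: float = 0.9965

    @property
    def head_dim(self):
        return self.dim // self.num_heads

    @property
    def queries_per_kv(self):
        # Grouped-query attention: query heads sharing each KV head.
        return self.num_heads // self.num_kv_heads

def layer_schedule(cfg):
    """Order in which layer indices run under the assumed recurrence."""
    pre = list(range(cfg.loop_start))
    block = list(range(cfg.loop_start, cfg.loop_end + 1))
    post = list(range(cfg.loop_end + 1, cfg.num_layers))
    return pre + block * cfg.num_loops + post

def ema_update(ema, value, decay):
    """One exponential-moving-average step for a single weight value."""
    return decay * ema + (1.0 - decay) * value

cfg = ModelConfig()
schedule = layer_schedule(cfg)
```

Under these assumptions the model has 11 distinct layers but 14 layer applications per forward pass, with a head dimension of 64 and two query heads per KV head.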

Eval path

Primary scored metric: quantized_ttt_phased

Eval sequence:

  1. pre-quantization post-ema
  2. quantized
  3. quantized_ttt_phased — primary scored path

Full 3-seed results (seeds 42 / 314 / 999, 8×H100, 10-min wall clock) will be added as train logs once the runs complete.

Reproduction

See records/track_10min_16mb/2026-04-29_SP8192_AttnGate_PhasedTTT_LoRA_LaCT/README.md for full setup, smoke test command, and 8×H100 launch instructions.

SEED=42 bash run.sh 2>&1 | tee logs/seed42.log
SEED=314 bash run.sh 2>&1 | tee logs/seed314.log
SEED=999 bash run.sh 2>&1 | tee logs/seed999.log

References

  • self-generated calibration variants beyond current AR calibration
  • PS: `colab/2026-04-06_QuantExport3_RotationAware_GPTQMix/highest_ROI_experiment_Tier_1_2_(rotation_aware,_better_Hessian_approx,_mixed_precision).ipynb` has no cell output because I am rerunning that
  • Per-category bitwidths + entropy proxy in the GPTQ allocator; doc-local TTT eval with per-document reset and score-before-update
  • contains the baseline run (a.k.a. the current registered record adapted to run on Colab) and 2 experiments for comparison