[Record Candidate] SP8192 · GatedAttn + Phased TTT + LQER · 10 min / 16 MB #2065
Open
IanniMuliterno wants to merge 50 commits into openai:main
self-generated calibration variants beyond current AR calibration
PS: `colab/2026-04-06_QuantExport3_RotationAware_GPTQMix/highest_ROI_experiment_Tier_1_2_(rotation_aware,_better_Hessian_approx,_mixed_precision).ipynb` has no cell output because I am rerunning that
Per-category bitwidths + entropy proxy in the GPTQ allocator, Doc-local TTT eval with per-document reset and score-before-update
Contains the baseline run (i.e. the current registered record adapted to run on Colab) and 2 experiments for comparison
Add an explicit profile selector: use PROFILE="t4" for a cheaper training and validation path.
This reverts commit 7219aec.
This reverts commit 11521e9.
support installed module layout while still failing explicitly if nothing usable exists
wrapper should proxy any unknown attribute to the real backend module.
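The two commit messages above describe a wrapper that tries more than one module layout, fails loudly if none can be imported, and forwards unknown attributes to the real backend. A minimal sketch of that pattern follows. The names `BackendProxy` and `load_backend` and the candidate module names are hypothetical illustrations, not identifiers from this PR.

```python
import importlib


class BackendProxy:
    """Hypothetical wrapper that forwards unknown attributes to a real module."""

    def __init__(self, backend):
        # Store via __dict__ so __getattr__ is never triggered for _backend.
        self.__dict__["_backend"] = backend

    def __getattr__(self, attr):
        # Python calls __getattr__ only when normal lookup fails,
        # so every attribute not defined on the wrapper is proxied.
        return getattr(self.__dict__["_backend"], attr)


def load_backend(candidates):
    """Import the first usable module from `candidates`, else raise ImportError."""
    errors = []
    for name in candidates:
        try:
            return BackendProxy(importlib.import_module(name))
        except ImportError as exc:
            errors.append(f"{name}: {exc}")
    # Fail explicitly instead of silently continuing without a backend.
    raise ImportError("no usable backend found; tried: " + "; ".join(errors))
```

For example, `load_backend(["_no_such_backend_module_", "json"])` skips the missing module and returns a proxy whose `dumps` attribute resolves to `json.dumps`.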
This PR combines three components from existing accepted PRs with a new integration and QuantGate export path:
Gated attention + QuantGate (lineage #1769; Qwen, arXiv:2505.06708)
Phased TTT + global SGD (lineage PR #1727)
Mixed GPTQ + LQER (lineage #1855)
Score-first compliance is preserved throughout: tokens are always scored before any update that could have seen them, both for the local LoRA adapter updates (chunk level) and for the global SGD updates (phase level).
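The score-first ordering can be illustrated with a toy example. The sketch below is not the PR's evaluation code: it swaps in a smoothed unigram counter (`UnigramModel`, a hypothetical stand-in for the adapted model) and shows only the chunk-level ordering, with a per-document reset. Each chunk is scored with state that has not yet seen it, and only then is that chunk used for an update.

```python
import math
from collections import Counter


class UnigramModel:
    """Toy adaptive model (assumption): add-one smoothed unigram counts."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size
        self.counts = Counter()
        self.total = 0

    def nll(self, token):
        # Negative log-likelihood under the current (pre-update) counts.
        p = (self.counts[token] + 1) / (self.total + self.vocab_size)
        return -math.log(p)

    def update(self, tokens):
        self.counts.update(tokens)
        self.total += len(tokens)


def score_first_eval(docs, vocab_size, chunk_size):
    """Mean NLL where every chunk is scored before the model adapts to it."""
    total_nll, n_tokens = 0.0, 0
    for doc in docs:
        model = UnigramModel(vocab_size)  # per-document reset
        for start in range(0, len(doc), chunk_size):
            chunk = doc[start:start + chunk_size]
            # 1) Score the chunk with state that has not seen it.
            total_nll += sum(model.nll(t) for t in chunk)
            n_tokens += len(chunk)
            # 2) Only afterwards adapt on the chunk that was just scored.
            model.update(chunk)
    return total_nll / n_tokens
```

Reversing steps 1 and 2 would let the update see tokens before they are scored, which is exactly the leak that score-first ordering rules out.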
Architecture
Eval path
Primary scored metric: quantized_ttt_phased
Eval sequence:
Full 3-seed results (seeds 42 / 314 / 999, 8×H100, 10-min wall clock) will be added as train logs once the runs complete.
Reproduction
See records/track_10min_16mb/2026-04-29_SP8192_AttnGate_PhasedTTT_LoRA_LaCT/README.md for full setup, smoke test command, and 8×H100 launch instructions.
SEED=42 bash run.sh 2>&1 | tee logs/seed42.log
SEED=314 bash run.sh 2>&1 | tee logs/seed314.log
SEED=999 bash run.sh 2>&1 | tee logs/seed999.log
References