Record candidate: PR #1797 + AWQ-lite top3 + LQER 60k on b180-tlr56 - val_bpb 1.06043 (seed=0) #2157
Draft
vimeto wants to merge 2 commits into openai:main from
Author
3-seed update is in. Honest result: this PR is worse than the current merged SOTA. Leaving this PR as draft for documentation of the AWQ-lite + LQER + drop-M LoRA stacking experiment and as the missing H100 multi-seed continuation of PR #1935. Not a record claim.
Summary
Builds on the b180-tlr56 lineage (PR #1935) by stacking PR #1908's AWQ-lite mixed-precision GPTQ on top, plus a small LQER budget bump that uses cap margin freed by AWQ-lite's INT8 promotions.
Single seed (SEED=0): val_bpb 1.06043, val_loss 2.32062 nats. This beats PR #1855's 3-seed mean (1.06108 BPB, 2.32203 nats) by 0.00065 BPB / 0.00141 nats. Eval took 599.3s and train 596.2s, both inside the 600s lane caps. The per-group lrzip artifact is 15,947,372 bytes; the total submission is 15,982,182 bytes (cap margin 17,818 bytes).
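For reviewers who want to re-derive the deltas and margins quoted in this paragraph, here is a minimal sketch in Python. All numbers are copied from the text; the 16,000,000-byte cap is an assumption inferred from the quoted margin (15,982,182 + 17,818), and the names `CAP_BYTES`, `TIME_CAP_S`, and `summarize` are hypothetical helpers, not part of the submission code.

```python
# Sanity check of the figures quoted in the PR summary.
CAP_BYTES = 16_000_000   # assumption: decimal 16 MB cap, inferred from the quoted margin
TIME_CAP_S = 600.0       # lane cap for both train and eval, as stated in the summary

THIS_BPB, THIS_LOSS = 1.06043, 2.32062   # this PR, SEED=0
REF_BPB, REF_LOSS = 1.06108, 2.32203     # PR #1855 3-seed mean
ARTIFACT_BYTES = 15_947_372              # per-group lrzip artifact
TOTAL_BYTES = 15_982_182                 # total submission
EVAL_S, TRAIN_S = 599.3, 596.2


def summarize():
    """Return the derived quantities as a plain dict."""
    return {
        # Negative deltas mean this run is below (better than) the reference.
        "delta_bpb": round(THIS_BPB - REF_BPB, 5),
        "delta_loss": round(THIS_LOSS - REF_LOSS, 5),
        "cap_margin": CAP_BYTES - TOTAL_BYTES,
        # Bytes in the submission that are outside the lrzip artifact.
        "non_artifact_bytes": TOTAL_BYTES - ARTIFACT_BYTES,
        "within_time": EVAL_S <= TIME_CAP_S and TRAIN_S <= TIME_CAP_S,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```

Under that cap assumption the sketch reproduces the 17,818-byte margin. It compares a single seed against a 3-seed mean, so the deltas say nothing about seed-to-seed variance.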
I'm running additional seeds at this configuration on RunPod 8xH100 right now and will append SEED=314 and SEED=1234 numbers as soon as they finish. Some of those runs explore slightly different hparam neighborhoods (LQER budget, AWQ top_k) to map the local landscape; the headline single-seed value is from the configuration documented in the README.
This PR consolidates the test-plan items left open in PR #1935, which promised SEED=0 / SEED=1234 multi-seed reference logs but could not complete them because of a pod crash in the original session. PR #1935 is being closed.
See records/track_10min_16mb/2026-05-05_AWQTop3_LQER60k_KLoRA_b180_tlr56/README.md for full recipe, ablation context, compliance notes, and lineage credits.
Test plan