Record: base + AWQ-lite mixed-precision GPTQ — val_bpb 1.06086 (3-seed mean) #1946
aquariouseworkman wants to merge 2 commits into openai:main
Conversation
Splitting AWQ-lite work out of #1851 per @cocohearts' review. This PR contains only the AWQ-lite submission directory; the SmearGate BOS-fix record stays in #1851.
Applies activation-aware mixed-precision GPTQ on top of the PR #1855 stack. 3-seed mean val_bpb 1.06086 (seeds 42 / 314 / 1234). See README in the submission folder for technique details and reproduction steps.
Applies activation-aware mixed-precision GPTQ (from PR #1908 / romeerp) on top of the codemath3000 PR #1855 stack.
Results
| Seed | val_bpb (post-TTT) | artifact bytes | steps | eval time |
|------|--------------------|----------------|-------|-----------|
| 42 | 1.06118 | 15,978,503 | 4989 | 392.8s |
| 314 | 1.06005 | 15,976,469 | 4986 | 395.8s |
| 1234 | 1.06135 | 15,976,673 | 4977 | 395.5s |
| **mean** | **1.06086** | — | — | — |
3-seed std: 0.00069. Beats codemath3000 PR #1855 (1.06108) by 0.00022 BPB.
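The headline mean and the margin over PR #1855 follow directly from the per-seed values in the table; a quick sanity check (not part of the submission):

```python
# Recompute the 3-seed mean and the delta vs the PR #1855 baseline (1.06108).
seed_bpb = {42: 1.06118, 314: 1.06005, 1234: 1.06135}
mean_bpb = sum(seed_bpb.values()) / len(seed_bpb)   # 1.06086
delta = 1.06108 - mean_bpb                          # 0.00022
print(f"mean={mean_bpb:.5f}  delta_vs_1855={delta:.5f}")
```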
Technique
Training is identical to PR #1855. The only change is post-training quantization:
AWQ-lite (activation-aware GPTQ):
1. Collect per-input-channel activation RMS during GPTQ calibration
2. Score column groups: `saliency = act_rms * mean(abs(weight))`
3. Select the top-1 most salient 64-column group per matrix
4. Quantize that group at int8 inside the same full-tensor GPTQ solve (the rest stays int6)
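A minimal PyTorch sketch of the group-selection and mixed-precision idea, assuming a `(out_features, in_features)` linear weight and a calibrated per-input-channel `act_rms`. The helper names are hypothetical, and the real pass quantizes the high-bit group inside the GPTQ solve with Hessian error compensation rather than the plain round-to-nearest used here:

```python
import torch

def select_salient_groups(weight: torch.Tensor, act_rms: torch.Tensor,
                          group_size: int = 64, top_k: int = 1) -> torch.Tensor:
    """Return indices of the top_k most salient 64-column input groups.

    weight:  (out_features, in_features) linear weight
    act_rms: (in_features,) activation RMS from GPTQ calibration
    Assumes in_features is a multiple of group_size.
    """
    # saliency = act_rms * mean(|W|) per input channel, averaged per group
    per_channel = act_rms * weight.abs().mean(dim=0)        # (in_features,)
    per_group = per_channel.view(-1, group_size).mean(dim=1)
    return torch.topk(per_group, k=top_k).indices

def fake_quant(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric per-column round-to-nearest quantization (stand-in for GPTQ)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().amax(dim=0).clamp(min=1e-8) / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

def quantize_mixed(weight, act_rms, group_size=64, top_k=1, hi_bits=8, lo_bits=6):
    """Quantize the salient group(s) at hi_bits and everything else at lo_bits."""
    w_q = fake_quant(weight, lo_bits)
    for g in select_salient_groups(weight, act_rms, group_size, top_k).tolist():
        cols = slice(g * group_size, (g + 1) * group_size)
        w_q[:, cols] = fake_quant(weight[:, cols], hi_bits)
    return w_q
```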
Env vars:

`AWQ_LITE_ENABLED=1 AWQ_LITE_BITS=8 AWQ_LITE_GROUP_TOP_K=1 AWQ_LITE_GROUP_SIZE=64`
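The flags are read from the environment at quantization time; a minimal sketch of how such a config could be assembled (the `AWQLiteConfig` dataclass and `from_env` helper are illustrative, not the PR's actual code):

```python
import os
from dataclasses import dataclass

@dataclass
class AWQLiteConfig:
    enabled: bool = False
    bits: int = 8           # precision for the salient group
    group_top_k: int = 1    # number of salient groups to promote
    group_size: int = 64    # columns per group

    @classmethod
    def from_env(cls) -> "AWQLiteConfig":
        return cls(
            enabled=os.environ.get("AWQ_LITE_ENABLED", "0") == "1",
            bits=int(os.environ.get("AWQ_LITE_BITS", "8")),
            group_top_k=int(os.environ.get("AWQ_LITE_GROUP_TOP_K", "1")),
            group_size=int(os.environ.get("AWQ_LITE_GROUP_SIZE", "64")),
        )
```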
Setup

1. `pip install -r requirements.txt`
2. `apt-get install -y lrzip`
3. Install FA3: `pip install --no-deps flash_attn_3 --find-links https://windreamer.github.io/flash-attention3-wheels/cu128_torch291/`
4. Run `prepare_caseops_data.py` to build the dataset
5. `AWQ_LITE_ENABLED=1 AWQ_LITE_BITS=8 AWQ_LITE_GROUP_TOP_K=1 AWQ_LITE_GROUP_SIZE=64 torchrun --standalone --nproc_per_node=8 train_gpt.py`

Environment

- 8xH100 80GB SXM (RunPod)
- PyTorch 2.9.1+cu128
- FlashAttention 3.0.0
- Triton 3.5.1