Record: base + AWQ-lite mixed-precision GPTQ — val_bpb 1.06086 (3-seed mean) #1946
aquariouseworkman wants to merge 2 commits into openai:main
Conversation
Splitting AWQ-lite work out of #1851 per @cocohearts' review. This PR contains only the AWQ-lite submission directory; the SmearGate BOS-fix record stays in #1851.
Applies activation-aware mixed-precision GPTQ on top of the PR #1855 stack. 3-seed mean val_bpb 1.06086 (seeds 42 / 314 / 1234). See README in the submission folder for technique details and reproduction steps.
Applies activation-aware mixed-precision GPTQ (from PR #1908 / romeerp) on top of the codemath3000 PR #1855 stack.
Results
| Seed | val_bpb (post-TTT) | artifact bytes | steps | eval time |
|------|--------------------|----------------|-------|-----------|
| 42 | 1.06118 | 15,978,503 | 4989 | 392.8s |
| 314 | 1.06005 | 15,976,469 | 4986 | 395.8s |
| 1234 | 1.06135 | 15,976,673 | 4977 | 395.5s |
| **mean** | **1.06086** | — | — | — |
3-seed std: 0.00069. Beats codemath3000 PR #1855 (1.06108) by 0.00022 BPB.
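The headline mean and the margin over PR #1855 follow directly from the per-seed values in the table; a quick sanity check (not part of the submission):

```python
# Recompute the 3-seed mean and the delta vs the PR #1855 baseline (1.06108).
seed_bpb = {42: 1.06118, 314: 1.06005, 1234: 1.06135}
mean_bpb = sum(seed_bpb.values()) / len(seed_bpb)   # 1.06086
delta = 1.06108 - mean_bpb                          # 0.00022
print(f"mean={mean_bpb:.5f}  delta_vs_1855={delta:.5f}")
```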
Technique
Training is identical to PR #1855. The only change is post-training quantization:
AWQ-lite (activation-aware GPTQ):
1. Collect per-input-channel activation RMS during GPTQ calibration
2. Score column groups: `saliency = act_rms * mean(abs(weight))`
3. Select the top-1 most salient 64-column group per matrix
4. Quantize that group at int8 inside the same full-tensor GPTQ solve (the rest stays int6)
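A minimal PyTorch sketch of the group-selection and mixed-precision idea, assuming a `(out_features, in_features)` linear weight and a calibrated per-input-channel `act_rms`. The helper names are hypothetical, and the real pass quantizes the high-bit group inside the GPTQ solve with Hessian error compensation rather than the plain round-to-nearest used here:

```python
import torch

def select_salient_groups(weight: torch.Tensor, act_rms: torch.Tensor,
                          group_size: int = 64, top_k: int = 1) -> torch.Tensor:
    """Return indices of the top_k most salient 64-column input groups.

    weight:  (out_features, in_features) linear weight
    act_rms: (in_features,) activation RMS from GPTQ calibration
    Assumes in_features is a multiple of group_size.
    """
    # saliency = act_rms * mean(|W|) per input channel, averaged per group
    per_channel = act_rms * weight.abs().mean(dim=0)        # (in_features,)
    per_group = per_channel.view(-1, group_size).mean(dim=1)
    return torch.topk(per_group, k=top_k).indices

def fake_quant(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric per-column round-to-nearest quantization (stand-in for GPTQ)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().amax(dim=0).clamp(min=1e-8) / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

def quantize_mixed(weight, act_rms, group_size=64, top_k=1, hi_bits=8, lo_bits=6):
    """Quantize the salient group(s) at hi_bits and everything else at lo_bits."""
    w_q = fake_quant(weight, lo_bits)
    for g in select_salient_groups(weight, act_rms, group_size, top_k).tolist():
        cols = slice(g * group_size, (g + 1) * group_size)
        w_q[:, cols] = fake_quant(weight[:, cols], hi_bits)
    return w_q
```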
Env vars:

`AWQ_LITE_ENABLED=1 AWQ_LITE_BITS=8 AWQ_LITE_GROUP_TOP_K=1 AWQ_LITE_GROUP_SIZE=64`
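The flags are read from the environment at quantization time; a minimal sketch of how such a config could be assembled (the `AWQLiteConfig` dataclass and `from_env` helper are illustrative, not the PR's actual code):

```python
import os
from dataclasses import dataclass

@dataclass
class AWQLiteConfig:
    enabled: bool = False
    bits: int = 8           # precision for the salient group
    group_top_k: int = 1    # number of salient groups to promote
    group_size: int = 64    # columns per group

    @classmethod
    def from_env(cls) -> "AWQLiteConfig":
        return cls(
            enabled=os.environ.get("AWQ_LITE_ENABLED", "0") == "1",
            bits=int(os.environ.get("AWQ_LITE_BITS", "8")),
            group_top_k=int(os.environ.get("AWQ_LITE_GROUP_TOP_K", "1")),
            group_size=int(os.environ.get("AWQ_LITE_GROUP_SIZE", "64")),
        )
```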
Setup

1. `pip install -r requirements.txt`
2. `apt-get install -y lrzip`
3. Install FA3: `pip install --no-deps flash_attn_3 --find-links https://windreamer.github.io/flash-attention3-wheels/cu128_torch291/`
4. Run `prepare_caseops_data.py` to build the dataset
5. `AWQ_LITE_ENABLED=1 AWQ_LITE_BITS=8 AWQ_LITE_GROUP_TOP_K=1 AWQ_LITE_GROUP_SIZE=64 torchrun --standalone --nproc_per_node=8 train_gpt.py`

Environment

- 8xH100 80GB SXM (RunPod)
- PyTorch 2.9.1+cu128
- FlashAttention 3.0.0
- Triton 3.5.1