RECORD: 1855 base + AWQ-lite mixed-precision GPTQ — val_bpb 1.06086 (3-seed mean) #1918

Closed

aquariouseworkman wants to merge 3 commits into openai:main from aquariouseworkman:second_b

Conversation

@aquariouseworkman
Contributor

Applies activation-aware mixed-precision GPTQ (from PR #1908 / romeerp) on top of the codemath3000 PR #1855 stack.

Results

| Seed | val_bpb (post-TTT) | artifact bytes | steps | eval time |
|------|--------------------|----------------|-------|-----------|
| 42   | 1.06118            | 15,978,503     | 4989  | 392.8s    |
| 314  | 1.06005            | 15,976,469     | 4986  | 395.8s    |
| 1234 | 1.06135            | 15,976,673     | 4977  | 395.5s    |
| **mean** | **1.06086**    | —              | —     | —         |

3-seed std: 0.00069. Beats codemath3000 PR #1855 (1.06108) by 0.00022 BPB.

Technique

Training is identical to PR #1855. The only change is post-training quantization:

AWQ-lite (activation-aware GPTQ):

  1. Collect per-input-channel activation RMS during GPTQ calibration
  2. Score column groups: saliency = act_rms * mean(abs(weight))
  3. Select top-1 most salient 64-column group per matrix
  4. Quantize that group at int8 inside the same full-tensor GPTQ solve (rest stays int6)

Env vars: `AWQ_LITE_ENABLED=1 AWQ_LITE_BITS=8 AWQ_LITE_GROUP_TOP_K=1 AWQ_LITE_GROUP_SIZE=64`
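
For concreteness, here is a minimal sketch of what steps 1–4 amount to, wired to the same env vars. This is an illustration under assumptions, not the PR's code: `collect_act_rms` and `awq_lite_quantize` are hypothetical names, and plain per-column round-to-nearest stands in for the error-compensating full-tensor GPTQ solve.

```python
import os
import torch

GROUP_SIZE = int(os.environ.get("AWQ_LITE_GROUP_SIZE", "64"))
TOP_K = int(os.environ.get("AWQ_LITE_GROUP_TOP_K", "1"))
HI_BITS = int(os.environ.get("AWQ_LITE_BITS", "8"))
LO_BITS = 6  # baseline precision for the non-salient groups

def collect_act_rms(acts: torch.Tensor) -> torch.Tensor:
    # Step 1: per-input-channel RMS over calibration activations (tokens, in_features).
    return acts.float().pow(2).mean(dim=0).sqrt()

def quantize_rtn(w: torch.Tensor, bits: int) -> torch.Tensor:
    # Symmetric per-column round-to-nearest quantize/dequantize
    # (stand-in for the GPTQ column solve).
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().amax(dim=0).clamp_min(1e-8) / qmax
    return (w / scale).round().clamp(-qmax - 1, qmax) * scale

def awq_lite_quantize(w: torch.Tensor, act_rms: torch.Tensor) -> torch.Tensor:
    # w: (out_features, in_features); assumes in_features % GROUP_SIZE == 0.
    n_groups = w.shape[1] // GROUP_SIZE
    mean_abs_w = w.abs().mean(dim=0).reshape(n_groups, GROUP_SIZE)
    rms = act_rms.reshape(n_groups, GROUP_SIZE)
    # Step 2: saliency = act_rms * mean(abs(weight)), aggregated per 64-column group.
    saliency = (rms * mean_abs_w).mean(dim=1)
    # Step 3: top-k (k=1) most salient group(s).
    salient = saliency.topk(TOP_K).indices
    # Step 4: everything at int6 first, then the salient group(s) redone at int8.
    w_q = quantize_rtn(w, LO_BITS)
    for g in salient.tolist():
        cols = slice(g * GROUP_SIZE, (g + 1) * GROUP_SIZE)
        w_q[:, cols] = quantize_rtn(w[:, cols], HI_BITS)
    return w_q
```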

Setup

  1. `pip install -r requirements.txt`
  2. `apt-get install -y lrzip`
  3. Install FA3: `pip install --no-deps flash_attn_3 --find-links https://windreamer.github.io/flash-attention3-wheels/cu128_torch291/`
  4. Run `prepare_caseops_data.py` to build the dataset
  5. `AWQ_LITE_ENABLED=1 AWQ_LITE_BITS=8 AWQ_LITE_GROUP_TOP_K=1 AWQ_LITE_GROUP_SIZE=64 torchrun --standalone --nproc_per_node=8 train_gpt.py`

Environment

  • 8xH100 80GB SXM (RunPod)
  • PyTorch 2.9.1+cu128
  • FlashAttention 3.0.0
  • Triton 3.5.1

aquariouseworkman and others added 3 commits April 27, 2026 02:53
…symmetric + Phased TTT

val_bpb = 1.06128 | ~15.95 MB | 8xH100 SXM

Key Change: SmearGate BOS Document Boundary Fix
Builds on the PR openai#1797 stack (PR openai#1787 base + SmearGate + LQER Asymmetric) but fixes the SmearGate cross-document leakage bug identified by @cocohearts in the PR openai#1797 audit.

The bug: SmearGate's 1-token causal lookback does not mask BOS positions, so the final token of document N smears into the BOS of document N+1.
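
A minimal sketch of the fix, assuming SmearGate is a gated 1-token lookback of the form y[t] = x[t] + g[t] * x[t-1]; the names and shapes below are hypothetical:

```python
import torch

def smear_gate(x: torch.Tensor, gate: torch.Tensor, is_bos: torch.Tensor) -> torch.Tensor:
    # x: (seq, dim), gate: (seq, 1) in [0, 1], is_bos: (seq,) bool.
    prev = torch.roll(x, shifts=1, dims=0)
    prev[0] = 0.0  # the first token in the batch has no predecessor
    # The fix: zero the gate wherever the current token is a BOS, so the
    # last token of document N cannot smear into document N+1.
    gate = gate * (~is_bos).unsqueeze(-1)
    return x + gate * prev
```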

Credits
@nprime06 -- PR openai#1787 base stack
@romeerp -- CaseOps transform (PR openai#1729)
@dexhunter -- SmearGate + LQER (PR openai#1797)
@cocohearts -- Identifying SmearGate BOS bug
@abaybektursun -- Score-first TTT (PR openai#549)
@clarkkev -- GPTQ SDClip + SP8192 (PR openai#1394)
…d mean)

@h1beee

h1beee commented Apr 29, 2026

> Beats codemath3000 PR (1.06108) by 0.00022 BPB

You need to outperform by 0.005 BPB: https://github.com/openai/parameter-golf#submission-process

@aquariouseworkman
Contributor Author

The code is byte-for-byte identical to #1908. I was only able to validate it because I could obtain GPU resources, which romeerp could not. Since his only limitation was access to GPUs, not the ability to develop working code, the record from this merge should go to romeerp.

leon2k2k2k added a commit to leon2k2k2k/parameter-golf that referenced this pull request Apr 29, 2026
- spec 060N: compound AWQ-lite (PR openai#1908) + 4 TTT phases + 3000 prefix
  + 2 global-SGD epochs, eval-only on 060A's final_model.pt. Single-shot
  compound to use openai#1918's ~205s eval-time slack; safe fallback drops
  GLOBAL_TTT_EPOCHS if wallclock blows.
- new idea 1925-matrix-lr-ttt-prefix-tune (PR openai#1925, hyperparam-only
  on openai#1855: MATRIX_LR=0.028 + PHASED_TTT_PREFIX_DOCS=3500 → 1.06109).
- new idea 1915-per-doc-lora-ttt (PR openai#1915, per-doc-only LoRA TTT
  discipline; parked as fallback if global-SGD class is ruled out).
- frontier scan: 21 new PRs (openai#1906-openai#1931). Headline: PRs openai#1908+openai#1918
  independently confirm AWQ-lite mixed-bit GPTQ pattern at ~1.0608 on
  openai#1855 base; openai#1925 hyperparam-only at 1.06109; openai#1923 Asymmetric Logit
  Rescale = empirical negative; openai#1929 banned SLOT+prequant-TTT.
- frontier-state.json: 21 PRs added; total 200.
- diary/2026-04-29-frontier-scan.md: full scan report.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@aquariouseworkman
Contributor Author

Closing: superseded by #1946, the clean cherry-picked AWQ-lite PR off openai:main. This branch (#1918) accidentally included the SmearGate BOS-fix commit, which already lives in #1851.

@aquariouseworkman aquariouseworkman deleted the second_b branch April 29, 2026 19:57
@akhoyannh-a11y

akhoyannh-a11y commented Apr 29, 2026 via email
