
WIP Record: SP8192 + CaseOps + Depth Curriculum + FreqGPTQ + PPM adaptive-λ mixture — val_bpb 0.90687688 (1-seed)#1833

Draft
pragnyanramtha wants to merge 17 commits into openai:main from pragnyanramtha:record/attempt3

Conversation

@pragnyanramtha

Summary

Builds on romeerp's #1756 depth curriculum stack. Adds two techniques:

  1. FreqGPTQ — upweights top-100 most frequent calibration tokens by 2×
    during Hessian collection, improving int6 quantization quality on
    high-frequency vocabulary items.

  2. PPM-D adaptive-λ mixture (from #1785, "OE-GOD Record: SP4096 + byte-level
    PPM adaptive-λ mixture — val_bpb 1.01925 (3-seed)") — byte-level order-5
    PPM predictor mixed with the NN log-probs at eval time via an adaptive
    gate: λ=0.05 when PPM confidence >0.9, λ=0.9 otherwise. Zero artifact cost.
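For readers unfamiliar with the FreqGPTQ tweak, here is a minimal sketch of the idea: GPTQ accumulates a per-layer Hessian proxy H = Σ wᵢ·xᵢxᵢᵀ from calibration activations, and FreqGPTQ sets wᵢ = 2 for rows produced by the top-100 most frequent calibration tokens. Function name, argument names, and shapes below are illustrative, not the actual implementation:

```python
import numpy as np

def freq_weighted_hessian(acts, token_ids, token_counts, top_k=100, boost=2.0):
    """Hypothetical FreqGPTQ-style Hessian accumulation.

    acts:         (n_samples, d) layer-input activations, one row per token
    token_ids:    (n_samples,) vocab id that produced each activation row
    token_counts: (vocab_size,) frequency of each token in the calibration set
    """
    # Ids of the top_k most frequent calibration tokens.
    top = set(np.argsort(token_counts)[::-1][:top_k].tolist())
    d = acts.shape[1]
    H = np.zeros((d, d))
    for x, t in zip(acts, token_ids):
        # Upweight the outer-product contribution of frequent tokens.
        w = boost if int(t) in top else 1.0
        H += w * np.outer(x, x)
    return H
```

The effect is that quantization error on high-frequency vocabulary items is penalized more heavily when GPTQ solves for the quantized weights.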

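The adaptive-λ gate can be sketched as below, assuming λ weights the NN distribution (so a confident PPM prediction pulls the mixture toward PPM) and confidence is the PPM's max next-byte probability; all names and the confidence definition are assumptions, not the PR's actual code:

```python
import numpy as np

def adaptive_lambda_mix(nn_probs, ppm_probs, conf_thresh=0.9,
                        lam_confident=0.05, lam_default=0.9):
    """Hypothetical adaptive-λ mixture of next-byte distributions.

    nn_probs, ppm_probs: (..., 256) probability distributions per position.
    Assumption: λ is the NN weight, so λ=0.05 when the PPM is confident
    (max prob > conf_thresh) and λ=0.9 otherwise, matching the gate above.
    """
    conf = ppm_probs.max(axis=-1)
    lam = np.where(conf > conf_thresh, lam_confident, lam_default)[..., None]
    return lam * nn_probs + (1.0 - lam) * ppm_probs
```

Since the gate runs only at eval time on the two models' outputs, it adds no parameters to the artifact, consistent with the "zero artifact cost" claim.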
Results (1-seed, 8×H100 SXM)

Metric                       Value
Pre-quant post-EMA val_bpb   1.07238
Post-TTT val_bpb             1.06902
Artifact size                ~24.5 MB ⚠️ (over cap, WIP)
Eval time                    ~658s ⚠️ (over cap, WIP)

Status

Single-seed screening run. Two known issues are being fixed:

  • Artifact size over 16MB cap (investigating NUM_LOOPS reduction + more
    aggressive int8 passthrough quantization)
  • Eval time over 600s cap (investigating TTT chunk reduction)

Full 3-seed compliant submission pending fixes.

Base

Fork of romeerp's #1756 (CaseOps + depth curriculum 1→3→4)

