[WIP] Compression-aware fixed-step research #33
Closed
JusticeShultz wants to merge 10 commits into openai:main from
Conversation
…urrence, factorized embeddings, hybrid eval-time compute & local proxy iteration
…elect the current run, rather than requiring a shell command input
… new roundtrip sweep launchers

This update brings the local 3090 research branch up to date with the latest roundtrip-proxy findings.

Changes:
- updated `docs/research_tracks.md` with the completed recurrent/shared-block and sidecar sweep results
- marked the dense compression-aware baseline (`COMPRESSION_REG_WEIGHT=0.005`) as the current local leader at `final_int8_zlib_roundtrip_exact val_bpb 2.06085837`
- reprioritized the next pivot toward conservative low-bit / ternary shaping on top of the winning dense setup
- fixed Run Monitor stale wrap-up handling so incomplete logs no longer show bogus "over expected quantized validation" ETAs
- added dedicated sweep launchers for:
  - roundtrip sidecar tuning
  - roundtrip ternary / low-bit tuning

Current status:
- best local matched roundtrip result remains `2.06085837`
- sidecar revisit got very close (`2.06132482`) but did not beat the dense winner
- recurrent/shared-block variants were not competitive on the local roundtrip track
- ternary sweep is now the active next research pivot
…p dense frontier exploration

This update brings the local 3090 research branch up to date with the latest fixed-step roundtrip results and dense near-cap experiments.

Highlights:
- stabilized local methodology around fixed-step post-roundtrip evaluation instead of noisy wallclock-only ranking
- confirmed compression-aware training as the only clearly first-order win among the early experimental branches
- added export-aware compression regularization work, including grid-alignment and follow-up scale/outlier checks
- showed that most small-model micro-ideas (sidecar, ternary, sparse attention, recurrence, residual-budget tuning) do not currently beat the dense compression-aware control on the trusted local track
- ran an iso-byte dense sweep, which showed that simply spending more of the byte budget matters much more than small under-cap regularizer gains
- extended the dense frontier near the artifact cap and found a new local leader:
  - `14 layers / 576 dim / 8 heads / 4 KV heads`
  - `COMPRESSION_REG_WEIGHT=0.005`
  - `COMPRESSION_GRID_REG_WEIGHT=0.10`
  - fixed-step exact roundtrip `val_bpb=1.99806297`
  - total artifact `15,222,128` bytes

Current local takeaways:
- dense scaling near the byte cap is the dominant direction right now
- depth currently looks better than width in the near-cap regime tested so far
- the next likely high-upside branch is export-side work on top of the deeper dense control, not more small-model sidecar or low-bit sweeps

Also included:
- updated `docs/research_tracks.md`
- added/updated local sweep scripts for fixed-step export-aware, iso-byte, and high-cap dense experiments
- hardened parts of the local sweep process after finding launcher/harness issues during larger runs
…act of 15,869,071 bytes on my 3090
gHashTag added a commit to gHashTag/parameter-golf that referenced this pull request on Apr 30, 2026
…penai#33)

The previous fix (PR openai#32) extracted the JSON correctly but then piped the raw verdict string ('NOT YET') into $GITHUB_OUTPUT without a key, which the runner rejects:

    Unable to process file command 'output' successfully. Invalid format 'NOT YET'

Fix: write 'verdict=<value>' instead. Also replace the space inside the verdict ('GATE-2 PASS', 'NOT YET') with an underscore so the value is a single token, since GITHUB_OUTPUT doesn't accept multi-word unencoded values without the multiline EOF marker.

This output is informational only — the digest step reads from the JSON file directly via jq, so the encoding change has no downstream effect.

Refs openai#16.

Co-authored-by: Perplexity Computer <computer@perplexity.ai>
Summary
This is a WIP local research branch for Parameter Golf focused on improving post-roundtrip `val_bpb` under the actual artifact constraint, using 1x RTX 3090 for local search before moving to the target 8xH100 environment.

The biggest shift in this branch is methodological: local ranking moved away from noisy short wallclock runs and onto a fixed-step exact roundtrip track. That made the local loop much more trustworthy and changed the research conclusions substantially.
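The fixed-step track scores each run by `val_bpb` measured after an exact int8 + zlib roundtrip of the weights (per the `final_int8_zlib_roundtrip_exact` metric name in the commit log). The repo's actual harness is not shown in this PR; the following is only a minimal sketch of what such a check could look like, assuming symmetric per-tensor int8 quantization and a mean next-token loss in nats (all function names here are hypothetical):

```python
import zlib
import numpy as np

def int8_roundtrip(weight: np.ndarray) -> tuple[np.ndarray, int]:
    """Symmetric per-tensor int8 quantize/dequantize; also report the
    zlib-compressed size of the int8 payload (the 'artifact bytes')."""
    scale = float(np.abs(weight).max()) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor roundtrips exactly under any scale
    q = np.clip(np.round(weight / scale), -127, 127).astype(np.int8)
    artifact_bytes = len(zlib.compress(q.tobytes(), level=9))
    return q.astype(np.float32) * scale, artifact_bytes

def bits_per_byte(mean_loss_nats: float, bytes_per_token: float) -> float:
    """Convert a mean next-token loss in nats into bits-per-byte (bpb)."""
    return mean_loss_nats / np.log(2) / bytes_per_token

# Roundtrip a fake weight tensor and inspect artifact size and max error.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(576, 576)).astype(np.float32)
w_rt, nbytes = int8_roundtrip(w)
print(nbytes, float(np.abs(w - w_rt).max()))
```

In the real harness the dequantized weights would be loaded back into the model and `val_bpb` recomputed on the validation set; the sketch shows only the quantize/measure/dequantize core.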
The current branch focus is now:
As of March 19, 2026, `train_gpt.py` remains under the repo hard cap at 1492 lines.

What This Branch Adds
Compression-aware / export-aware training knobs
- `COMPRESSION_REG_WEIGHT`
- `COMPRESSION_GRID_REG_WEIGHT`
- `COMPRESSION_SCALE_REG_WEIGHT`
- `COMPRESSION_RANK1_REG_WEIGHT`
- `TERNARY_REG_WEIGHT`
- `OUTLIER_REG_WEIGHT`

Architecture / export knobs
- `NUM_UNIQUE_BLOCKS`
- `WINDOW_SIZE`
- `EMBED_DIM`
- `INT8_AXIS_MODE`
- `INT8_RESIDUAL_RANK`
- `INT8_RESIDUAL_BUDGET_BYTES`

Hybrid eval-time knobs
- `EVAL_CACHE_MIX_WEIGHT`
- `EVAL_BIGRAM_MIX_WEIGHT`
- `EVAL_CACHE_SIZE`

Local evaluation / search controls
- `VAL_MAX_TOKENS`
- `ROUNDTRIP_VAL_MAX_TOKENS`
- `FINAL_ROUNDTRIP_EVAL`
- `ITERATIONS`

What Has Been Tried
1. Matched local roundtrip baseline
- run: `baselinert3090_20260318_181344`
- `val_bpb=2.11089617`
- artifact: 6,705,058 bytes

2. Compression-aware baseline
- run: `compressrt3090_20260318_175828`
- `COMPRESSION_REG_WEIGHT=0.005`
- `val_bpb=2.06085837`
- artifact: 6,839,798 bytes

This was the first clear local win over the matched baseline.
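The PR lists `COMPRESSION_REG_WEIGHT` but not the regularizer's functional form. One plausible minimal sketch, assuming it is a simple L1 magnitude penalty that pushes many weights toward zero so the int8 code distribution becomes peakier and zlib compresses it better (the function and its exact form are assumptions, not the repo's implementation):

```python
import numpy as np

def compression_reg(params: list[np.ndarray]) -> float:
    """Hypothetical regularizer: mean absolute weight magnitude.
    L1 shrinkage concentrates int8 codes around zero after quantization,
    lowering the entropy that the zlib stage has to pay for."""
    total = sum(float(np.abs(p).sum()) for p in params)
    count = sum(p.size for p in params)
    return total / count

def regularized_loss(task_loss: float, params: list[np.ndarray],
                     weight: float = 0.005) -> float:
    """Add the compression penalty; 0.005 is the winning run's weight."""
    return task_loss + weight * compression_reg(params)
```

The design intuition: the artifact size is roughly the entropy of the quantized weights, so any penalty that concentrates their distribution trades a little task loss for fewer compressed bytes.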
3. Fixed-step methodology pivot
The local wallclock track turned out to be too noisy to trust for small deltas, so ranking moved to a fixed-step exact roundtrip track.
Dense fixed-step control:
- run: `fixedsteprtsweep_20260318_221632_base_a`
- `val_bpb=2.04299145`

This became the new local control.
4. Export-aware compression regularization
Best export-aware probe:
- run: `exportaware_fixedstep_20260318_223456_g010_r000`
- `COMPRESSION_REG_WEIGHT=0.005`
- `COMPRESSION_GRID_REG_WEIGHT=0.10`
- `val_bpb=2.04288777`
- artifact: 6,663,470 bytes

This slightly but repeatably improved the fixed-step control.
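The exact penalty behind `COMPRESSION_GRID_REG_WEIGHT` is not shown in the PR; a minimal sketch under the assumption that it penalizes each weight's distance to the nearest level of the int8 grid it will later be snapped to (a hypothetical form, not the repo's):

```python
import numpy as np

def grid_alignment_reg(weight: np.ndarray) -> float:
    """Hypothetical grid penalty: mean squared distance from each weight
    to the nearest level of the symmetric per-tensor int8 grid, so that
    training pulls weights toward values with little roundtrip error."""
    scale = float(np.abs(weight).max()) / 127.0
    if scale == 0.0:
        return 0.0
    snapped = np.clip(np.round(weight / scale), -127, 127) * scale
    return float(np.mean((weight - snapped) ** 2))
```

Note that `round` has zero gradient almost everywhere, so a real training loop would need a straight-through estimator or a soft rounding surrogate for this term to actually shape the weights.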
Follow-up export-aware checks:
- nearby `COMPRESSION_GRID_REG_WEIGHT` values (0.08, 0.12): regressed

5. Branches that did not currently win locally
These were tested and are currently parked:
Important nuance: these negative results were gathered before or outside the stronger near-cap dense regime, so they are not being treated as globally dead ideas.
6. Iso-byte dense frontier sweep
This changed the branch direction significantly.
Results:
- `b10` -> 2.02814871 at 9,683,932 bytes
- `b12` -> 2.05262920 at 11,334,608 bytes
- `b14` -> 2.03768242 at 13,094,288 bytes
- `b155` -> 2.00290272 at 13,741,308 bytes

This showed that simply spending more of the byte budget on a dense compression-aware model mattered much more than most under-cap micro-ideas.
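The iso-byte conclusion (spend more of the byte budget) amounts to a simple feasibility-and-rank step over the sweep results above. The cap value below is taken from the 15,869,071-byte artifact mentioned in the commit log and is an assumption about the effective local budget:

```python
# (artifact bytes, val_bpb) for each run in the iso-byte sweep above
runs = {
    "b10":  (9_683_932,  2.02814871),
    "b12":  (11_334_608, 2.05262920),
    "b14":  (13_094_288, 2.03768242),
    "b155": (13_741_308, 2.00290272),
}

# Assumed local budget, from the 15,869,071-byte artifact in the commit log.
ARTIFACT_CAP_BYTES = 15_869_071

def best_under_cap(runs: dict[str, tuple[int, float]], cap: int):
    """Lowest val_bpb among runs whose artifact fits under the byte cap."""
    feasible = [(name, b, bpb) for name, (b, bpb) in runs.items() if b <= cap]
    if not feasible:
        raise ValueError("no run fits under the cap")
    return min(feasible, key=lambda t: t[2])

print(best_under_cap(runs, ARTIFACT_CAP_BYTES))  # -> ('b155', 13741308, 2.00290272)
```

Note that the ranking is not monotone in bytes (`b12` is worse than `b10` here), which is why the sweep ranks feasible runs by bpb rather than just taking the largest model.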
7. High-cap dense width/depth frontier
Recovered / rerun near-cap results:
- `w608_l12` -> 2.00551677 at 14,371,393 bytes
- `w624_l12` -> 2.01128088 at 15,024,114 bytes
- `d576_l14` -> 1.99806297 at 15,222,128 bytes
- `w640_l12` -> 2.00505534 at 15,658,993 bytes

Current Best Local Result
Current local leader:
- run: `highcapdense_rerun_20260319_d576_l14`
- 14 layers / 576 dim / 8 heads / 4 KV heads
- `COMPRESSION_REG_WEIGHT=0.005`
- `COMPRESSION_GRID_REG_WEIGHT=0.10`
- `val_bpb=1.99806297`
- artifact: 15,222,128 bytes

Current Findings
Current / Next Direction
The branch has now shifted from “small-model micro-tuning” to “near-cap dense frontier + export-side improvements.”
The next likely high-upside directions are:
- export-side work on top of the 14x576 control instead of the older 6.6 MB model

Caveats
- All local results come from 1x RTX 3090 under Windows.
- Nothing here has been validated yet on the target 8xH100 / 10 minute challenge runs.
The goal of this branch is to build a reproducible local search loop that ranks ideas against a closer approximation of the real challenge objective, so the strongest branch can be taken to target compute once grant access is available.