
SP8192 Byte-PPM O=5 + V6 micro, 3-seed mean 0.92967555 BPB#2076

Open
teslaeco wants to merge 30 commits into openai:main from Terraforming-Planet:final-pr1991-v6-0929675

Conversation

@teslaeco

@teslaeco teslaeco commented May 1, 2026

This submission is based on PR1991 (SP8192 Byte-PPM O=5) with a minimal train-only dataset modification.

Key result:

  • 3-seed mean ppm_mixer val_bpb: 0.92967555
  • seeds: 42, 314, 999
  • size: ~15.92MB (within 16MB limit)

Modification:

  • V6 Privacy-Web-Filtering dataset used as a small train-only sparse micro-injection (8192 tokens)
  • injected only into training shard
  • no modification of FineWeb validation
  • tokenizer unchanged (SP8192)
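The train-only constraint above can be sketched as a pure function that splices the 8192-token V6 micro-injection into the training token stream and is never applied to validation data. This is a hypothetical illustration (the function name, signature, and insertion strategy are assumptions); the PR's actual pipeline lives in `rebuild_and_run_v6_micro_8xh100.sh`.

```python
import numpy as np

def inject_micro_tokens(train_tokens, micro_tokens, seed=42):
    """Return a training token stream with a sparse micro-injection.

    Hypothetical helper: validation shards are deliberately never passed
    through this function, matching the "train-only" constraint, and the
    SP8192 tokenizer is untouched (only token ids are spliced in).
    """
    rng = np.random.default_rng(seed)
    # pick a single insertion point so the shard stays one contiguous stream
    pos = int(rng.integers(0, len(train_tokens) + 1))
    micro = np.asarray(micro_tokens, dtype=train_tokens.dtype)
    return np.concatenate([train_tokens[:pos], micro, train_tokens[pos:]])
```

The key property is that the output length equals the input length plus the injection size, and the original training tokens are preserved in order around a single splice point.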

Repro:

  • included rebuild_and_run_v6_micro_8xh100.sh
  • includes logs and manifests
  • full reproducibility from records folder

Notes:

  • validation is strictly official FineWeb
  • no leakage
  • result improves over PR1991 reported baseline

Logs included for all 3 seeds.

teslaeco and others added 30 commits April 5, 2026 18:18
INT8 compressed model (~9MB) for cube-letter assignment task.
Fits within 16MB limit for Parameter Golf submission.
Prepare model for Parameter Golf submission
INT8 compressed model (~9MB) for cube-letter task
This README provides details about a non-record submission for the OpenAI Parameter Golf challenge, including key results, training configuration, model architecture, optimization details, and compression methods used.
This ensures the submission stays under the **16MB limit**.

---

## Evaluation

Evaluation uses tokenizer-aware byte accounting:

- Metric: **bits-per-byte (BPB)**
- Validation: full FineWeb validation split
- Exact values reported after quantization roundtrip
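Tokenizer-aware byte accounting normalizes the summed token-level loss by the raw byte length of the validation text rather than the token count, so scores are comparable across tokenizers. A minimal sketch of the conversion, assuming the loss is accumulated in nats over the full validation split:

```python
import math

def bits_per_byte(total_nll_nats, total_utf8_bytes):
    """Convert summed negative log-likelihood (nats) to bits-per-byte.

    Dividing by the UTF-8 byte count of the validation text (not the
    token count) is what makes BPB tokenizer-independent.
    """
    return total_nll_nats / (math.log(2) * total_utf8_bytes)
```

For example, a total loss of `math.log(2)` nats per byte corresponds to exactly 1.0 BPB.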

---

## Included Files

- `README.md` – run documentation
- `submission.json` – metadata for evaluation
- `results.tsv` – structured results
- `final_model.int8.ptz` – compressed model artifact
- `train_gpt.py` – training script
- `train.log` – training logs

---

## Notes

- This run serves as a **strong baseline** for further optimization.
- Key improvement lever: **entropy reduction (BPB)** rather than longer training.
- Future directions:
  - architecture refinement
  - tokenizer-aware improvements
  - compression-aware training

---

## Status

- ✅ Valid non-record submission
- ❌ Not optimized for record track yet
- 🎯 Competitive baseline for further iteration
…-submission-metadata-files

Fix corrupted metadata for V5 non-record submission
Add non-record V5 SP1024 Seq4096 1xH100 submission
Tighten metadata for V5 non-record submission
Add runpod_record_attempt.sh to automate multi-GPU, multi-seed SOTA run
Add probe setup script for FineWeb caching and auxiliary V6 dataset prep
Add near-SOTA SP8192 LegalTTT 3-seed reproduction
Clean SP8192 LegalTTT reproduction metadata
Fix V8 dataset paths and RunPod probe script
Add W104 faithful SP8192 LegalTTT bad-seed probe
@teslaeco
Author

teslaeco commented May 1, 2026

Raw logs are included for all three seeds, but the final summary file contains small transcription mismatches for seed 314 and seed 999. The authoritative values are the ppm_mixer val_bpb values in train_seed*.log: seed42 0.92982823, seed314 0.92917762, seed999 0.92987519; raw-log mean 0.9296270133. I am updating the summary to match the raw logs and clarifying that these are the source-of-truth run outputs.
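The raw-log mean quoted above can be checked directly from the three per-seed `ppm_mixer val_bpb` values:

```python
# Per-seed val_bpb values taken from train_seed*.log, as quoted above.
seed_bpb = {42: 0.92982823, 314: 0.92917762, 999: 0.92987519}

mean_bpb = sum(seed_bpb.values()) / len(seed_bpb)
print(f"{mean_bpb:.10f}")  # prints 0.9296270133
```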

@teslaeco
Author

teslaeco commented May 1, 2026

I want to be completely honest.

I’m not a professional ML engineer. I’m an independent researcher doing this out of passion. I like OpenAI competitions and I take part in different challenges even when I’m still learning — I just go for it.

During this submission, I relied heavily on ChatGPT, and some of the guidance I followed turned out to be misleading. For example, I was convinced that key logs were properly saved, but in reality what got included is not what I expected. That’s on me for trusting it too much without verifying everything deeply.

I’m not going to pretend I’m an expert. I did this because I enjoy it and I wanted to push myself.

What I can say is that I put a huge amount of time into these trainings and experiments. This wasn’t random — it was real work, real effort, and real iteration.

I hope that even if the submission has issues, some part of my contribution is still useful. And I genuinely hope you are able to run or inspect the result in some way.

@cocohearts
Collaborator

Leaderboard audit note (pre-cutoff state): I don't think this is valid as a record row. The byte-PPM score does not provide a normalized distribution over the official next-token alphabet before seeing the realized token; it scores the realized byte stream/mixer path. The logs also indicate byte accounting inconsistent with the official validation byte denominator, so the headline 0.9296 BPB is not acceptable leaderboard evidence.
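The audit criterion above can be sanity-checked per prediction step: before the realized token is revealed, the model must emit a normalized probability distribution over the full official SP8192 alphabet. A hypothetical audit helper (the function name and tolerance are assumptions, not part of any official harness):

```python
import numpy as np

def check_valid_distribution(logits, vocab_size=8192, tol=1e-5):
    """Verify per-step predictions form a normalized distribution
    over the full official next-token alphabet.

    Hypothetical audit sketch: applies a numerically stable softmax,
    then asserts full-alphabet coverage and unit mass per step.
    """
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    assert probs.shape[-1] == vocab_size, "must cover the full alphabet"
    assert np.all(np.abs(probs.sum(axis=-1) - 1.0) < tol), "must sum to 1"
    return probs
```

A mixer that only scores the realized byte path never produces such a vector, which is the substance of the objection.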

@teslaeco
Author

teslaeco commented May 5, 2026

@cocohearts
Thanks for the comment and the honest evaluation.

I also want to add something from my side. I'm more of an artist and technician than an ML engineer, and this competition is new to me. I was doing many things for the first time, and I didn't fully understand everything as well as I should have.

While working on this submission, I relied heavily on chat assistance and asked it to follow the competition rules, but unfortunately not everything went as I expected. At some points I was convinced everything was compliant, but in the end it turned out not to be fully the case.

On top of that, I had real technical issues, especially with downloading the main logs. My internet was working fine, but the logs were not downloading properly. Later I thought they had been included, but it turned out they weren't, and by that time the pod had already been stopped and removed.

I even reran the training specifically to reproduce and fix it, but eventually I ran out of time and budget.

I'm not going to pretend I did everything perfectly. If something is wrong, it means I didn't fully understand it yet or didn't manage to align properly with the tools I was using.

I treat this as a learning experience. Thanks for the feedback; it really means a lot to me. 😃
