SP8192 Byte-PPM O=5 + V6 micro, 3-seed mean 0.92967555 BPB #2076
Conversation
INT8 compressed model (~9MB) for cube-letter assignment task. Fits within 16MB limit for Parameter Golf submission.
Prepare model for Parameter Golf submission
INT8 compressed model (~9MB) for cube-letter task
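For context on how an INT8 artifact of this size is typically produced, here is a minimal PyTorch sketch. `GPT` and `config` are placeholders for whatever model class `train_gpt.py` defines, and the exact compression recipe used in this submission is an assumption, not code from this PR:

```python
import gzip
import io

import torch

# Hypothetical reload of the trained checkpoint; `GPT` and `config` stand in
# for the model class and config defined in train_gpt.py.
model = GPT(config)
model.load_state_dict(torch.load("final_model.pt", map_location="cpu"))
model.eval()

# Dynamic INT8 quantization: Linear weights are stored as int8, activations
# are quantized on the fly at inference time.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Serialize and gzip the state dict to stay under the 16MB submission limit.
buf = io.BytesIO()
torch.save(qmodel.state_dict(), buf)
with gzip.open("final_model.int8.ptz", "wb") as f:
    f.write(buf.getvalue())
```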
This README provides details about a non-record submission for the OpenAI Parameter Golf challenge, including key results, training configuration, model architecture, optimization details, and compression methods used.
This ensures the submission stays under the **16MB limit**.

---

## Evaluation

Evaluation uses tokenizer-aware byte accounting:

- Metric: **bits-per-byte (BPB)**
- Validation: full FineWeb validation split
- Exact values reported after quantization roundtrip

---

## Included Files

- `README.md` – run documentation
- `submission.json` – metadata for evaluation
- `results.tsv` – structured results
- `final_model.int8.ptz` – compressed model artifact
- `train_gpt.py` – training script
- `train.log` – training logs

---

## Notes

- This run serves as a **strong baseline** for further optimization.
- Key improvement lever: **entropy reduction (BPB)** rather than longer training.
- Future directions:
  - architecture refinement
  - tokenizer-aware improvements
  - compression-aware training

---

## Status

- ✅ Valid non-record submission
- ❌ Not optimized for record track yet
- 🎯 Competitive baseline for further iteration
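A minimal sketch of the BPB accounting described in the Evaluation section above, assuming the run exposes a summed negative log-likelihood in nats over the validation split; the function and variable names are illustrative, not the repo's actual API:

```python
import math

def bits_per_byte(total_nll_nats: float, val_text: str) -> float:
    """Convert a summed validation NLL (in nats) into bits-per-byte."""
    total_bits = total_nll_nats / math.log(2)    # nats -> bits
    total_bytes = len(val_text.encode("utf-8"))  # tokenizer-aware byte denominator
    return total_bits / total_bytes
```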
Fix corrupted metadata for V5 non-record submission
Add non-record V5 SP1024 Seq4096 1xH100 submission
Tighten metadata for V5 non-record submission
Add runpod_record_attempt.sh to automate multi-GPU, multi-seed SOTA run
Add probe setup script for FineWeb caching and auxiliary V6 dataset prep
Add near-SOTA SP8192 LegalTTT 3-seed reproduction
Clean SP8192 LegalTTT reproduction metadata
Fix V8 dataset paths and RunPod probe script
Add W104 faithful SP8192 LegalTTT bad-seed probe
Raw logs are included for all three seeds, but the final summary file contains small transcription mismatches for seed 314 and seed 999. The authoritative values are the ppm_mixer val_bpb values in train_seed*.log: seed42 0.92982823, seed314 0.92917762, seed999 0.92987519; raw-log mean 0.9296270133. I am updating the summary to match the raw logs and clarifying that these are the source-of-truth run outputs.
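For reference, the quoted raw-log mean can be re-derived directly from the three per-seed values:

```python
# Re-derive the raw-log mean from the per-seed val_bpb values quoted above.
seed_bpb = {42: 0.92982823, 314: 0.92917762, 999: 0.92987519}
mean_bpb = sum(seed_bpb.values()) / len(seed_bpb)
print(f"{mean_bpb:.10f}")  # 0.9296270133
```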
I want to be completely honest. I’m not a professional ML engineer. I’m an independent researcher doing this out of passion. I like OpenAI competitions and I take part in different challenges even when I’m still learning — I just go for it. During this submission, I relied heavily on ChatGPT, and some of the guidance I followed turned out to be misleading. For example, I was convinced that key logs were properly saved, but in reality what got included is not what I expected. That’s on me for trusting it too much without verifying everything deeply. I’m not going to pretend I’m an expert. I did this because I enjoy it and I wanted to push myself. What I can say is that I put a huge amount of time into these trainings and experiments. This wasn’t random — it was real work, real effort, and real iteration. I hope that even if the submission has issues, some part of my contribution is still useful. And I genuinely hope you are able to run or inspect the result in some way.
Leaderboard audit note (pre-cutoff state): I don't think this is valid as a record row. The byte-PPM score does not provide a normalized distribution over the official next-token alphabet before seeing the realized token; it scores the realized byte stream/mixer path. The logs also indicate byte accounting inconsistent with the official validation byte denominator, so the headline 0.9296 BPB is not acceptable leaderboard evidence.
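To make the objection concrete: a valid record submission has to commit a normalized distribution over the full next-token alphabet before the realized token is revealed, and only then gets charged the bits for the true symbol. A minimal sketch of that contract for a 256-symbol byte alphabet, with names and tolerances that are illustrative rather than the official harness's API:

```python
import math

def score_step(probs: list[float], realized_byte: int) -> float:
    """Bit cost of one prediction step under the contract described above.

    `probs` must be fixed before `realized_byte` is observed and must be a
    proper distribution over the whole 256-symbol byte alphabet; scoring
    only the realized byte stream / mixer path skips this check.
    """
    assert len(probs) == 256
    assert all(p >= 0.0 for p in probs)
    assert abs(sum(probs) - 1.0) < 1e-9   # normalized over the alphabet
    return -math.log2(probs[realized_byte])
```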
@cocohearts I also want to add something from my side. While working on this submission, I relied heavily on chat assistance and asked it to follow the competition rules, but unfortunately not everything went as I expected. At some points I was convinced everything was compliant, but in the end it turned out not to be fully the case. On top of that, I had real technical issues: I even reran the training specifically to reproduce and fix them, but eventually I ran out of time and budget. I’m not going to pretend I did everything perfectly; I treat this as a learning experience. Thanks for the feedback.
This submission is based on PR1991 (SP8192 Byte-PPM O=5) with a minimal train-only dataset modification.
Key result:
Modification:
Repro:
Notes:
Logs included for all 3 seeds.