Add starter kit for low-budget RunPod workflow #1
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,48 @@ | ||
| # Parameter Golf Starter Kit | ||
|
|
||
| This folder contains a low-budget workflow for getting from a first run to a valid non-record PR. | ||
|
|
||
| ## 1) Fork + set your remote | ||
|
|
||
| From your local repo root: | ||
|
|
||
| ```bash | ||
| git remote rename origin upstream | ||
| git remote add origin https://github.com/YOUR_GITHUB_USERNAME/parameter-golf.git | ||
| git fetch upstream | ||
| git checkout -b exp/first-runs upstream/main | ||
| git push -u origin exp/first-runs | ||
| ``` | ||
|
|
||
| ## 2) On RunPod: first smoke run | ||
|
|
||
| Use scripts in `starter_kit/scripts`: | ||
|
|
||
| 1. `01_runpod_bootstrap.sh` | ||
| 2. `02_smoke_run.sh` | ||
|
|
||
| ## 3) Promote to serious run | ||
|
|
||
| Run `03_full_run.sh` once smoke logs look healthy. | ||
|
|
||
| ## 4) Prepare a PR-ready records folder | ||
|
|
||
| Run: | ||
|
|
||
| ```bash | ||
| python starter_kit/scripts/prepare_submission.py \ | ||
| --track non-record \ | ||
| --run-name my_first_non_record \ | ||
| --author-name "Your Name" \ | ||
| --github-id "your_github" \ | ||
| --val-bpb 1.1999 | ||
| ``` | ||
|
|
||
| Then copy your real train log into the generated folder and edit README details. | ||
|
|
||
| ## 5) Submission checklist | ||
|
|
||
| - The PR adds exactly one new folder, under `records/track_non_record_16mb/` or `records/track_10min_16mb/`. | ||
| - Includes `README.md`, `submission.json`, `train_gpt.py`, and train log. | ||
| - Repro steps are explicit and complete. | ||
| - No validation-data leakage or rule violations. | ||
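
The file-presence item in the checklist above can be checked mechanically before opening a PR. The following is a minimal sketch and not part of this PR: the function name `missing_submission_files` is hypothetical, and treating any `*.log` or `*.txt` file as the train log is an assumption, since the checklist does not name the log file.

```python
from pathlib import Path

# Files the checklist requires by name.
REQUIRED_FILES = ("README.md", "submission.json", "train_gpt.py")
# Assumption: the train log is any *.log or *.txt file in the folder.
LOG_PATTERNS = ("*.log", "*.txt")


def missing_submission_files(folder: Path) -> list:
    """Return the checklist items that the records folder does not satisfy."""
    missing = [name for name in REQUIRED_FILES if not (folder / name).is_file()]
    has_log = any(any(folder.glob(pattern)) for pattern in LOG_PATTERNS)
    if not has_log:
        missing.append("train log (*.log or *.txt)")
    return missing
```

An empty return value means the folder passes this one check; the remaining checklist items (repro steps, no validation-data leakage) still need a human review.
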
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| # Experiment Log Template | ||
|
|
||
| ## Run Metadata | ||
|
|
||
| - run_id: | ||
| - date: | ||
| - gpu: | ||
| - cost_estimate_usd: | ||
| - dataset_variant: | ||
| - train_shards: | ||
| - max_wallclock_seconds: | ||
|
|
||
| ## Config Delta | ||
|
|
||
| - base_commit: | ||
| - branch: | ||
| - changed_hparams: | ||
| - changed_code_paths: | ||
|
|
||
| ## Outcomes | ||
|
|
||
| - val_loss: | ||
| - val_bpb: | ||
| - final_int8_zlib_roundtrip_bytes: | ||
| - step_count: | ||
| - runtime_seconds: | ||
|
|
||
| ## Decision | ||
|
|
||
| - keep / drop: | ||
| - reason: | ||
| - next test: |
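
The `cost_estimate_usd` field in the template above can be derived from `runtime_seconds` and the pod's hourly price. This is a minimal sketch assuming simple per-second proration; the function name and the example rate are hypothetical and do not reflect actual RunPod pricing.

```python
def estimate_cost_usd(runtime_seconds: float, hourly_rate_usd: float) -> float:
    """Prorate an hourly GPU price over the run's wallclock time."""
    if runtime_seconds < 0 or hourly_rate_usd < 0:
        raise ValueError("runtime and rate must be non-negative")
    return round(runtime_seconds / 3600 * hourly_rate_usd, 4)


# Example: a 600-second run on a pod billed at a hypothetical $2.00/hour.
print(estimate_cost_usd(600, 2.00))
```
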
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,24 @@ | ||
| #!/usr/bin/env bash | ||
| set -euo pipefail | ||
|
|
||
| # Usage: | ||
| # bash starter_kit/scripts/01_runpod_bootstrap.sh https://github.com/YOUR_GITHUB_USERNAME/parameter-golf.git | ||
|
|
||
| FORK_URL="${1:-}" | ||
| if [[ -z "$FORK_URL" ]]; then | ||
| echo "Provide your fork URL as first arg." | ||
| exit 1 | ||
| fi | ||
|
|
||
| cd /workspace | ||
| if [[ ! -d parameter-golf ]]; then | ||
| git clone "$FORK_URL" parameter-golf | ||
| fi | ||
|
|
||
| cd parameter-golf | ||
| git remote -v | ||
|
|
||
| echo "Downloading small dataset slice for low-cost iteration..." | ||
| python3 data/cached_challenge_fineweb.py --variant sp1024 --train-shards 1 | ||
|
|
||
| echo "Bootstrap complete. Run: bash starter_kit/scripts/02_smoke_run.sh" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| #!/usr/bin/env bash | ||
| set -euo pipefail | ||
|
|
||
| # Quick low-cost run (~4 minutes max) | ||
| cd /workspace/parameter-golf | ||
|
|
||
| RUN_ID="${RUN_ID:-smoke_sp1024_$(date +%Y%m%d_%H%M%S)}" | ||
| export RUN_ID | ||
| export DATA_PATH=./data/datasets/fineweb10B_sp1024/ | ||
| export TOKENIZER_PATH=./data/tokenizers/fineweb_1024_bpe.model | ||
| export VOCAB_SIZE=1024 | ||
| export MAX_WALLCLOCK_SECONDS=240 | ||
| export VAL_LOSS_EVERY=0 | ||
|
|
||
| mkdir -p logs | ||
|
|
||
| torchrun --standalone --nproc_per_node=1 train_gpt.py | tee "logs/${RUN_ID}.log" | ||
|
|
||
| echo "Smoke run done: logs/${RUN_ID}.log" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| #!/usr/bin/env bash | ||
| set -euo pipefail | ||
|
|
||
| # Full baseline-style run (~10 minutes) | ||
| cd /workspace/parameter-golf | ||
|
|
||
| RUN_ID="${RUN_ID:-full_sp1024_$(date +%Y%m%d_%H%M%S)}" | ||
| export RUN_ID | ||
| export DATA_PATH=./data/datasets/fineweb10B_sp1024/ | ||
| export TOKENIZER_PATH=./data/tokenizers/fineweb_1024_bpe.model | ||
| export VOCAB_SIZE=1024 | ||
| export MAX_WALLCLOCK_SECONDS=600 | ||
| export VAL_LOSS_EVERY=200 | ||
|
|
||
| mkdir -p logs | ||
|
|
||
| torchrun --standalone --nproc_per_node=1 train_gpt.py | tee "logs/${RUN_ID}.log" | ||
|
|
||
| echo "Full run done: logs/${RUN_ID}.log" |
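
The smoke and full scripts above share every setting except the wallclock cap and the validation cadence. As a sketch of that relationship, the two environments can be derived from one base; `BASE_ENV`, `PROFILES`, and `run_env` are hypothetical names and are not part of this PR.

```python
from datetime import datetime
from typing import Dict, Optional

# Settings common to 02_smoke_run.sh and 03_full_run.sh.
BASE_ENV = {
    "DATA_PATH": "./data/datasets/fineweb10B_sp1024/",
    "TOKENIZER_PATH": "./data/tokenizers/fineweb_1024_bpe.model",
    "VOCAB_SIZE": "1024",
}

# Per-profile overrides, copied from the two scripts.
PROFILES = {
    "smoke": {"MAX_WALLCLOCK_SECONDS": "240", "VAL_LOSS_EVERY": "0"},
    "full": {"MAX_WALLCLOCK_SECONDS": "600", "VAL_LOSS_EVERY": "200"},
}


def run_env(profile: str, now: Optional[datetime] = None) -> Dict[str, str]:
    """Build the environment variables for one run profile."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    run_id = f"{profile}_sp1024_{stamp}"
    return {**BASE_ENV, **PROFILES[profile], "RUN_ID": run_id}
```

Keeping the shared values in one place makes it harder for the two profiles to drift apart, for example if the tokenizer path changes in only one script.
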
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,67 @@ | ||||||||||||||||||||||||||||||||
| #!/usr/bin/env python3 | ||||||||||||||||||||||||||||||||
| import argparse | ||||||||||||||||||||||||||||||||
| import datetime as dt | ||||||||||||||||||||||||||||||||
| import json | ||||||||||||||||||||||||||||||||
| from pathlib import Path | ||||||||||||||||||||||||||||||||
| import shutil | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| def main() -> None: | ||||||||||||||||||||||||||||||||
| parser = argparse.ArgumentParser(description="Create a PR-ready records folder.") | ||||||||||||||||||||||||||||||||
| parser.add_argument("--track", choices=["record", "non-record"], required=True) | ||||||||||||||||||||||||||||||||
| parser.add_argument("--run-name", required=True) | ||||||||||||||||||||||||||||||||
|
Comment on lines +10 to +12
|
||||||||||||||||||||||||||||||||
| parser.add_argument("--author-name", required=True) | ||||||||||||||||||||||||||||||||
| parser.add_argument("--github-id", required=True) | ||||||||||||||||||||||||||||||||
| parser.add_argument("--val-bpb", type=float, required=True) | ||||||||||||||||||||||||||||||||
| parser.add_argument("--source-train-script", default="train_gpt.py") | ||||||||||||||||||||||||||||||||
| args = parser.parse_args() | ||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||
| repo_root = Path(__file__).resolve().parents[2] | ||||||||||||||||||||||||||||||||
| date = dt.datetime.now().strftime("%Y-%m-%d") | ||||||||||||||||||||||||||||||||
| slug = f"{date}_{args.run_name}" | ||||||||||||||||||||||||||||||||
|
Comment on lines +19 to +21
|
||||||||||||||||||||||||||||||||
| repo_root = Path(__file__).resolve().parents[2] | |
| date = dt.datetime.now().strftime("%Y-%m-%d") | |
| slug = f"{date}_{args.run_name}" | |
| # Sanitize run_name to ensure it is safe to use as a single path component | |
| safe_run_name = args.run_name.replace("/", "_").replace("\\", "_") | |
| repo_root = Path(__file__).resolve().parents[2] | |
| date = dt.datetime.now().strftime("%Y-%m-%d") | |
| slug = f"{date}_{safe_run_name}" |
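
The suggested replacement above handles only path separators. A stricter variant, sketched here as an assumption rather than as what the PR adopts, uses an allow-list so that `..`, spaces, and shell metacharacters are neutralized as well. `safe_component` is a hypothetical helper name.

```python
import re


def safe_component(run_name: str) -> str:
    """Reduce run_name to a single safe path component."""
    # Replace every run of characters outside [A-Za-z0-9_-] with one underscore.
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", run_name).strip("_")
    if not cleaned:
        raise ValueError(f"run name {run_name!r} has no usable characters")
    return cleaned
```

Rejecting an empty result, instead of silently producing a slug that is only a date, keeps two badly named runs from colliding in the same folder.
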
Copilot AI, Apr 2, 2026
The generated `submission.json` schema (`author_name`, `run_name`, `notes`, etc.) differs from the established format in `records/**/submission.json` (commonly `author`, `name`, `blurb`, `date`, plus optional `size` / `seed` fields). Please update the generated keys so the output folder matches existing submission conventions.
| "author_name": args.author_name, | |
| "github_id": args.github_id, | |
| "run_name": args.run_name, | |
| "track": args.track, | |
| "val_bpb": round(args.val_bpb, 4), | |
| "date": date, | |
| "notes": "Fill out details and attach train logs." | |
| "author": args.author_name, | |
| "name": args.run_name, | |
| "blurb": "Fill out details and attach train logs.", | |
| "date": date, | |
| "size": "16mb", | |
| "github_id": args.github_id, | |
| "track": args.track, | |
| "val_bpb": round(args.val_bpb, 4), |
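
To illustrate the key renaming requested in the review comment, here is a minimal sketch that builds the metadata with the suggested keys. The function name `build_submission` is hypothetical, and the keys are taken only from the suggestion above; the exact schema should be confirmed against existing `records/**/submission.json` files.

```python
import json


def build_submission(author: str, github_id: str, name: str,
                     track: str, val_bpb: float, date: str) -> dict:
    """Assemble submission metadata using the keys from the suggested change."""
    return {
        "author": author,
        "name": name,
        "blurb": "Fill out details and attach train logs.",
        "date": date,
        "size": "16mb",
        "github_id": github_id,
        "track": track,
        "val_bpb": round(val_bpb, 4),
    }


# Example values mirror the README invocation; the date is a placeholder.
example = build_submission("Your Name", "your_github", "my_first_non_record",
                           "non-record", 1.1999, "2026-04-02")
print(json.dumps(example, indent=2))
```
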
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,37 @@ | ||
| # {{RUN_NAME}} | ||
|
|
||
| - Date: {{DATE}} | ||
| - Track: {{TRACK}} | ||
| - Author: {{AUTHOR_NAME}} ({{GITHUB_ID}}) | ||
| - Reported val_bpb: {{VAL_BPB}} | ||
|
|
||
| ## Summary | ||
|
|
||
| Short summary of the idea and why it may help. | ||
|
|
||
| ## What Changed | ||
|
|
||
| - List architecture changes. | ||
| - List optimization and schedule changes. | ||
| - List quantization or eval changes. | ||
|
|
||
| ## Repro Command | ||
|
|
||
| ```bash | ||
| RUN_ID={{RUN_NAME}} \ | ||
| DATA_PATH=./data/datasets/fineweb10B_sp1024/ \ | ||
| TOKENIZER_PATH=./data/tokenizers/fineweb_1024_bpe.model \ | ||
| VOCAB_SIZE=1024 \ | ||
| torchrun --standalone --nproc_per_node=1 train_gpt.py | ||
| ``` | ||
|
|
||
| ## Results | ||
|
|
||
| - val_bpb: | ||
| - val_loss: | ||
| - compressed_bytes: | ||
| - wallclock_seconds: | ||
|
|
||
| ## Notes | ||
|
|
||
| Any caveats, negative findings, or follow-up experiments. |
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,9 @@ | ||||||||||||||||||||||||||
| { | ||||||||||||||||||||||||||
| "author_name": "Your Name", | ||||||||||||||||||||||||||
| "github_id": "your_github", | ||||||||||||||||||||||||||
| "run_name": "your_run_name", | ||||||||||||||||||||||||||
| "track": "non-record", | ||||||||||||||||||||||||||
| "val_bpb": 1.2000, | ||||||||||||||||||||||||||
| "date": "YYYY-MM-DD", | ||||||||||||||||||||||||||
| "notes": "Fill with concise methodology and constraints." | ||||||||||||||||||||||||||
|
Comment on lines +2 to +8
|
||||||||||||||||||||||||||
| "author_name": "Your Name", | |
| "github_id": "your_github", | |
| "run_name": "your_run_name", | |
| "track": "non-record", | |
| "val_bpb": 1.2000, | |
| "date": "YYYY-MM-DD", | |
| "notes": "Fill with concise methodology and constraints." | |
| "author": "Your Name", | |
| "name": "your_run_name", | |
| "blurb": "Fill with concise methodology and constraints.", | |
| "track": "non_record_16mb", | |
| "date": "YYYY-MM-DD" |
The example uses `--track non-record`, but current repo submissions typically encode track names in metadata as `10min_16mb` or `non-record-16mb` / `non_record_16mb`. Once the generator's `--track` values are aligned, please update this example accordingly so that users do not copy a nonstandard track value into their submission metadata.
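
One way to keep the CLI value, the metadata value, and the records folder aligned is a single lookup table. This is a sketch under assumptions: the folder names come from the submission checklist in the starter-kit README, the metadata spellings follow the comment above and should be confirmed against existing submissions, and pairing `record` with the 10-minute track is a guess. `TRACKS` and `resolve_track` are hypothetical names.

```python
from pathlib import Path
from typing import Tuple

# CLI --track value -> (metadata track name, records subfolder).
# Assumption: "record" corresponds to the 10-minute, 16 MB track.
TRACKS = {
    "record": ("10min_16mb", "track_10min_16mb"),
    "non-record": ("non_record_16mb", "track_non_record_16mb"),
}


def resolve_track(cli_track: str, slug: str) -> Tuple[str, Path]:
    """Return the metadata track name and the records path for a run slug."""
    metadata_name, folder = TRACKS[cli_track]
    return metadata_name, Path("records") / folder / slug
```

With one table as the source of truth, the value written to `submission.json` and the folder the files land in cannot disagree.
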