30 commits
- `eda4be1` Add terra-cube-int8 model (teslaeco, Apr 5, 2026)
- `eccc653` Move model to submissions folder (teslaeco, Apr 5, 2026)
- `e923b57` Delete submissions/terra-cube-int8/final_model.int8.ptz (teslaeco, Apr 5, 2026)
- `0ceb45b` submissions/terra-cube-int8/final_model.int8.ptz (teslaeco, Apr 5, 2026)
- `8b4bfc1` Add README for non-record submission V5 SP1024 Seq4096 (teslaeco, Apr 17, 2026)
- `3a79fac` V5 SP1024 Seq4096 — 1.2143 BPB (1×H100, 15.8MB) (teslaeco, Apr 17, 2026)
- `3b0974d` Fix corrupted metadata for V5 non-record submission (teslaeco, Apr 17, 2026)
- `ddbbf04` Merge pull request #1 from Terraforming-Planet/codex/repair-corrupted… (teslaeco, Apr 17, 2026)
- `4163871` Add non-record V5 SP1024 Seq4096 1xH100 submission (teslaeco, Apr 18, 2026)
- `3f3bd5b` Merge pull request #2 from Terraforming-Planet/codex/task-title (teslaeco, Apr 18, 2026)
- `83a13b1` Tighten metadata for V5 non-record submission (teslaeco, Apr 18, 2026)
- `6980345` Merge pull request #3 from Terraforming-Planet/codex/task-title-0dxf7q (teslaeco, Apr 18, 2026)
- `a993136` Add RunPod record attempt automation script (teslaeco, Apr 18, 2026)
- `898f566` Merge pull request #4 from Terraforming-Planet/codex/task-title-xob60q (teslaeco, Apr 18, 2026)
- `56f956c` Add auxiliary V6 probe environment setup script (teslaeco, Apr 18, 2026)
- `7da1404` Merge pull request #5 from Terraforming-Planet/codex/task-title-jge68j (teslaeco, Apr 18, 2026)
- `037437f` Add near-SOTA SP8192 LegalTTT 3-seed reproduction (teslaeco, Apr 18, 2026)
- `5ea4be1` Merge pull request #6 from Terraforming-Planet/codex/task-title-4gj0jr (teslaeco, Apr 18, 2026)
- `4921178` Directly fix train_seed42.log for SP8192 LegalTTT reproduction (teslaeco, Apr 18, 2026)
- `6334f52` Directly fix train_seed314.log for SP8192 LegalTTT reproduction (teslaeco, Apr 18, 2026)
- `78014dd` Directly fix train_seed999.log for SP8192 LegalTTT reproduction (teslaeco, Apr 18, 2026)
- `71aa8a2` Directly fix train_gpt.py for SP8192 LegalTTT reproduction (teslaeco, Apr 18, 2026)
- `142b3a9` Clean SP8192 LegalTTT reproduction metadata (teslaeco, Apr 18, 2026)
- `3a1430b` Merge pull request #7 from Terraforming-Planet/codex/task-title-5jezva (teslaeco, Apr 18, 2026)
- `7be9463` Fix V8 dataset paths and RunPod probe script (teslaeco, Apr 20, 2026)
- `0655fcc` Merge pull request #8 from Terraforming-Planet/codex/task-title-r8zel6 (teslaeco, Apr 20, 2026)
- `2adae56` Add W104 faithful SP8192 LegalTTT bad-seed probe (teslaeco, Apr 20, 2026)
- `e3e1ab6` Merge pull request #9 from Terraforming-Planet/codex/task-title-bduj3v (teslaeco, Apr 20, 2026)
- `43b1c8b` Add PR1991 V6 micro final 3-seed result (teslaeco, May 1, 2026)
- `557a956` Update final V6 micro 8xH100 logs and disclosure (teslaeco, May 1, 2026)

Binary file added final_model.int8.ptz
Binary file not shown.
@@ -0,0 +1,50 @@
# Parameter-Golf V8 WebSignal BPE Entropy MicroMix

Dataset repo: `8Planetterraforming/Parameter-Golf-V8-WebSignal-BPE-Entropy-MicroMix`

This repository is laid out as a **flat Hugging Face dataset repo**.

## File list

- `train.jsonl`
- `validation.jsonl`
- `test.jsonl`
- `train.txt`
- `validation.txt`
- `test.txt`
- `v8_micro_0p02pct.txt`
- `v8_micro_0p05pct.txt`
- `v8_micro_0p10pct.txt`
- `build_v8_micro_mix.py`
- `run_v8_seed42_probe.sh`
- `probe_plan.md`
- `dataset_design.md`
- `stats.json`
- `dataset_infos.json`
- `source_sanitization.md`
- `upload_to_hf.md`

## Plain-text artifacts

Use the flat text files directly (a loading sketch follows the list):

- `train.txt`
- `validation.txt`
- `test.txt`
- `v8_micro_*.txt`
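
A minimal sketch of pulling the snapshot and reading one of the flat text files, assuming `huggingface_hub` is installed. The line counting is only illustrative; this does not replace the provided `build_v8_micro_mix.py`.

```python
# Minimal sketch (not part of the dataset tooling): download the flat repo
# and read one of the plain-text splits. Assumes huggingface_hub is installed.
from pathlib import Path

from huggingface_hub import snapshot_download

REPO_ID = "8Planetterraforming/Parameter-Golf-V8-WebSignal-BPE-Entropy-MicroMix"

# Resolve (and cache) a local snapshot of the dataset repo.
dataset_dir = Path(snapshot_download(repo_id=REPO_ID, repo_type="dataset"))

# Read one of the flat text artifacts directly.
train_text = (dataset_dir / "train.txt").read_text(encoding="utf-8")
print(f"train.txt: {len(train_text.splitlines())} lines under {dataset_dir}")
```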

## Recommended micro-mix rates

- `0.02%`
- `0.05%`
- `0.10%`

## Probe pass condition

For seed-42 probing, keep the gate unchanged:

- seed42 must beat `1.08041364` before running a 3-seed proof (see the check sketch below).
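
A minimal sketch of that gate check, assuming the seed-42 run writes `val_bpb:<value>` lines in the same format as the training logs elsewhere in this repository. The log filename and parsing helper are illustrative, not part of the official tooling.

```python
# Illustrative gate check (not official tooling): read the last reported
# val_bpb from a seed-42 training log and compare it against the gate.
import re

GATE_BPB = 1.08041364  # unchanged gate from this README

def last_val_bpb(log_text: str) -> float:
    """Return the final val_bpb reported in a log (assumed 'val_bpb:<x>' format)."""
    values = re.findall(r"val_bpb:\s*([0-9.]+)", log_text)
    if not values:
        raise ValueError("no val_bpb entries found")
    return float(values[-1])

with open("train_seed42.log", encoding="utf-8") as fh:  # illustrative path
    final_bpb = last_val_bpb(fh.read())

if final_bpb < GATE_BPB:
    print(f"seed42 val_bpb {final_bpb:.8f} beats gate {GATE_BPB}: run the 3-seed proof")
else:
    print(f"seed42 val_bpb {final_bpb:.8f} does not beat gate {GATE_BPB}: stop")
```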

## Notes

This dataset documentation does **not** claim this dataset already beats SOTA.
@@ -0,0 +1,22 @@
#!/usr/bin/env bash
set -euo pipefail

# Dataset repo to pull; exported so the embedded Python block can reuse it
# instead of hard-coding the repo id twice.
export HF_DATASET_REPO="8Planetterraforming/Parameter-Golf-V8-WebSignal-BPE-Entropy-MicroMix"

# Resolve (and cache) a local snapshot of the dataset repo; tail -n 1 keeps
# only the final printed path in case anything else reaches stdout.
DATASET_DIR="$({ python - <<'PY'
import os

from huggingface_hub import snapshot_download

path = snapshot_download(repo_id=os.environ["HF_DATASET_REPO"], repo_type="dataset")
print(path)
PY
} | tail -n 1)"

export DATASET_DIR

echo "Using dataset snapshot: $DATASET_DIR"
python "$DATASET_DIR/build_v8_micro_mix.py"

echo
echo "Recommended micro-mix rates (unchanged): 0.02%, 0.05%, 0.10%"
echo "Pass condition (unchanged): seed42 must beat 1.08041364 before 3-seed proof"
@@ -0,0 +1,30 @@
# Near-SOTA Reproduction: SP8192 + QK-Gain 5.25 + Legal TTT

This submission is an independent 3-seed reproduction of the current SP8192 + 3-layer recurrence + parallel residuals + QK-Gain 5.25 + Legal TTT stack.

This does **not** claim a new SOTA record because the 3-seed mean does not beat the current 1.0810 record.

## Results (3 seeds)

- seed 42: val_loss 2.7982063, val_bpb 1.08041364
- seed 314: val_loss 2.7941035, val_bpb 1.08168719
- seed 999: val_loss 2.79443824, val_bpb 1.08181413
- mean val_bpb: 1.08130499
- population std val_bpb: 0.00063240
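
The mean and population standard deviation follow directly from the three per-seed `val_bpb` values above; a minimal recomputation:

```python
# Recompute the 3-seed summary statistics from the per-seed val_bpb values above.
seed_bpb = {42: 1.08041364, 314: 1.08168719, 999: 1.08181413}

mean_bpb = sum(seed_bpb.values()) / len(seed_bpb)
# Population variance (divide by N, not N-1), matching the reported std.
var_bpb = sum((x - mean_bpb) ** 2 for x in seed_bpb.values()) / len(seed_bpb)
std_bpb = var_bpb ** 0.5

print(f"mean val_bpb: {mean_bpb:.8f}")           # 1.08130499
print(f"population std val_bpb: {std_bpb:.8f}")  # 0.00063240
```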

## Hardware

- 8xH100 80GB

## Notes

- All runs finished under the 10-minute training target, based on the included logs (a spot-check sketch follows this list).
- All runs used the official `openai/parameter-golf` code path.
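
A minimal sketch of spot-checking the wall-clock claim, assuming each included log contains a `stopping_early: wallclock_cap train_time: <ms>` line in the same format as this submission's logs:

```python
# Spot-check the 600 s wall-clock cap from the included training logs.
# Assumes each log has a "stopping_early: wallclock_cap train_time: <ms>" line.
import re

CAP_MS = 600_000

for log_path in ("train_seed42.log", "train_seed314.log", "train_seed999.log"):
    with open(log_path, encoding="utf-8") as fh:
        match = re.search(r"wallclock_cap train_time:\s*(\d+)ms", fh.read())
    elapsed_ms = int(match.group(1)) if match else None
    status = "under cap" if elapsed_ms is not None and elapsed_ms < CAP_MS else "check manually"
    print(f"{log_path}: train_time={elapsed_ms} ms ({status})")
```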

## Included files

- `train_seed42.log`
- `train_seed314.log`
- `train_seed999.log`
- `train_gpt.py`
- `submission.json`
@@ -0,0 +1,38 @@
{
  "author": "Sebastian Laskowski",
  "github_id": "Terraforming-Planet",
  "name": "SP8192 + QK-Gain 5.25 + Legal TTT Reproduction",
  "date": "2026-04-18",
  "track": "10min_16mb",
  "val_bpb": 1.08130499,
  "val_bpb_std": 0.00063240,
  "seeds": [
    42,
    314,
    999
  ],
  "seed_results": {
    "42": {
      "val_loss": 2.7982063,
      "val_bpb": 1.08041364
    },
    "314": {
      "val_loss": 2.7941035,
      "val_bpb": 1.08168719
    },
    "999": {
      "val_loss": 2.79443824,
      "val_bpb": 1.08181413
    }
  },
  "hardware": "8xH100 80GB",
  "technique_summary": "Independent 3-seed reproduction of the SP8192 + 3-layer recurrence + parallel residuals + QK-Gain 5.25 + legal score-first TTT stack.",
  "compliance": {
    "train_under_600s": true,
    "artifact_under_16mb": true,
    "eval_under_600s": true,
    "score_first_ttt": true,
    "three_seeds": true,
    "not_claiming_new_sota": true
  }
}


@@ -0,0 +1,149 @@
nohup: ignoring input
W0418 14:47:38.710000 1618 torch/distributed/run.py:851]
W0418 14:47:38.710000 1618 torch/distributed/run.py:851] *****************************************
W0418 14:47:38.710000 1618 torch/distributed/run.py:851] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0418 14:47:38.710000 1618 torch/distributed/run.py:851] *****************************************
Hyperparameters:
adam_eps: 1e-08
adam_wd: 0.02
beta1: 0.9
beta2: 0.95
compressor: brotli
data_dir: ./data/
datasets_dir: ./data/datasets/fineweb10B_sp8192
distributed: True
ema_decay: 0.9965
embed_bits: 8
embed_clip_sigmas: 20.0
embed_lr: 0.6
embed_wd: 0.085
embedding_dim: 512
enable_looping_at: 0.35
etlb_clip: 3.0
etlb_enabled: False
etlb_lr: 0.05
etlb_steps: 5
eval_seq_len: 2048
eval_stride: 64
gptq_calibration_batches: 64
gptq_reserve_seconds: 12.0
grad_accum_steps: 1
grad_clip_norm: 0.3
head_lr: 0.008
is_main_process: True
iterations: 20000
ln_scale: True
local_rank: 0
logfile: logs/b954d466-a996-4848-a0d4-3f69a237bc70.txt
logit_softcap: 30.0
loop_end: 5
loop_start: 3
matrix_bits: 6
matrix_clip_sigmas: 12.85
matrix_lr: 0.022
max_wallclock_seconds: 600.0
min_lr: 0.0
mlp_mult: 4.0
model_dim: 512
model_path: final_model.pt
muon_backend_steps: 5
muon_beta2: 0.95
muon_momentum: 0.99
muon_momentum_warmup_start: 0.92
muon_momentum_warmup_steps: 1500
muon_row_normalize: True
muon_wd: 0.095
num_heads: 8
num_kv_heads: 4
num_layers: 11
num_loops: 2
parallel_residual_start: 7
qk_gain_init: 5.25
quantized_model_path: final_model.int6.ptz
rank: 0
rope_base: 10000.0
rope_dims: 16
rope_train_seq_len: 2048
run_id: b954d466-a996-4848-a0d4-3f69a237bc70
scalar_lr: 0.02
seed: 314
skip_gates_enabled: True
sliding_window_enabled: True
tie_embeddings: True
tied_embed_init_std: 0.005
tied_embed_lr: 0.03
tokenizer_path: ./data/tokenizers/fineweb_8192_bpe.model
train_batch_tokens: 786432
train_files: ./data/datasets/fineweb10B_sp8192/fineweb_train_*.bin
train_log_every: 500
train_seq_len: 2048
ttt_chunk_tokens: 32768
ttt_enabled: True
ttt_epochs: 3
ttt_lr: 0.005
ttt_momentum: 0.9
val_batch_tokens: 524288
val_files: ./data/datasets/fineweb10B_sp8192/fineweb_val_*.bin
val_loss_every: 4000
vocab_size: 8192
warmdown_frac: 0.72
warmup_steps: 20
world_size: 8
xsa_last_n: 11
train_shards: 80
val_tokens: 40540160
model_params:35944536
gptq:reserving 12s, effective=588000ms
warmup_step: 1/20
warmup_step: 2/20
warmup_step: 3/20
warmup_step: 4/20
warmup_step: 5/20
warmup_step: 6/20
warmup_step: 10/20
warmup_step: 20/20
loop_warmup:enabled encoder:[0, 1, 2, 3, 4, 5, 3, 4] decoder:[5, 3, 4, 5, 6, 7, 8, 9, 10]
loop_warmup_step: 1/20
loop_warmup_step: 2/20
loop_warmup_step: 3/20
loop_warmup_step: 4/20
loop_warmup_step: 5/20
loop_warmup_step: 6/20
loop_warmup_step: 10/20
loop_warmup_step: 20/20
0/20000 val_loss: 9.0096 val_bpb: 3.4879
1/20000 train_loss: 9.0109 train_time: 0.0m tok/s: 8278628
2/20000 train_loss: 12.3533 train_time: 0.0m tok/s: 8131273
3/20000 train_loss: 11.0251 train_time: 0.0m tok/s: 8038845
4/20000 train_loss: 9.4762 train_time: 0.0m tok/s: 7993262
5/20000 train_loss: 8.3404 train_time: 0.0m tok/s: 7969928
500/20000 train_loss: 3.3850 train_time: 0.8m tok/s: 7767134
1000/20000 train_loss: 3.2888 train_time: 1.7m tok/s: 7760918
1500/20000 train_loss: 3.1878 train_time: 2.5m tok/s: 7765583
2000/20000 train_loss: 3.0712 train_time: 3.4m tok/s: 7771584
layer_loop:enabled step:2034 frac:0.350 encoder:[0, 1, 2, 3, 4, 5, 3, 4] decoder:[5, 3, 4, 5, 6, 7, 8, 9, 10]
2500/20000 train_loss: 3.1233 train_time: 4.6m tok/s: 7075217
3000/20000 train_loss: 2.8994 train_time: 5.9m tok/s: 6680374
3500/20000 train_loss: 2.9406 train_time: 7.1m tok/s: 6433847
4000/20000 train_loss: 2.8215 train_time: 8.4m tok/s: 6260888
4000/20000 val_loss: 2.8768 val_bpb: 1.1137
4500/20000 train_loss: 2.8422 train_time: 9.7m tok/s: 6101788
4554/20000 val_loss: 2.8146 val_bpb: 1.0896
stopping_early: wallclock_cap train_time: 588062ms step: 4554/20000
peak memory allocated: 39045 MiB reserved: 39120 MiB
ema:applying EMA weights
pre-quantization post-ema val_loss:2.81149855 val_bpb:1.08841870 eval_time:7016ms
Serialized model: 135431033 bytes
Code size: 16594 bytes
GPTQ:collecting Hessians from calibration data...
GPTQ:collected 67 Hessians in 12.8s
Quantized weights:
gptq (int6): blocks.attn.c_k.weight, blocks.attn.c_q.weight, blocks.attn.c_v.weight, blocks.attn.proj.weight, blocks.mlp.fc.weight, blocks.mlp.proj.weight
gptq (int8): tok_emb.weight
passthrough (float16): blocks.attn.q_gain, blocks.attn_scale, blocks.mlp_scale, blocks.resid_mix, skip_gates, skip_weights
Serialized model quantized+brotli: 15976703 bytes
Total submission size quantized+brotli: 15993297 bytes
quantized val_loss:2.84063375 val_bpb:1.09969785 eval_time:25612ms
quantized_sliding_window val_loss:2.79767937 val_bpb:1.08306887 eval_time:126853ms
ttt:start chunks=1238 ttt_lr=0.005 ttt_epochs=3
quantized_ttt val_loss:2.79411035 val_bpb:1.08168719 eval_time:378266ms