Closed
24 commits
1ce4645
quantization-denoising
nestamidavaine Mar 25, 2026
b38248c
Merge branch 'main' of https://github.com/openai/parameter-golf
nestamidavaine Mar 25, 2026
09ea3cf
Attempt one, 4 passes and successfully achieve contractive layers
Mar 26, 2026
8caedf7
things are looking decent
Mar 27, 2026
4a6317d
logs should not be committed
Mar 27, 2026
05bee29
don't commit logs
Mar 27, 2026
94ad908
amend
Mar 27, 2026
241e1db
simplify the submission folder
Mar 27, 2026
efd2b59
Add *.pt, *.ptz, *.wandb to .gitignore
Mar 27, 2026
186ef4d
it works.... but memory is slightly too high :(
Mar 29, 2026
0375751
great performance of 1.114
Mar 29, 2026
caa6e4d
changes to try to reduce size by 0.222 MB
Mar 29, 2026
a981c41
clean up submission format
Mar 29, 2026
fca62ae
changes
nestamidavaine Mar 29, 2026
e1764c3
Strip dead features (MTP, DTG, LAWA, bigram, VE, gated_attn, value_re…
nestamidavaine Mar 29, 2026
36924c0
yay?
Mar 30, 2026
66df5aa
change to 2-pass with slightly better performance
Apr 1, 2026
44722c4
Add recurrent depth with progressive pass growth + error feedback (no…
Apr 1, 2026
41aef30
Add tricks section to README: graph precompilation warmup and python-…
Apr 1, 2026
a1639d8
Fix submission.json author to nestamidavaine
Apr 1, 2026
47b74a3
Add per-seed results to submission.json
Apr 1, 2026
c0b02d3
Rename submission folder to Stable_Growing_Recurrance, update README …
Apr 1, 2026
947ae2f
Emphasize significant baseline beat under results table
Apr 1, 2026
c0db831
Add nats to baseline comparison
Apr 1, 2026
1 change: 1 addition & 0 deletions .env
@@ -0,0 +1 @@
WANDB_API_KEY=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
8 changes: 7 additions & 1 deletion .gitignore
Expand Up @@ -8,4 +8,10 @@ data/manifest.json
data/docs_selected.jsonl
.mypy_cache/
.venv
logs/
logs/
*.log
*.txt
!records/track_non_record_16mb/2026-03-26_Stable_Growing_Recurrance/*.log
*.pt
*.ptz
*.wandb
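The `.gitignore` hunk above combines blanket `*.log` / `*.txt` excludes with one `!` re-include. Order matters in gitignore: a negation only takes effect after the exclude it overrides, and it cannot re-include a file whose parent directory is itself excluded (not the case here, since only file patterns are excluded). A throwaway-repo sketch to verify the behavior, assuming `git` is on PATH:

```shell
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
# Reproduce the relevant subset of the committed .gitignore
cat > .gitignore <<'EOF'
*.log
*.txt
!records/track_non_record_16mb/2026-03-26_Stable_Growing_Recurrance/*.log
EOF
mkdir -p records/track_non_record_16mb/2026-03-26_Stable_Growing_Recurrance
touch run.log records/track_non_record_16mb/2026-03-26_Stable_Growing_Recurrance/keep.log
# check-ignore exits 0 when a path is ignored, 1 when it is tracked-eligible
git check-ignore -q run.log && echo "run.log ignored"
git check-ignore -q records/track_non_record_16mb/2026-03-26_Stable_Growing_Recurrance/keep.log \
  || echo "keep.log kept"
```

`run.log` is excluded by `*.log`, while the record-folder log is re-included by the later negation pattern.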
78 changes: 78 additions & 0 deletions ablation_3p_noRMS_j0.0.log
@@ -0,0 +1,78 @@
logs/46d7a1dd-24ad-4ebb-bfa3-b13f0ce61391.txt
val_bpb:enabled tokenizer_kind=sentencepiece tokenizer_path=../../../data/tokenizers/fineweb_1024_bpe.model
train_loader:dataset:fineweb10B_sp1024 train_shards:10
val_loader:shards pattern=../../../data/datasets/fineweb10B_sp1024/fineweb_val_*.bin tokens:62021632
feedback: mode=diagonal rank=2 per_pass=False params=2560
recurrence: core_start=3 core_end=8 num_passes=3 stem=3 core=5 tail=3
model_params:26927199
mtp_num_heads:0 mtp_loss_weight:0.2 mtp_params:0
XSA:last_4 active_layers:[8, 9, 10]
world_size:1 grad_accum_steps:8
sdp_backends:cudnn=False flash=True mem_efficient=False math=False
attention_mode:gqa num_heads:8 num_kv_heads:4
tie_embeddings:True embed_lr:0.035 head_lr:0.0 matrix_lr:0.025 scalar_lr:0.025
train_batch_tokens:786432 train_seq_len:2048 iterations:50 warmup_steps:5 max_wallclock_seconds:900.000
seed:1337
wandb: [wandb.login()] Loaded credentials for https://api.wandb.ai from WANDB_API_KEY.
wandb: Currently logged in as: nesta-midavaine (propensity) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: WARNING Using a boolean value for 'reinit' is deprecated. Use 'return_previous' or 'finish_previous' instead.
wandb: setting up run xtlv4t52
wandb: Tracking run with wandb version 0.25.1
wandb: Run data is saved locally in /home/nesta/parameter-golf/records/track_10min_16mb/2026-03-26_RecurrentSOTA_Feedback/wandb/run-20260326_115924-xtlv4t52
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run ablation_3p_noRMS_j0.0
wandb: ⭐️ View project at https://wandb.ai/propensity/parameter-golf
wandb: 🚀 View run at https://wandb.ai/propensity/parameter-golf/runs/xtlv4t52
wandb:initialized
warmup_step:1/5
warmup_step:2/5
warmup_step:3/5
warmup_step:4/5
warmup_step:5/5
step:0/50 val_loss:6.9304 val_bpb:4.1046 train_time:0ms step_avg:0.01ms h_norms=['10160.2', '11311.6', '12555.0', '13896.3', '15332.5', '16950.8', '18657.9', '20459.0', '22380.8', '24439.6', '22273.3', '24354.9', '26573.0', '28946.4', '31482.6'] growth=['1.116', '1.113', '1.110', '1.107', '1.103', '1.106', '1.101', '1.097', '1.094', '1.092', '1.098', '1.093', '1.091', '1.089', '1.088']
step:1/50 train_loss:6.9310 train_time:2457ms step_avg:2456.70ms
step:2/50 train_loss:8.4480 train_time:4895ms step_avg:2447.55ms
step:3/50 train_loss:7.5656 train_time:7366ms step_avg:2455.22ms
step:4/50 train_loss:7.3715 train_time:9835ms step_avg:2458.84ms
step:5/50 train_loss:7.1882 train_time:12305ms step_avg:2460.94ms
step:6/50 train_loss:7.1200 train_time:14774ms step_avg:2462.35ms
step:7/50 train_loss:7.1275 train_time:17244ms step_avg:2463.46ms
step:8/50 train_loss:7.0234 train_time:19715ms step_avg:2464.42ms
step:9/50 train_loss:6.6287 train_time:22185ms step_avg:2465.05ms
step:10/50 train_loss:6.2775 train_time:24656ms step_avg:2465.57ms
step:20/50 train_loss:5.2073 train_time:49354ms step_avg:2467.72ms
step:25/50 val_loss:4.6001 val_bpb:2.7244 train_time:61743ms step_avg:2469.70ms h_norms=['18012.8', '20576.1', '23734.4', '27658.0', '32476.6', '39019.5', '47115.3', '57297.5', '70090.2', '85988.1', '66911.8', '82425.7', '101585.5', '125680.3', '156003.0'] growth=['1.133', '1.142', '1.153', '1.165', '1.174', '1.201', '1.207', '1.216', '1.223', '1.227', '1.228', '1.232', '1.232', '1.237', '1.241']
step:30/50 train_loss:4.3938 train_time:74062ms step_avg:2468.73ms
step:40/50 train_loss:4.0561 train_time:98772ms step_avg:2469.31ms
step:50/50 train_loss:3.8233 train_time:123613ms step_avg:2472.25ms
step:50/50 val_loss:3.7814 val_bpb:2.2396 train_time:123647ms step_avg:2472.93ms h_norms=['31577.0', '34240.9', '37755.1', '42362.9', '48395.2', '56432.3', '66325.1', '79064.0', '95012.0', '114485.4', '84419.3', '101309.9', '122596.9', '148869.6', '181094.1'] growth=['1.068', '1.084', '1.103', '1.122', '1.142', '1.166', '1.175', '1.192', '1.202', '1.205', '1.193', '1.200', '1.210', '1.214', '1.216']
peak memory allocated: 54207 MiB reserved: 55384 MiB
ema:applying EMA weights
DIAGNOSTIC post_ema val_loss:5.9416 val_bpb:3.5190 eval_time:67959ms
Serialized model: 106023671 bytes
Code size: 98931 bytes
Serialized model int6+lzma: 4809652 bytes
Total submission size int6+lzma: 4908583 bytes
final_int6_roundtrip val_loss:6.1789 val_bpb:3.6595 eval_time:67564ms
final_int6_roundtrip_exact val_loss:6.17886280 val_bpb:3.65947059
wandb: updating run metadata
wandb: uploading history steps 15-15, summary, console lines 30-31
wandb:
wandb: Run history:
wandb: lr_scale ▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb: step_avg_ms ▄▁▃▄▅▅▆▆▆▆▇▇▇█
wandb: train_loss ▆█▇▆▆▆▆▆▅▅▃▂▁▁
wandb: val_bpb █▃▁
wandb: val_loss █▃▁
wandb:
wandb: Run summary:
wandb: lr_scale 1
wandb: step_avg_ms 2472.25004
wandb: train_loss 3.82329
wandb: val_bpb 2.23957
wandb: val_loss 3.78142
wandb:
wandb: 🚀 View run ablation_3p_noRMS_j0.0 at: https://wandb.ai/propensity/parameter-golf/runs/xtlv4t52
wandb: ⭐️ View project at: https://wandb.ai/propensity/parameter-golf
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20260326_115924-xtlv4t52/logs
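The "Serialized model int6+lzma" lines above report the model shrinking from ~106 MB to ~4.8 MB. The actual packing code is not part of this diff; the following is a hedged sketch of one plausible scheme matching the name: symmetric 6-bit quantization, four values packed into three bytes, then LZMA compression. All function names here are illustrative, not the repo's API.

```python
import lzma
import numpy as np

def quantize_int6(w: np.ndarray):
    """Symmetric per-tensor quantization to the 6-bit signed range [-31, 31]."""
    scale = max(float(np.abs(w).max()) / 31.0, 1e-12)
    q = np.clip(np.round(w / scale), -31, 31).astype(np.int8)
    return q, scale

def pack6(q: np.ndarray) -> bytes:
    """Pack four 6-bit values (shifted to unsigned [1, 63]) into 3 bytes."""
    u = (q.astype(np.int16) + 32).astype(np.uint8)
    pad = (-len(u)) % 4                      # pad to a multiple of 4 values
    u = np.concatenate([u, np.zeros(pad, dtype=np.uint8)])
    a, b, c, d = u[0::4], u[1::4], u[2::4], u[3::4]
    out = np.empty(3 * len(a), dtype=np.uint8)
    out[0::3] = (a << 2) | (b >> 4)          # aaaaaabb
    out[1::3] = ((b & 0x0F) << 4) | (c >> 2)  # bbbbcccc
    out[2::3] = ((c & 0x03) << 6) | d        # ccdddddd
    return out.tobytes()

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
q, scale = quantize_int6(w)
blob = lzma.compress(pack6(q), preset=9)
```

The `final_int6_roundtrip` evaluations in the log are consistent with scoring the model after a quantize-then-dequantize round trip (`q * scale`), which is why they differ from the pre-quantization `val_bpb`.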
80 changes: 80 additions & 0 deletions ablation_3p_noRMS_j0.1.log
@@ -0,0 +1,80 @@
logs/4bd1dcea-262c-45fd-b47c-6c1070e31866.txt
val_bpb:enabled tokenizer_kind=sentencepiece tokenizer_path=../../../data/tokenizers/fineweb_1024_bpe.model
train_loader:dataset:fineweb10B_sp1024 train_shards:10
val_loader:shards pattern=../../../data/datasets/fineweb10B_sp1024/fineweb_val_*.bin tokens:62021632
feedback: mode=diagonal rank=2 per_pass=False params=2560
recurrence: core_start=3 core_end=8 num_passes=3 stem=3 core=5 tail=3
model_params:26927199
mtp_num_heads:0 mtp_loss_weight:0.2 mtp_params:0
XSA:last_4 active_layers:[8, 9, 10]
world_size:1 grad_accum_steps:8
sdp_backends:cudnn=False flash=True mem_efficient=False math=False
attention_mode:gqa num_heads:8 num_kv_heads:4
tie_embeddings:True embed_lr:0.035 head_lr:0.0 matrix_lr:0.025 scalar_lr:0.025
train_batch_tokens:786432 train_seq_len:2048 iterations:50 warmup_steps:5 max_wallclock_seconds:900.000
seed:1337
wandb: [wandb.login()] Loaded credentials for https://api.wandb.ai from WANDB_API_KEY.
wandb: Currently logged in as: nesta-midavaine (propensity) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: WARNING Using a boolean value for 'reinit' is deprecated. Use 'return_previous' or 'finish_previous' instead.
wandb: setting up run 6rfmco93
wandb: Tracking run with wandb version 0.25.1
wandb: Run data is saved locally in /home/nesta/parameter-golf/records/track_10min_16mb/2026-03-26_RecurrentSOTA_Feedback/wandb/run-20260326_120745-6rfmco93
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run ablation_3p_noRMS_j0.1
wandb: ⭐️ View project at https://wandb.ai/propensity/parameter-golf
wandb: 🚀 View run at https://wandb.ai/propensity/parameter-golf/runs/6rfmco93
wandb:initialized
warmup_step:1/5
warmup_step:2/5
warmup_step:3/5
warmup_step:4/5
warmup_step:5/5
step:0/50 val_loss:6.9304 val_bpb:4.1046 train_time:0ms step_avg:0.02ms h_norms=['10143.6', '11247.2', '12436.8', '13721.2', '15093.2', '16638.6', '18260.9', '19974.1', '21802.2', '23760.8', '21655.4', '23628.3', '25729.0', '27978.2', '30382.6'] growth=['1.110', '1.109', '1.106', '1.103', '1.100', '1.102', '1.098', '1.094', '1.092', '1.090', '1.095', '1.091', '1.089', '1.087', '1.086']
step:1/50 train_loss:6.9310 train_time:2474ms step_avg:2473.98ms
step:2/50 train_loss:8.4480 train_time:4928ms step_avg:2463.85ms
step:3/50 train_loss:7.5657 train_time:7414ms step_avg:2471.28ms
step:4/50 train_loss:7.4125 train_time:9901ms step_avg:2475.24ms
step:5/50 train_loss:7.2581 train_time:12387ms step_avg:2477.37ms
step:6/50 train_loss:7.1563 train_time:14873ms step_avg:2478.80ms
step:7/50 train_loss:7.1205 train_time:17358ms step_avg:2479.79ms
step:8/50 train_loss:7.0021 train_time:19845ms step_avg:2480.59ms
step:9/50 train_loss:6.6191 train_time:22332ms step_avg:2481.32ms
step:10/50 train_loss:6.2241 train_time:24818ms step_avg:2481.82ms
step:20/50 train_loss:4.8854 train_time:49674ms step_avg:2483.72ms
step:25/50 val_loss:4.4102 val_bpb:2.6119 train_time:62144ms step_avg:2485.74ms h_norms=['12925.3', '12168.2', '11607.0', '11186.7', '10890.7', '10691.3', '10518.0', '10429.2', '10384.9', '10395.0', '10468.4', '10350.9', '10317.8', '10323.0', '10377.5'] growth=['0.930', '0.941', '0.954', '0.964', '0.974', '0.982', '0.984', '0.992', '0.996', '1.001', '0.987', '0.989', '0.997', '1.001', '1.005']
step:30/50 train_loss:4.2124 train_time:74549ms step_avg:2484.96ms
step:40/50 train_loss:3.9336 train_time:99426ms step_avg:2485.66ms
step:50/50 train_loss:3.7638 train_time:124432ms step_avg:2488.64ms
step:50/50 val_loss:3.7456 val_bpb:2.2184 train_time:124466ms step_avg:2489.33ms h_norms=['20394.8', '18235.1', '16671.4', '15574.6', '14825.8', '14555.8', '14297.6', '14121.5', '14031.0', '13984.3', '14335.7', '14174.8', '14069.5', '14035.7', '14026.9'] growth=['0.871', '0.894', '0.914', '0.934', '0.952', '0.982', '0.982', '0.988', '0.994', '0.997', '0.991', '0.989', '0.993', '0.998', '0.999']
peak memory allocated: 54399 MiB reserved: 55768 MiB
ema:applying EMA weights
DIAGNOSTIC post_ema val_loss:5.9394 val_bpb:3.5176 eval_time:68125ms
Serialized model: 106023671 bytes
Code size: 98931 bytes
Serialized model int6+lzma: 4804840 bytes
Total submission size int6+lzma: 4903771 bytes
final_int6_roundtrip val_loss:6.1350 val_bpb:3.6335 eval_time:67734ms
final_int6_roundtrip_exact val_loss:6.13503683 val_bpb:3.63351438
wandb: updating run metadata
wandb: uploading history steps 15-15, summary, console lines 30-31
wandb: uploading output.log; uploading wandb-summary.json
wandb: uploading data
wandb:
wandb: Run history:
wandb: lr_scale ▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb: step_avg_ms ▄▁▃▄▅▅▅▆▆▆▇▇▇█
wandb: train_loss ▆█▇▆▆▆▆▆▅▅▃▂▁▁
wandb: val_bpb █▂▁
wandb: val_loss █▂▁
wandb:
wandb: Run summary:
wandb: lr_scale 1
wandb: step_avg_ms 2488.6422
wandb: train_loss 3.76377
wandb: val_bpb 2.21838
wandb: val_loss 3.74564
wandb:
wandb: 🚀 View run ablation_3p_noRMS_j0.1 at: https://wandb.ai/propensity/parameter-golf/runs/6rfmco93
wandb: ⭐️ View project at: https://wandb.ai/propensity/parameter-golf
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20260326_120745-6rfmco93/logs
5 changes: 5 additions & 0 deletions ablation_stdout.log
@@ -0,0 +1,5 @@
START 3-pass no-RMSnorm jac=0.0 (11:59:20)
DONE jac=0.0 => bpb@50=2.2396 int6=3.65947059 step=2472.25ms mem=54207MiB
START 3-pass no-RMSnorm jac=0.1 (12:07:41)
DONE jac=0.1 => bpb@50=2.2184 int6=3.63351438 step=2488.64ms mem=54399MiB
=== ABLATION COMPLETE (Thu Mar 26 12:16:04 UTC 2026) ===
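The summaries above pair `val_loss` (mean nats per token) with `bpb` (bits per byte). Across every log in this PR the two are related by a fixed factor, which implies roughly 2.436 bytes per token for this SentencePiece tokenizer — an inferred value, not one stated anywhere in the diff. A sketch of the conversion under that assumption:

```python
import math

def nats_to_bpb(val_loss: float, bytes_per_token: float = 2.436) -> float:
    """Convert mean cross-entropy in nats/token to bits-per-byte.

    bytes_per_token is inferred from the (val_loss, val_bpb) pairs in the
    logs, not taken from the repo's code.
    """
    return val_loss / (math.log(2) * bytes_per_token)
```

For example, the step-50 pair `val_loss:3.78142` / `val_bpb:2.2396` and the full-run pair `1.95735252` / `1.15925441` both reproduce to within rounding.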
44 changes: 44 additions & 0 deletions baseline_50step.log
@@ -0,0 +1,44 @@
logs/bff51f18-fbb9-43cc-9903-c84284e4e76d.txt
val_bpb:enabled tokenizer_kind=sentencepiece tokenizer_path=../../../data/tokenizers/fineweb_1024_bpe.model
train_loader:dataset:fineweb10B_sp1024 train_shards:10
val_loader:shards pattern=../../../data/datasets/fineweb10B_sp1024/fineweb_val_*.bin tokens:62021632
model_params:26928220
mtp_num_heads:0 mtp_loss_weight:0.2 mtp_params:0
XSA:last_4 active_layers:[7, 8, 9, 10]
world_size:1 grad_accum_steps:8
sdp_backends:cudnn=False flash=True mem_efficient=False math=False
attention_mode:gqa num_heads:8 num_kv_heads:4
tie_embeddings:True embed_lr:0.035 head_lr:0.0 matrix_lr:0.025 scalar_lr:0.025
train_batch_tokens:786432 train_seq_len:2048 iterations:50 warmup_steps:5 max_wallclock_seconds:900.000
seed:1337
warmup_step:1/5
warmup_step:2/5
warmup_step:3/5
warmup_step:4/5
warmup_step:5/5
step:0/50 val_loss:6.9304 val_bpb:4.1046 train_time:0ms step_avg:0.02ms
step:1/50 train_loss:6.9310 train_time:1335ms step_avg:1334.89ms
step:2/50 train_loss:8.6894 train_time:2639ms step_avg:1319.33ms
step:3/50 train_loss:7.7641 train_time:3975ms step_avg:1325.02ms
step:4/50 train_loss:7.2309 train_time:5311ms step_avg:1327.85ms
step:5/50 train_loss:7.1292 train_time:6648ms step_avg:1329.55ms
step:6/50 train_loss:7.1698 train_time:7983ms step_avg:1330.57ms
step:7/50 train_loss:7.1045 train_time:9320ms step_avg:1331.38ms
step:8/50 train_loss:6.9776 train_time:10656ms step_avg:1331.99ms
step:9/50 train_loss:6.6169 train_time:11993ms step_avg:1332.53ms
step:10/50 train_loss:6.2604 train_time:13330ms step_avg:1332.96ms
step:20/50 train_loss:5.1681 train_time:26695ms step_avg:1334.74ms
step:25/50 val_loss:4.6120 val_bpb:2.7315 train_time:33413ms step_avg:1336.54ms
step:30/50 train_loss:4.3901 train_time:40068ms step_avg:1335.60ms
step:40/50 train_loss:4.0167 train_time:53443ms step_avg:1336.07ms
step:50/50 train_loss:3.8262 train_time:66820ms step_avg:1336.40ms
step:50/50 val_loss:3.7856 val_bpb:2.2421 train_time:66853ms step_avg:1337.06ms
peak memory allocated: 30083 MiB reserved: 31168 MiB
ema:applying EMA weights
DIAGNOSTIC post_ema val_loss:5.8987 val_bpb:3.4935 eval_time:38419ms
Serialized model: 106027446 bytes
Code size: 89458 bytes
Serialized model int6+lzma: 4809376 bytes
Total submission size int6+lzma: 4898834 bytes
final_int6_roundtrip val_loss:6.0576 val_bpb:3.5876 eval_time:38209ms
final_int6_roundtrip_exact val_loss:6.05759208 val_bpb:3.58764724
35 changes: 35 additions & 0 deletions baseline_stdout.log
@@ -0,0 +1,35 @@
START full run: 4-pass baseline (no LoRA) TTT SWA, 80min (Thu Mar 26 23:01:24 UTC 2026)

=== FINAL RESULTS ===
stopping_early: wallclock_cap train_time:4800814ms step:3456/20000
peak memory allocated: 50545 MiB reserved: 50594 MiB
final_int6_roundtrip_exact val_loss:1.95735252 val_bpb:1.15925441
final_int6_sliding_window_exact val_loss:1.91642779 val_bpb:1.13501949
legal_ttt_exact val_loss:1.91163996 val_bpb:1.13218386
FINISHED (Fri Mar 27 01:46:16 UTC 2026)
python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1263, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/nesta/parameter-golf/.venv/lib/python3.12/site-packages/torch/_inductor/output_code.py", line 656, in __call__
return self.current_callable(inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nesta/parameter-golf/.venv/lib/python3.12/site-packages/torch/_inductor/utils.py", line 3401, in run
out = model(new_inputs)
^^^^^^^^^^^^^^^^^
File "/tmp/torchinductor_nesta/wm/cwmtq2g54vzhxzvsc4odvxx7srlroi2tosqcftjhmyw2c637ogrq.py", line 12255, in call
buf54 = empty_strided_cuda((48, 2048, 512), (1048576, 512, 1), torch.bfloat16)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 96.00 MiB. GPU 0 has a total capacity of 139.80 GiB of which 18.69 MiB is free. Process 448538 has 758.00 MiB memory in use. Process 448539 has 758.00 MiB memory in use. Process 448534 has 51.16 GiB memory in use. Process 448547 has 38.69 GiB memory in use. Including non-PyTorch memory, this process has 48.42 GiB memory in use. Of the allocated memory 47.76 GiB is allocated by PyTorch, and 3.37 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://docs.pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf)
wandb:
wandb: 🚀 View run full_4pass_baseline_80min at: https://wandb.ai/propensity/parameter-golf/runs/jkh80zal
wandb: Find logs at: wandb/run-20260326_230158-jkh80zal/logs
wandb:
wandb: 🚀 View run full_4pass_baseline_80min at: https://wandb.ai/propensity/parameter-golf/runs/fsi4c82a
wandb: Find logs at: wandb/run-20260326_230158-fsi4c82a/logs
wandb:
wandb: 🚀 View run full_4pass_baseline_80min at: https://wandb.ai/propensity/parameter-golf/runs/43bipylb
wandb: Find logs at: wandb/run-20260326_230158-43bipylb/logs
wandb:
wandb: 🚀 View run full_4pass_baseline_80min at: https://wandb.ai/propensity/parameter-golf/runs/zcabiozu
wandb: Find logs at: wandb/run-20260326_230158-zcabiozu/logs
FINISHED (Thu Mar 26 23:02:28 UTC 2026)
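The CUDA OOM traceback above ends with PyTorch's own suggestion for fragmentation-heavy workloads. A minimal config sketch of applying it (the environment variable and value are real PyTorch allocator settings; the training entry point would be whatever script this repo launches):

```shell
# Allocator hint quoted in the OOM message; set before launching training.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
echo "$PYTORCH_CUDA_ALLOC_CONF"
```

Note this only mitigates fragmentation; the log shows four concurrent runs sharing GPU 0, so the hard cap here was total memory, not fragmentation alone.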