Closed
24 commits
1ce4645
quantization-denoising
nestamidavaine Mar 25, 2026
b38248c
Merge branch 'main' of https://github.com/openai/parameter-golf
nestamidavaine Mar 25, 2026
09ea3cf
Attempt one, 4 passes and successfully achieve contractive layers
Mar 26, 2026
8caedf7
things are looking decent
Mar 27, 2026
4a6317d
logs should not be committed
Mar 27, 2026
05bee29
don't commit logs
Mar 27, 2026
94ad908
amend
Mar 27, 2026
241e1db
simplify the submission folder
Mar 27, 2026
efd2b59
Add *.pt, *.ptz, *.wandb to .gitignore
Mar 27, 2026
186ef4d
it works.... but memory is slightly too high :(
Mar 29, 2026
0375751
great performance of 1.114
Mar 29, 2026
caa6e4d
changes to try to reduce size by 0.222 MB
Mar 29, 2026
a981c41
clean up submission format
Mar 29, 2026
fca62ae
changes
nestamidavaine Mar 29, 2026
e1764c3
Strip dead features (MTP, DTG, LAWA, bigram, VE, gated_attn, value_re…
nestamidavaine Mar 29, 2026
36924c0
yay?
Mar 30, 2026
66df5aa
change to 2-pass with slightly better performance
Apr 1, 2026
44722c4
Add recurrent depth with progressive pass growth + error feedback (no…
Apr 1, 2026
41aef30
Add tricks section to README: graph precompilation warmup and python-…
Apr 1, 2026
a1639d8
Fix submission.json author to nestamidavaine
Apr 1, 2026
47b74a3
Add per-seed results to submission.json
Apr 1, 2026
c0b02d3
Rename submission folder to Stable_Growing_Recurrance, update README …
Apr 1, 2026
947ae2f
Emphasize significant baseline beat under results table
Apr 1, 2026
c0db831
Add nats to baseline comparison
Apr 1, 2026
1 change: 1 addition & 0 deletions .env
@@ -0,0 +1 @@
WANDB_API_KEY=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
8 changes: 7 additions & 1 deletion .gitignore
Expand Up @@ -8,4 +8,10 @@ data/manifest.json
data/docs_selected.jsonl
.mypy_cache/
.venv
logs/
logs/
*.log
*.txt
!records/track_non_record_16mb/2026-03-26_Stable_Growing_Recurrance/*.log
*.pt
*.ptz
*.wandb
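The `.gitignore` hunk above combines blanket `*.log` / `*.txt` excludes with one `!` re-include. Order matters in gitignore: a negation only takes effect after the exclude it overrides, and it cannot re-include a file whose parent directory is itself excluded (not the case here, since only file patterns are excluded). A throwaway-repo sketch to verify the behavior, assuming `git` is on PATH:

```shell
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
# Reproduce the relevant subset of the committed .gitignore
cat > .gitignore <<'EOF'
*.log
*.txt
!records/track_non_record_16mb/2026-03-26_Stable_Growing_Recurrance/*.log
EOF
mkdir -p records/track_non_record_16mb/2026-03-26_Stable_Growing_Recurrance
touch run.log records/track_non_record_16mb/2026-03-26_Stable_Growing_Recurrance/keep.log
# check-ignore exits 0 when a path is ignored, 1 when it is tracked-eligible
git check-ignore -q run.log && echo "run.log ignored"
git check-ignore -q records/track_non_record_16mb/2026-03-26_Stable_Growing_Recurrance/keep.log \
  || echo "keep.log kept"
```

`run.log` is excluded by `*.log`, while the record-folder log is re-included by the later negation pattern.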
78 changes: 78 additions & 0 deletions ablation_3p_noRMS_j0.0.log
@@ -0,0 +1,78 @@
logs/46d7a1dd-24ad-4ebb-bfa3-b13f0ce61391.txt
val_bpb:enabled tokenizer_kind=sentencepiece tokenizer_path=../../../data/tokenizers/fineweb_1024_bpe.model
train_loader:dataset:fineweb10B_sp1024 train_shards:10
val_loader:shards pattern=../../../data/datasets/fineweb10B_sp1024/fineweb_val_*.bin tokens:62021632
feedback: mode=diagonal rank=2 per_pass=False params=2560
recurrence: core_start=3 core_end=8 num_passes=3 stem=3 core=5 tail=3
model_params:26927199
mtp_num_heads:0 mtp_loss_weight:0.2 mtp_params:0
XSA:last_4 active_layers:[8, 9, 10]
world_size:1 grad_accum_steps:8
sdp_backends:cudnn=False flash=True mem_efficient=False math=False
attention_mode:gqa num_heads:8 num_kv_heads:4
tie_embeddings:True embed_lr:0.035 head_lr:0.0 matrix_lr:0.025 scalar_lr:0.025
train_batch_tokens:786432 train_seq_len:2048 iterations:50 warmup_steps:5 max_wallclock_seconds:900.000
seed:1337
wandb: [wandb.login()] Loaded credentials for https://api.wandb.ai from WANDB_API_KEY.
wandb: Currently logged in as: nesta-midavaine (propensity) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: WARNING Using a boolean value for 'reinit' is deprecated. Use 'return_previous' or 'finish_previous' instead.
wandb: setting up run xtlv4t52
wandb: Tracking run with wandb version 0.25.1
wandb: Run data is saved locally in /home/nesta/parameter-golf/records/track_10min_16mb/2026-03-26_RecurrentSOTA_Feedback/wandb/run-20260326_115924-xtlv4t52
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run ablation_3p_noRMS_j0.0
wandb: ⭐️ View project at https://wandb.ai/propensity/parameter-golf
wandb: 🚀 View run at https://wandb.ai/propensity/parameter-golf/runs/xtlv4t52
wandb:initialized
warmup_step:1/5
warmup_step:2/5
warmup_step:3/5
warmup_step:4/5
warmup_step:5/5
step:0/50 val_loss:6.9304 val_bpb:4.1046 train_time:0ms step_avg:0.01ms h_norms=['10160.2', '11311.6', '12555.0', '13896.3', '15332.5', '16950.8', '18657.9', '20459.0', '22380.8', '24439.6', '22273.3', '24354.9', '26573.0', '28946.4', '31482.6'] growth=['1.116', '1.113', '1.110', '1.107', '1.103', '1.106', '1.101', '1.097', '1.094', '1.092', '1.098', '1.093', '1.091', '1.089', '1.088']
step:1/50 train_loss:6.9310 train_time:2457ms step_avg:2456.70ms
step:2/50 train_loss:8.4480 train_time:4895ms step_avg:2447.55ms
step:3/50 train_loss:7.5656 train_time:7366ms step_avg:2455.22ms
step:4/50 train_loss:7.3715 train_time:9835ms step_avg:2458.84ms
step:5/50 train_loss:7.1882 train_time:12305ms step_avg:2460.94ms
step:6/50 train_loss:7.1200 train_time:14774ms step_avg:2462.35ms
step:7/50 train_loss:7.1275 train_time:17244ms step_avg:2463.46ms
step:8/50 train_loss:7.0234 train_time:19715ms step_avg:2464.42ms
step:9/50 train_loss:6.6287 train_time:22185ms step_avg:2465.05ms
step:10/50 train_loss:6.2775 train_time:24656ms step_avg:2465.57ms
step:20/50 train_loss:5.2073 train_time:49354ms step_avg:2467.72ms
step:25/50 val_loss:4.6001 val_bpb:2.7244 train_time:61743ms step_avg:2469.70ms h_norms=['18012.8', '20576.1', '23734.4', '27658.0', '32476.6', '39019.5', '47115.3', '57297.5', '70090.2', '85988.1', '66911.8', '82425.7', '101585.5', '125680.3', '156003.0'] growth=['1.133', '1.142', '1.153', '1.165', '1.174', '1.201', '1.207', '1.216', '1.223', '1.227', '1.228', '1.232', '1.232', '1.237', '1.241']
step:30/50 train_loss:4.3938 train_time:74062ms step_avg:2468.73ms
step:40/50 train_loss:4.0561 train_time:98772ms step_avg:2469.31ms
step:50/50 train_loss:3.8233 train_time:123613ms step_avg:2472.25ms
step:50/50 val_loss:3.7814 val_bpb:2.2396 train_time:123647ms step_avg:2472.93ms h_norms=['31577.0', '34240.9', '37755.1', '42362.9', '48395.2', '56432.3', '66325.1', '79064.0', '95012.0', '114485.4', '84419.3', '101309.9', '122596.9', '148869.6', '181094.1'] growth=['1.068', '1.084', '1.103', '1.122', '1.142', '1.166', '1.175', '1.192', '1.202', '1.205', '1.193', '1.200', '1.210', '1.214', '1.216']
peak memory allocated: 54207 MiB reserved: 55384 MiB
ema:applying EMA weights
DIAGNOSTIC post_ema val_loss:5.9416 val_bpb:3.5190 eval_time:67959ms
Serialized model: 106023671 bytes
Code size: 98931 bytes
Serialized model int6+lzma: 4809652 bytes
Total submission size int6+lzma: 4908583 bytes
final_int6_roundtrip val_loss:6.1789 val_bpb:3.6595 eval_time:67564ms
final_int6_roundtrip_exact val_loss:6.17886280 val_bpb:3.65947059
wandb: updating run metadata
wandb: uploading history steps 15-15, summary, console lines 30-31
wandb:
wandb: Run history:
wandb: lr_scale ▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb: step_avg_ms ▄▁▃▄▅▅▆▆▆▆▇▇▇█
wandb: train_loss ▆█▇▆▆▆▆▆▅▅▃▂▁▁
wandb: val_bpb █▃▁
wandb: val_loss █▃▁
wandb:
wandb: Run summary:
wandb: lr_scale 1
wandb: step_avg_ms 2472.25004
wandb: train_loss 3.82329
wandb: val_bpb 2.23957
wandb: val_loss 3.78142
wandb:
wandb: 🚀 View run ablation_3p_noRMS_j0.0 at: https://wandb.ai/propensity/parameter-golf/runs/xtlv4t52
wandb: ⭐️ View project at: https://wandb.ai/propensity/parameter-golf
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20260326_115924-xtlv4t52/logs
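The "Serialized model int6+lzma" lines above report the model shrinking from ~106 MB to ~4.8 MB. The actual packing code is not part of this diff; the following is a hedged sketch of one plausible scheme matching the name: symmetric 6-bit quantization, four values packed into three bytes, then LZMA compression. All function names here are illustrative, not the repo's API.

```python
import lzma
import numpy as np

def quantize_int6(w: np.ndarray):
    """Symmetric per-tensor quantization to the 6-bit signed range [-31, 31]."""
    scale = max(float(np.abs(w).max()) / 31.0, 1e-12)
    q = np.clip(np.round(w / scale), -31, 31).astype(np.int8)
    return q, scale

def pack6(q: np.ndarray) -> bytes:
    """Pack four 6-bit values (shifted to unsigned [1, 63]) into 3 bytes."""
    u = (q.astype(np.int16) + 32).astype(np.uint8)
    pad = (-len(u)) % 4                      # pad to a multiple of 4 values
    u = np.concatenate([u, np.zeros(pad, dtype=np.uint8)])
    a, b, c, d = u[0::4], u[1::4], u[2::4], u[3::4]
    out = np.empty(3 * len(a), dtype=np.uint8)
    out[0::3] = (a << 2) | (b >> 4)          # aaaaaabb
    out[1::3] = ((b & 0x0F) << 4) | (c >> 2)  # bbbbcccc
    out[2::3] = ((c & 0x03) << 6) | d        # ccdddddd
    return out.tobytes()

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
q, scale = quantize_int6(w)
blob = lzma.compress(pack6(q), preset=9)
```

The `final_int6_roundtrip` evaluations in the log are consistent with scoring the model after a quantize-then-dequantize round trip (`q * scale`), which is why they differ from the pre-quantization `val_bpb`.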
80 changes: 80 additions & 0 deletions ablation_3p_noRMS_j0.1.log
@@ -0,0 +1,80 @@
logs/4bd1dcea-262c-45fd-b47c-6c1070e31866.txt
val_bpb:enabled tokenizer_kind=sentencepiece tokenizer_path=../../../data/tokenizers/fineweb_1024_bpe.model
train_loader:dataset:fineweb10B_sp1024 train_shards:10
val_loader:shards pattern=../../../data/datasets/fineweb10B_sp1024/fineweb_val_*.bin tokens:62021632
feedback: mode=diagonal rank=2 per_pass=False params=2560
recurrence: core_start=3 core_end=8 num_passes=3 stem=3 core=5 tail=3
model_params:26927199
mtp_num_heads:0 mtp_loss_weight:0.2 mtp_params:0
XSA:last_4 active_layers:[8, 9, 10]
world_size:1 grad_accum_steps:8
sdp_backends:cudnn=False flash=True mem_efficient=False math=False
attention_mode:gqa num_heads:8 num_kv_heads:4
tie_embeddings:True embed_lr:0.035 head_lr:0.0 matrix_lr:0.025 scalar_lr:0.025
train_batch_tokens:786432 train_seq_len:2048 iterations:50 warmup_steps:5 max_wallclock_seconds:900.000
seed:1337
wandb: [wandb.login()] Loaded credentials for https://api.wandb.ai from WANDB_API_KEY.
wandb: Currently logged in as: nesta-midavaine (propensity) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: WARNING Using a boolean value for 'reinit' is deprecated. Use 'return_previous' or 'finish_previous' instead.
wandb: setting up run 6rfmco93
wandb: Tracking run with wandb version 0.25.1
wandb: Run data is saved locally in /home/nesta/parameter-golf/records/track_10min_16mb/2026-03-26_RecurrentSOTA_Feedback/wandb/run-20260326_120745-6rfmco93
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run ablation_3p_noRMS_j0.1
wandb: ⭐️ View project at https://wandb.ai/propensity/parameter-golf
wandb: 🚀 View run at https://wandb.ai/propensity/parameter-golf/runs/6rfmco93
wandb:initialized
warmup_step:1/5
warmup_step:2/5
warmup_step:3/5
warmup_step:4/5
warmup_step:5/5
step:0/50 val_loss:6.9304 val_bpb:4.1046 train_time:0ms step_avg:0.02ms h_norms=['10143.6', '11247.2', '12436.8', '13721.2', '15093.2', '16638.6', '18260.9', '19974.1', '21802.2', '23760.8', '21655.4', '23628.3', '25729.0', '27978.2', '30382.6'] growth=['1.110', '1.109', '1.106', '1.103', '1.100', '1.102', '1.098', '1.094', '1.092', '1.090', '1.095', '1.091', '1.089', '1.087', '1.086']
step:1/50 train_loss:6.9310 train_time:2474ms step_avg:2473.98ms
step:2/50 train_loss:8.4480 train_time:4928ms step_avg:2463.85ms
step:3/50 train_loss:7.5657 train_time:7414ms step_avg:2471.28ms
step:4/50 train_loss:7.4125 train_time:9901ms step_avg:2475.24ms
step:5/50 train_loss:7.2581 train_time:12387ms step_avg:2477.37ms
step:6/50 train_loss:7.1563 train_time:14873ms step_avg:2478.80ms
step:7/50 train_loss:7.1205 train_time:17358ms step_avg:2479.79ms
step:8/50 train_loss:7.0021 train_time:19845ms step_avg:2480.59ms
step:9/50 train_loss:6.6191 train_time:22332ms step_avg:2481.32ms
step:10/50 train_loss:6.2241 train_time:24818ms step_avg:2481.82ms
step:20/50 train_loss:4.8854 train_time:49674ms step_avg:2483.72ms
step:25/50 val_loss:4.4102 val_bpb:2.6119 train_time:62144ms step_avg:2485.74ms h_norms=['12925.3', '12168.2', '11607.0', '11186.7', '10890.7', '10691.3', '10518.0', '10429.2', '10384.9', '10395.0', '10468.4', '10350.9', '10317.8', '10323.0', '10377.5'] growth=['0.930', '0.941', '0.954', '0.964', '0.974', '0.982', '0.984', '0.992', '0.996', '1.001', '0.987', '0.989', '0.997', '1.001', '1.005']
step:30/50 train_loss:4.2124 train_time:74549ms step_avg:2484.96ms
step:40/50 train_loss:3.9336 train_time:99426ms step_avg:2485.66ms
step:50/50 train_loss:3.7638 train_time:124432ms step_avg:2488.64ms
step:50/50 val_loss:3.7456 val_bpb:2.2184 train_time:124466ms step_avg:2489.33ms h_norms=['20394.8', '18235.1', '16671.4', '15574.6', '14825.8', '14555.8', '14297.6', '14121.5', '14031.0', '13984.3', '14335.7', '14174.8', '14069.5', '14035.7', '14026.9'] growth=['0.871', '0.894', '0.914', '0.934', '0.952', '0.982', '0.982', '0.988', '0.994', '0.997', '0.991', '0.989', '0.993', '0.998', '0.999']
peak memory allocated: 54399 MiB reserved: 55768 MiB
ema:applying EMA weights
DIAGNOSTIC post_ema val_loss:5.9394 val_bpb:3.5176 eval_time:68125ms
Serialized model: 106023671 bytes
Code size: 98931 bytes
Serialized model int6+lzma: 4804840 bytes
Total submission size int6+lzma: 4903771 bytes
final_int6_roundtrip val_loss:6.1350 val_bpb:3.6335 eval_time:67734ms
final_int6_roundtrip_exact val_loss:6.13503683 val_bpb:3.63351438
wandb: updating run metadata
wandb: uploading history steps 15-15, summary, console lines 30-31
wandb: uploading output.log; uploading wandb-summary.json
wandb: uploading data
wandb:
wandb: Run history:
wandb: lr_scale ▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb: step_avg_ms ▄▁▃▄▅▅▅▆▆▆▇▇▇█
wandb: train_loss ▆█▇▆▆▆▆▆▅▅▃▂▁▁
wandb: val_bpb █▂▁
wandb: val_loss █▂▁
wandb:
wandb: Run summary:
wandb: lr_scale 1
wandb: step_avg_ms 2488.6422
wandb: train_loss 3.76377
wandb: val_bpb 2.21838
wandb: val_loss 3.74564
wandb:
wandb: 🚀 View run ablation_3p_noRMS_j0.1 at: https://wandb.ai/propensity/parameter-golf/runs/6rfmco93
wandb: ⭐️ View project at: https://wandb.ai/propensity/parameter-golf
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20260326_120745-6rfmco93/logs
5 changes: 5 additions & 0 deletions ablation_stdout.log
@@ -0,0 +1,5 @@
START 3-pass no-RMSnorm jac=0.0 (11:59:20)
DONE jac=0.0 => bpb@50=2.2396 int6=3.65947059 step=2472.25ms mem=54207MiB
START 3-pass no-RMSnorm jac=0.1 (12:07:41)
DONE jac=0.1 => bpb@50=2.2184 int6=3.63351438 step=2488.64ms mem=54399MiB
=== ABLATION COMPLETE (Thu Mar 26 12:16:04 UTC 2026) ===
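The summaries above pair `val_loss` (mean nats per token) with `bpb` (bits per byte). Across every log in this PR the two are related by a fixed factor, which implies roughly 2.436 bytes per token for this SentencePiece tokenizer — an inferred value, not one stated anywhere in the diff. A sketch of the conversion under that assumption:

```python
import math

def nats_to_bpb(val_loss: float, bytes_per_token: float = 2.436) -> float:
    """Convert mean cross-entropy in nats/token to bits-per-byte.

    bytes_per_token is inferred from the (val_loss, val_bpb) pairs in the
    logs, not taken from the repo's code.
    """
    return val_loss / (math.log(2) * bytes_per_token)
```

For example, the step-50 pair `val_loss:3.78142` / `val_bpb:2.2396` and the full-run pair `1.95735252` / `1.15925441` both reproduce to within rounding.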
44 changes: 44 additions & 0 deletions baseline_50step.log
@@ -0,0 +1,44 @@
logs/bff51f18-fbb9-43cc-9903-c84284e4e76d.txt
val_bpb:enabled tokenizer_kind=sentencepiece tokenizer_path=../../../data/tokenizers/fineweb_1024_bpe.model
train_loader:dataset:fineweb10B_sp1024 train_shards:10
val_loader:shards pattern=../../../data/datasets/fineweb10B_sp1024/fineweb_val_*.bin tokens:62021632
model_params:26928220
mtp_num_heads:0 mtp_loss_weight:0.2 mtp_params:0
XSA:last_4 active_layers:[7, 8, 9, 10]
world_size:1 grad_accum_steps:8
sdp_backends:cudnn=False flash=True mem_efficient=False math=False
attention_mode:gqa num_heads:8 num_kv_heads:4
tie_embeddings:True embed_lr:0.035 head_lr:0.0 matrix_lr:0.025 scalar_lr:0.025
train_batch_tokens:786432 train_seq_len:2048 iterations:50 warmup_steps:5 max_wallclock_seconds:900.000
seed:1337
warmup_step:1/5
warmup_step:2/5
warmup_step:3/5
warmup_step:4/5
warmup_step:5/5
step:0/50 val_loss:6.9304 val_bpb:4.1046 train_time:0ms step_avg:0.02ms
step:1/50 train_loss:6.9310 train_time:1335ms step_avg:1334.89ms
step:2/50 train_loss:8.6894 train_time:2639ms step_avg:1319.33ms
step:3/50 train_loss:7.7641 train_time:3975ms step_avg:1325.02ms
step:4/50 train_loss:7.2309 train_time:5311ms step_avg:1327.85ms
step:5/50 train_loss:7.1292 train_time:6648ms step_avg:1329.55ms
step:6/50 train_loss:7.1698 train_time:7983ms step_avg:1330.57ms
step:7/50 train_loss:7.1045 train_time:9320ms step_avg:1331.38ms
step:8/50 train_loss:6.9776 train_time:10656ms step_avg:1331.99ms
step:9/50 train_loss:6.6169 train_time:11993ms step_avg:1332.53ms
step:10/50 train_loss:6.2604 train_time:13330ms step_avg:1332.96ms
step:20/50 train_loss:5.1681 train_time:26695ms step_avg:1334.74ms
step:25/50 val_loss:4.6120 val_bpb:2.7315 train_time:33413ms step_avg:1336.54ms
step:30/50 train_loss:4.3901 train_time:40068ms step_avg:1335.60ms
step:40/50 train_loss:4.0167 train_time:53443ms step_avg:1336.07ms
step:50/50 train_loss:3.8262 train_time:66820ms step_avg:1336.40ms
step:50/50 val_loss:3.7856 val_bpb:2.2421 train_time:66853ms step_avg:1337.06ms
peak memory allocated: 30083 MiB reserved: 31168 MiB
ema:applying EMA weights
DIAGNOSTIC post_ema val_loss:5.8987 val_bpb:3.4935 eval_time:38419ms
Serialized model: 106027446 bytes
Code size: 89458 bytes
Serialized model int6+lzma: 4809376 bytes
Total submission size int6+lzma: 4898834 bytes
final_int6_roundtrip val_loss:6.0576 val_bpb:3.5876 eval_time:38209ms
final_int6_roundtrip_exact val_loss:6.05759208 val_bpb:3.58764724
35 changes: 35 additions & 0 deletions baseline_stdout.log
@@ -0,0 +1,35 @@
START full run: 4-pass baseline (no LoRA) TTT SWA, 80min (Thu Mar 26 23:01:24 UTC 2026)

=== FINAL RESULTS ===
stopping_early: wallclock_cap train_time:4800814ms step:3456/20000
peak memory allocated: 50545 MiB reserved: 50594 MiB
final_int6_roundtrip_exact val_loss:1.95735252 val_bpb:1.15925441
final_int6_sliding_window_exact val_loss:1.91642779 val_bpb:1.13501949
legal_ttt_exact val_loss:1.91163996 val_bpb:1.13218386
FINISHED (Fri Mar 27 01:46:16 UTC 2026)
python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1263, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/nesta/parameter-golf/.venv/lib/python3.12/site-packages/torch/_inductor/output_code.py", line 656, in __call__
return self.current_callable(inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nesta/parameter-golf/.venv/lib/python3.12/site-packages/torch/_inductor/utils.py", line 3401, in run
out = model(new_inputs)
^^^^^^^^^^^^^^^^^
File "/tmp/torchinductor_nesta/wm/cwmtq2g54vzhxzvsc4odvxx7srlroi2tosqcftjhmyw2c637ogrq.py", line 12255, in call
buf54 = empty_strided_cuda((48, 2048, 512), (1048576, 512, 1), torch.bfloat16)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 96.00 MiB. GPU 0 has a total capacity of 139.80 GiB of which 18.69 MiB is free. Process 448538 has 758.00 MiB memory in use. Process 448539 has 758.00 MiB memory in use. Process 448534 has 51.16 GiB memory in use. Process 448547 has 38.69 GiB memory in use. Including non-PyTorch memory, this process has 48.42 GiB memory in use. Of the allocated memory 47.76 GiB is allocated by PyTorch, and 3.37 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://docs.pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf)
wandb:
wandb: 🚀 View run full_4pass_baseline_80min at: https://wandb.ai/propensity/parameter-golf/runs/jkh80zal
wandb: Find logs at: wandb/run-20260326_230158-jkh80zal/logs
wandb:
wandb: 🚀 View run full_4pass_baseline_80min at: https://wandb.ai/propensity/parameter-golf/runs/fsi4c82a
wandb: Find logs at: wandb/run-20260326_230158-fsi4c82a/logs
wandb:
wandb: 🚀 View run full_4pass_baseline_80min at: https://wandb.ai/propensity/parameter-golf/runs/43bipylb
wandb: Find logs at: wandb/run-20260326_230158-43bipylb/logs
wandb:
wandb: 🚀 View run full_4pass_baseline_80min at: https://wandb.ai/propensity/parameter-golf/runs/zcabiozu
wandb: Find logs at: wandb/run-20260326_230158-zcabiozu/logs
FINISHED (Thu Mar 26 23:02:28 UTC 2026)
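The CUDA OOM traceback above ends with PyTorch's own suggestion for fragmentation-heavy workloads. A minimal config sketch of applying it (the environment variable and value are real PyTorch allocator settings; the training entry point would be whatever script this repo launches):

```shell
# Allocator hint quoted in the OOM message; set before launching training.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
echo "$PYTORCH_CUDA_ALLOC_CONF"
```

Note this only mitigates fragmentation; the log shows four concurrent runs sharing GPU 0, so the hard cap here was total memory, not fragmentation alone.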