Commit 99d9970
Weight decay on Muon (MUON_WEIGHT_DECAY=0.09 default in frontier run)
Frontier records (PR openai#1285, MuonEq-R + WD=0.090; PR openai#1218, WD=0.085)
use AdamW-style decoupled weight decay on the Muon optimizer. Add a
MUON_WEIGHT_DECAY knob that defaults to 0.0, so existing runs are unchanged
(backward-compatible). Weight decay is applied as p.data.mul_(1 - lr * wd)
immediately before the Muon matrix update.
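To make the ordering concrete, here is a minimal, dependency-free sketch of one Muon-style step with decoupled weight decay. The names `muon_step_with_wd`, `orthogonalize`, and the momentum value are hypothetical, and the real code operates on torch tensors. The orthogonalization uses the classic cubic Newton-Schulz iteration rather than the tuned quintic used by Muon. The sketch also omits Nesterov momentum and shape-dependent learning-rate scaling.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, so the sketch has no dependencies."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def orthogonalize(g: Matrix, steps: int = 30) -> Matrix:
    """Approximate the orthogonal polar factor of g with the classic cubic
    Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X (Muon itself uses a
    tuned quintic iteration instead)."""
    norm = math.sqrt(sum(v * v for row in g for v in row)) + 1e-7
    x = [[v / norm for v in row] for row in g]  # singular values now <= 1
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * x[i][j] - 0.5 * xxtx[i][j] for j in range(len(x[0]))]
             for i in range(len(x))]
    return x


def muon_step_with_wd(p: Matrix, grad: Matrix, buf: Matrix,
                      lr: float, wd: float, momentum: float = 0.95) -> None:
    """Hypothetical Muon-style step that updates p and buf in place."""
    # Plain momentum accumulation: buf <- momentum * buf + grad
    for i, row in enumerate(grad):
        for j, g in enumerate(row):
            buf[i][j] = momentum * buf[i][j] + g
    # Decoupled weight decay first, mirroring p.data.mul_(1 - lr * wd);
    # with wd == 0.0 this is skipped, so the old behavior is unchanged.
    if wd > 0.0:
        shrink = 1.0 - lr * wd
        for row in p:
            for j in range(len(row)):
                row[j] *= shrink
    # Then the orthogonalized matrix update.
    update = orthogonalize(buf)
    for i, row in enumerate(update):
        for j, u in enumerate(row):
            p[i][j] -= lr * u
```

Because the shrink runs before the update, the decay scales only the existing weights and not the freshly orthogonalized step. With an identity weight, an identity gradient, lr=0.02 and wd=0.09, the diagonal becomes 0.9982 - 0.02 = 0.9782.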
The MuonEq-R (row-normalized) variant is not ported; it would need more
line budget than this branch has. Per that record's commit notes, WD alone
accounts for most of its improvement.
dev/run_frontier.sh sets MUON_WEIGHT_DECAY=0.09 by default.
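Since dev/run_frontier.sh sets the knob as an environment variable, the training script presumably reads it from `os.environ`. Here is a minimal sketch of that lookup. The function name is hypothetical, and the repo's actual hyperparameter plumbing may differ.

```python
import os


def muon_weight_decay_from_env() -> float:
    """Hypothetical lookup for the MUON_WEIGHT_DECAY knob.

    Falls back to 0.0 when the variable is unset, so runs that never set it
    keep the previous no-weight-decay behavior.
    """
    return float(os.environ.get("MUON_WEIGHT_DECAY", "0.0"))
```

Launching with MUON_WEIGHT_DECAY=0.09 in the environment, as dev/run_frontier.sh does, yields 0.09. With a hypothetical lr of 0.02, each step would then scale every Muon matrix by 1 - 0.02 * 0.09 = 0.9982 before its update.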
Also inline restore_low_dim_params_to_fp32 at its single call site
to free up lines for this change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1 parent 0794f04, commit 99d9970
3 files changed
Lines changed: 23 additions & 30 deletions
Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@        +1 line (new line 23)
Lines changed: 11 additions & 15 deletions
@@ -123,6 +123,7 @@      +1 line (new line 126)
@@ -155,11 +156,8 @@     -5 lines (old 158-162), +2 lines (new 159-160)
@@ -204,9 +202,12 @@     +3 lines (new 205, 209-210)
@@ -763,14 +764,6 @@     -8 lines (old 766-773)
@@ -1124,7 +1117,10 @@   -1 line (old 1127), +4 lines (new 1120-1123)
@@ -1157,7 +1153,7 @@    -1 line (old 1160), +1 line (new 1156)
Lines changed: 11 additions & 15 deletions
@@ -123,6 +123,7 @@      +1 line (new line 126)
@@ -155,11 +156,8 @@     -5 lines (old 158-162), +2 lines (new 159-160)
@@ -204,9 +202,12 @@     +3 lines (new 205, 209-210)
@@ -763,14 +764,6 @@     -8 lines (old 766-773)
@@ -1124,7 +1117,10 @@   -1 line (old 1127), +4 lines (new 1120-1123)
@@ -1157,7 +1153,7 @@    -1 line (old 1160), +1 line (new 1156)