Commit 874bc3e
[Perf] AdamWConfig: enable fused AdamW kernel
Pass ``fused=True`` to ``torch.optim.AdamW`` on the non-foreach path. The fused
kernel folds the per-parameter element-wise AdamW update into a single CUDA
launch, cutting launch overhead and optimizer-step time at large parameter
counts.

1 parent 8a0efa8 commit 874bc3e
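
The changed line itself is not shown here, so the following is only a minimal sketch of the idea described in the commit message. The class name ``AdamWConfigSketch``, its fields, the ``foreach`` flag, and the ``optimizer_kwargs`` / ``build`` methods are all hypothetical and are not taken from the repository. The only real API used is ``torch.optim.AdamW`` with its ``foreach`` and ``fused`` keyword arguments.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Tuple


@dataclass
class AdamWConfigSketch:
    """Hypothetical stand-in for the AdamWConfig touched by this commit."""

    lr: float = 1e-3
    betas: Tuple[float, float] = (0.9, 0.999)
    eps: float = 1e-8
    weight_decay: float = 0.01
    foreach: bool = False  # assumed flag that selects the foreach path

    def optimizer_kwargs(self) -> Dict[str, Any]:
        """Build keyword arguments for torch.optim.AdamW."""
        kwargs: Dict[str, Any] = {
            "lr": self.lr,
            "betas": self.betas,
            "eps": self.eps,
            "weight_decay": self.weight_decay,
        }
        if self.foreach:
            # foreach path: left unchanged by the commit.
            kwargs["foreach"] = True
        else:
            # Non-foreach path: request the fused kernel.
            # PyTorch rejects fused=True together with foreach=True,
            # so the two flags are kept mutually exclusive here.
            kwargs["fused"] = True
        return kwargs

    def build(self, params: Iterable[Any]):
        """Create the optimizer; torch is imported lazily."""
        import torch

        return torch.optim.AdamW(params, **self.optimizer_kwargs())
```

Calling ``AdamWConfigSketch().build(model.parameters())`` would then construct the optimizer with ``fused=True``. The fused implementation places requirements on the parameters (floating-point tensors on a supported device such as CUDA, with the exact set depending on the PyTorch version), so a real configuration may want a fallback when those requirements are not met.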
1 file changed
Lines changed: 1 addition & 1 deletion
[Diff body not captured: line 66 replaced (1 deletion, 1 addition); context lines 63-65 and 67-69 unchanged.]