Commit ba632fe

Minor fix to make adafactor work for >2d conv kernels (facebookresearch#1122)
Summary: Line 124 was missing an .unsqueeze(-1) on the row mean. Without it, Adafactor raises a runtime error for convolutional kernels with more than two dimensions. With this fix, Adafactor's 2D logic is applied to the final two dimensions.

Pull Request resolved: facebookresearch#1122
Differential Revision: D17431662
Pulled By: myleott
fbshipit-source-id: e7435e77270a9252f75f01b2457ef0048f5bcf36
1 parent 48902f0 commit ba632fe

1 file changed: 1 addition, 1 deletion

fairseq/optim/adafactor.py
@@ -121,7 +121,7 @@ def _rms(self, tensor):
         return tensor.norm(2) / (tensor.numel() ** 0.5)
 
     def _approx_sq_grad(self, exp_avg_sq_row, exp_avg_sq_col, output):
-        r_factor = (exp_avg_sq_row / exp_avg_sq_row.mean(dim=-1)).rsqrt_().unsqueeze(-1)
+        r_factor = (exp_avg_sq_row / exp_avg_sq_row.mean(dim=-1).unsqueeze(-1)).rsqrt_().unsqueeze(-1)
         c_factor = exp_avg_sq_col.unsqueeze(-2).rsqrt()
         torch.mul(r_factor, c_factor, out=output)
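
The shape arithmetic behind this change can be sketched without PyTorch. The helpers below (broadcast_shape, r_factor_shape, output_shape) are hypothetical and not part of fairseq; they only track tensor shapes under the right-aligned broadcasting rule. The sketch assumes that exp_avg_sq_row has the gradient's shape minus its last dimension and that exp_avg_sq_col has the gradient's shape minus its second-to-last dimension, which is consistent with the unsqueeze calls in the diff but is not shown in it.

```python
def broadcast_shape(a, b):
    """Right-aligned broadcasting rule; raises ValueError on a mismatch."""
    result = []
    # Walk both shapes from the last dimension backwards; missing dims count as 1.
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"shapes {a} and {b} do not broadcast")
        result.append(db if da == 1 else da)
    return tuple(reversed(result))


def r_factor_shape(grad_shape, fixed):
    """Shape of r_factor before (fixed=False) and after (fixed=True) the change."""
    row = tuple(grad_shape[:-1])   # assumed shape of exp_avg_sq_row
    mean = row[:-1]                # .mean(dim=-1) drops the last dimension
    if fixed:
        mean = mean + (1,)         # the added .unsqueeze(-1)
    ratio = broadcast_shape(row, mean)
    return ratio + (1,)            # the trailing .unsqueeze(-1)


def output_shape(grad_shape, fixed):
    """Shape of r_factor * c_factor, as written into `output`."""
    col = tuple(grad_shape[:-2]) + tuple(grad_shape[-1:])  # assumed exp_avg_sq_col shape
    c_factor = col[:-1] + (1,) + col[-1:]                  # .unsqueeze(-2)
    return broadcast_shape(r_factor_shape(grad_shape, fixed), c_factor)


if __name__ == "__main__":
    # A 2D weight matrix and a 4D conv kernel (sizes chosen arbitrarily).
    for grad_shape in [(8, 16), (64, 32, 3, 5)]:
        for fixed in (False, True):
            try:
                print(grad_shape, "fixed" if fixed else "old", "->", output_shape(grad_shape, fixed))
            except ValueError as err:
                print(grad_shape, "fixed" if fixed else "old", "-> error:", err)
```

For a 2D gradient the row mean is a scalar, so the old expression already broadcasts, which is presumably why the problem only surfaced with higher-dimensional kernels. For a 4D kernel the old row mean keeps the leading dimensions but loses the one it needs to align with, so the division fails unless the sizes happen to match.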
