fix: `model.set_requires_gradient_sync(False)` should be called to turn off gradient synchronization in FSDP2 #3762
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@EquationWalker this did not happen before because everything inside the function silently falls through when whatever it's looking for isn't there; see the lines for the DeepSpeed ZeRO-3 check earlier, which follow the same pattern. I'm not sure that it helps anyone to know this, or whether some earlier check should occur when a user passes an unpacked list.
I think this check should occur in
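To make the "silent fall-through" concrete, a check along these lines could fail fast with a clear error instead. This is a hypothetical sketch, not the PR's actual diff; the function name `ensure_no_sync_supported` is invented for illustration:

```python
def ensure_no_sync_supported(model):
    """Hypothetical early check: raise a clear error instead of silently
    falling through when the wrapped model supports neither FSDP1's
    no_sync() context manager nor FSDP2's set_requires_gradient_sync().
    """
    if not (
        hasattr(model, "no_sync")
        or hasattr(model, "set_requires_gradient_sync")
    ):
        raise TypeError(
            f"{type(model).__name__} exposes neither no_sync() nor "
            "set_requires_gradient_sync(); cannot disable gradient "
            "synchronization for this model."
        )
```

The point is only that an explicit `TypeError` surfaces the unsupported case at the call site, rather than letting the helper quietly do nothing.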
In FSDP2, the model (an `FSDPModule`) does not have `no_sync()`; instead you call `model.set_requires_gradient_sync(False)` to turn off gradient synchronization. See `torch.distributed.fsdp.FSDPModule.set_requires_gradient_sync`.