Is your feature request related to a problem? Please describe.
Currently, training multiple models with DeepSpeed's ZeRO optimization does not work as intended. A detailed issue has been reported and can be found here: huggingface/transformers#22705
Describe the solution you'd like
Thanks to @stas00 for helping in #3076:
When DeepSpeed ZeRO-3 is used in examples/text_to_image/train_text_to_image.py, the three models (text_encoder, vae, unet) get partitioned with zero.Init, but only unet goes through accelerate.prepare, so DeepSpeed doesn't know to automatically gather the already-partitioned weights before each forward call for text_encoder and vae. That's the source of the problem.
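For illustration, here is a simplified sketch of that pattern (not the exact script; the checkpoint path and hyperparameters are placeholders):

```python
import torch
from accelerate import Accelerator
from diffusers import AutoencoderKL, UNet2DConditionModel
from transformers import CLIPTextModel

accelerator = Accelerator()
model_name = "path/to/stable-diffusion-checkpoint"  # placeholder

# Under ZeRO-3, from_pretrained partitions the weights of all three models via zero.Init.
text_encoder = CLIPTextModel.from_pretrained(model_name, subfolder="text_encoder")
vae = AutoencoderKL.from_pretrained(model_name, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(model_name, subfolder="unet")

optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-4)

# Only unet goes through accelerator.prepare, so only unet ends up wrapped in a
# DeepSpeed engine that gathers its partitioned parameters before forward.
unet, optimizer = accelerator.prepare(unet, optimizer)

# text_encoder and vae were partitioned by zero.Init but never prepared, so
# DeepSpeed never gathers their weights and their forward calls break.
```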
Currently, Accelerate doesn't know how to handle multiple models.
For this to work properly, all models must be run through accelerate.prepare. But Accelerate will try to assign the same optimizer with the same weights to every model during deepspeed.initialize, which of course doesn't work. It needs to assign the correct weights to the correct optimizer, so that each optimizer's param_groups are populated with the parameters of the model it belongs to. For frozen models there should be no optimizer at all (or the param_groups should be empty).
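What I'd like to be possible, as a hypothetical sketch of the desired behaviour (continuing from the snippet above; this is not how accelerate.prepare handles DeepSpeed today):

```python
# Hypothetical: unet, vae, text_encoder loaded as in the sketch above.
import torch

# Freeze the models that should not be trained.
vae.requires_grad_(False)
text_encoder.requires_grad_(False)

# The optimizer only holds unet's parameters.
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-4)

# Desired: every model is prepared; the optimizer is attached only to the model
# that owns its param_groups (unet), while vae and text_encoder get engines with
# no optimizer, so all three have their ZeRO-3-partitioned weights gathered
# automatically before each forward pass.
unet, text_encoder, vae, optimizer = accelerator.prepare(
    unet, text_encoder, vae, optimizer
)
```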
I would like to see Accelerate support training multiple models by preparing all of them and assigning the right weights and optimizer to each model. This would enable efficient multi-model training and help address issues such as the one discussed in pull request #3076.
In summary, Accelerate needs to be able to prepare all models and assign different weights and optimizers for efficient training of multiple models.
Describe alternatives you've considered
If the proposed enhancement in Accelerate to support training multiple models is implemented, it could improve the training speed and efficiency of models like ChatLLM. Further optimizing it with DeepSpeed could lead to even better results in terms of training speed and memory usage.