Is your feature request related to a problem? Please describe.
Currently, training multiple models with DeepSpeed's ZeRO optimization does not work as intended. A detailed issue has been reported and can be found here: huggingface/transformers#22705
Describe the solution you'd like
Thanks to @stas00 for helping in #3076:
When DeepSpeed ZeRO-3 is used in examples/text_to_image/train_text_to_image.py, the three models (text_encoder, vae, unet) get partitioned with zero.Init, but only unet goes through accelerate.prepare, so DeepSpeed doesn't know to automatically gather the already-partitioned weights before each forward call for text_encoder and vae. That's the source of the problem.
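For illustration, here is a simplified sketch of that pattern (not the exact script; the checkpoint path and hyperparameters are placeholders):

```python
import torch
from accelerate import Accelerator
from diffusers import AutoencoderKL, UNet2DConditionModel
from transformers import CLIPTextModel

accelerator = Accelerator()
model_name = "path/to/stable-diffusion-checkpoint"  # placeholder

# Under ZeRO-3, from_pretrained partitions the weights of all three models via zero.Init.
text_encoder = CLIPTextModel.from_pretrained(model_name, subfolder="text_encoder")
vae = AutoencoderKL.from_pretrained(model_name, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(model_name, subfolder="unet")

optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-4)

# Only unet goes through accelerator.prepare, so only unet ends up wrapped in a
# DeepSpeed engine that gathers its partitioned parameters before forward.
unet, optimizer = accelerator.prepare(unet, optimizer)

# text_encoder and vae were partitioned by zero.Init but never prepared, so
# DeepSpeed never gathers their weights and their forward calls break.
```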
Currently, Accelerate doesn't know how to handle multiple models.
For this to work properly, all models must be run through accelerate.prepare. But Accelerate will try to assign the same optimizer with the same weights to every model during deepspeed.initialize, which of course doesn't work. It needs to assign the correct weights to the correct optimizer, so that each optimizer's param_groups are populated with the parameters of the model it belongs to. For frozen models there should be no optimizer at all (or the param_groups should be empty).
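What I'd like to be possible, as a hypothetical sketch of the desired behaviour (continuing from the snippet above; this is not how accelerate.prepare handles DeepSpeed today):

```python
# Hypothetical: unet, vae, text_encoder loaded as in the sketch above.
import torch

# Freeze the models that should not be trained.
vae.requires_grad_(False)
text_encoder.requires_grad_(False)

# The optimizer only holds unet's parameters.
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-4)

# Desired: every model is prepared; the optimizer is attached only to the model
# that owns its param_groups (unet), while vae and text_encoder get engines with
# no optimizer, so all three have their ZeRO-3-partitioned weights gathered
# automatically before each forward pass.
unet, text_encoder, vae, optimizer = accelerator.prepare(
    unet, text_encoder, vae, optimizer
)
```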
I would like to see Accelerate support training multiple models by preparing all of them and assigning the right weights and optimizer to each model. This would enable efficient multi-model training and help address issues such as the one discussed in pull request #3076.
In summary, Accelerate needs to be able to prepare all models and assign different weights and optimizers for efficient training of multiple models.
Describe alternatives you've considered
If the proposed enhancement in Accelerate to support training multiple models is implemented, it could improve the training speed and efficiency of models like ChatLLM. Further optimizing it with DeepSpeed could lead to even better results in terms of training speed and memory usage.