
Support assigning different optimizers with weights to multiple models during deepspeed.initialize in Accelerate #3098

Closed
uygnef opened this issue Apr 14, 2023 · 3 comments

@uygnef

uygnef commented Apr 14, 2023

Is your feature request related to a problem? Please describe.

Currently, training multiple models with DeepSpeed's ZeRO optimization does not work as intended. A detailed report can be found at huggingface/transformers#22705.

Describe the solution you'd like
Thanks to @stas00 for the explanation in #3076:

When DeepSpeed ZeRO-3 is used in examples/text_to_image/train_text_to_image.py, the three models (text_encoder, vae, unet) get partitioned with zero.Init, but only unet goes through accelerate.prepare, so DeepSpeed doesn't know to automatically gather the already-partitioned weights before each forward call for text_encoder and vae. That's the source of the problem.
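For illustration, here is a minimal sketch of that pattern, assuming an Accelerator configured with a DeepSpeed ZeRO-3 plugin; the nn.Linear stand-ins, the AdamW optimizer, and the learning rate are placeholders rather than the actual script's models and settings:

```python
# Minimal sketch of the problematic pattern (simplified, not the actual
# train_text_to_image.py script). Under ZeRO-3 the pretrained weights get
# partitioned via zero.Init, but only `unet` is passed through
# accelerator.prepare, so DeepSpeed never wraps text_encoder and vae and
# cannot gather their partitioned weights before their forward calls.
import torch
from torch import nn
from accelerate import Accelerator

accelerator = Accelerator()  # assumed: configured with a DeepSpeed ZeRO-3 plugin

# Placeholder modules standing in for the three pretrained models.
text_encoder = nn.Linear(8, 8)  # frozen, inference only
vae = nn.Linear(8, 8)           # frozen, inference only
unet = nn.Linear(8, 8)          # trainable

optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-4)

# Only unet (plus the optimizer, dataloader and scheduler in the real script)
# is prepared; text_encoder and vae are used directly in the training loop,
# which breaks under ZeRO-3 because their weights stay partitioned.
unet, optimizer = accelerator.prepare(unet, optimizer)
```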

Currently, Accelerate doesn't know how to handle multiple models.

For this to work properly, all models must be run through accelerate.prepare. But Accelerate currently tries to assign the same optimizer, with the same weights, to every model during deepspeed.initialize, which of course doesn't work. It needs to assign the correct weights to the correct optimizer, so that each optimizer's param_groups are populated with the weights of the model it belongs to. For frozen models there should be no optimizer at all (or param_groups should be empty).

I would like to see Accelerate support training multiple models by preparing every model and assigning each one its own weights and optimizer, as sketched below. This would enable efficient multi-model training and help address issues such as the one discussed in pull request #3076.
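A minimal sketch of the requested behaviour follows. This is hypothetical usage, not the current Accelerate API; the nn.Linear stand-ins, the AdamW optimizer, and the learning rate are placeholders:

```python
# Hypothetical sketch of the requested behaviour (not the current API):
# every model goes through accelerator.prepare, each trainable model is
# paired with an optimizer built only from its own parameters, and frozen
# models are prepared without any optimizer (empty param_groups).
import torch
from torch import nn
from accelerate import Accelerator

accelerator = Accelerator()  # assumed: configured with a DeepSpeed ZeRO-3 plugin

# Placeholder modules standing in for text_encoder / vae / unet.
text_encoder = nn.Linear(8, 8).requires_grad_(False)  # frozen
vae = nn.Linear(8, 8).requires_grad_(False)           # frozen
unet = nn.Linear(8, 8)                                 # trainable

unet_optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-4)

# Desired: each trainable model is prepared together with its own optimizer,
# so its param_groups only contain that model's parameters ...
unet, unet_optimizer = accelerator.prepare(unet, unet_optimizer)

# ... and frozen models are prepared too, so DeepSpeed can gather their
# ZeRO-3 partitions before forward, but no optimizer is attached to them.
text_encoder = accelerator.prepare(text_encoder)
vae = accelerator.prepare(vae)
```

With something along these lines, each model would get its own DeepSpeed engine with the right optimizer (or none), instead of every model being handed the same optimizer and weights.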

Describe alternatives you've considered
If the proposed enhancement is implemented and Accelerate supports training multiple models, it could improve the training speed and efficiency of setups like ChatLLM. Combining it with DeepSpeed could lead to even better results in terms of training speed and memory usage.

@sayakpaul
Member

This seems like an accelerate feature request and not a diffusers feature request.

Cc: @williamberman

@uygnef
Author

uygnef commented Apr 17, 2023

This seems like an accelerate feature request and not a diffusers feature request.

Cc: @williamberman

Yes, I submitted it to the wrong repository. Could you please help me move it to the Accelerate repository? I don't have the necessary permissions.

@sayakpaul
Member

Feel free to open one there: github.com/huggingface/accelerate. Closing it here.
