
Commit 3b641ea

feat: verification of multi-GPU support for select examples. (#3126)

* feat: verification of multi-GPU support for select examples.
* add: multi-GPU training sections to the relevant doc pages.
1 parent 703307e commit 3b641ea

File tree

9 files changed: +182 -3 lines changed


docs/source/en/training/controlnet.mdx

Lines changed: 23 additions & 0 deletions
@@ -113,6 +113,29 @@ accelerate launch train_controlnet.py \
  --gradient_accumulation_steps=4
 ```
 
+## Training with multiple GPUs
+
+`accelerate` allows for seamless multi-GPU training. Follow the instructions [here](https://huggingface.co/docs/accelerate/basic_tutorials/launch)
+for running distributed training with `accelerate`. Here is an example command:
+
+```bash
+export MODEL_DIR="runwayml/stable-diffusion-v1-5"
+export OUTPUT_DIR="path to save model"
+
+accelerate launch --mixed_precision="fp16" --multi_gpu train_controlnet.py \
+ --pretrained_model_name_or_path=$MODEL_DIR \
+ --output_dir=$OUTPUT_DIR \
+ --dataset_name=fusing/fill50k \
+ --resolution=512 \
+ --learning_rate=1e-5 \
+ --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
+ --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
+ --train_batch_size=4 \
+ --mixed_precision="fp16" \
+ --tracker_project_name="controlnet-demo" \
+ --report_to=wandb
+```
+
 ## Example results
 
 #### After 300 steps with batch size 8
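Before launching any of the multi-GPU commands in this commit, it can help to confirm how many processes `accelerate` will spawn and which device each one gets. A minimal sanity-check sketch, assuming `accelerate` is installed and the script is saved to a hypothetical `check_setup.py`:

```python
# Minimal sketch: report the process/GPU layout that `accelerate launch --multi_gpu` creates.
from accelerate import Accelerator

accelerator = Accelerator()
# One process is launched per GPU, so this prints one line per device.
print(f"process {accelerator.process_index + 1}/{accelerator.num_processes} on {accelerator.device}")
```

Running it with `accelerate launch --multi_gpu check_setup.py` should print one line per visible GPU.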

docs/source/en/training/instructpix2pix.mdx

Lines changed: 21 additions & 0 deletions
@@ -126,6 +126,27 @@ accelerate launch --mixed_precision="fp16" train_instruct_pix2pix.py \
 
 ***Note: In the original paper, the authors observed that even when the model is trained with an image resolution of 256x256, it generalizes well to bigger resolutions such as 512x512. This is likely because of the larger dataset they used during training.***
 
+## Training with multiple GPUs
+
+`accelerate` allows for seamless multi-GPU training. Follow the instructions [here](https://huggingface.co/docs/accelerate/basic_tutorials/launch)
+for running distributed training with `accelerate`. Here is an example command:
+
+```bash
+accelerate launch --mixed_precision="fp16" --multi_gpu train_instruct_pix2pix.py \
+ --pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5 \
+ --dataset_name=sayakpaul/instructpix2pix-1000-samples \
+ --use_ema \
+ --enable_xformers_memory_efficient_attention \
+ --resolution=512 --random_flip \
+ --train_batch_size=4 --gradient_accumulation_steps=4 --gradient_checkpointing \
+ --max_train_steps=15000 \
+ --checkpointing_steps=5000 --checkpoints_total_limit=1 \
+ --learning_rate=5e-05 --lr_warmup_steps=0 \
+ --conditioning_dropout_prob=0.05 \
+ --mixed_precision=fp16 \
+ --seed=42
+```
+
 ## Inference
 
 Once training is complete, we can perform inference:

docs/source/en/training/text2image.mdx

Lines changed: 25 additions & 0 deletions
@@ -106,6 +106,31 @@ accelerate launch train_text_to_image.py \
  --lr_scheduler="constant" --lr_warmup_steps=0 \
  --output_dir=${OUTPUT_DIR}
 ```
+
+#### Training with multiple GPUs
+
+`accelerate` allows for seamless multi-GPU training. Follow the instructions [here](https://huggingface.co/docs/accelerate/basic_tutorials/launch)
+for running distributed training with `accelerate`. Here is an example command:
+
+```bash
+export MODEL_NAME="CompVis/stable-diffusion-v1-4"
+export dataset_name="lambdalabs/pokemon-blip-captions"
+
+accelerate launch --mixed_precision="fp16" --multi_gpu train_text_to_image.py \
+ --pretrained_model_name_or_path=$MODEL_NAME \
+ --dataset_name=$dataset_name \
+ --use_ema \
+ --resolution=512 --center_crop --random_flip \
+ --train_batch_size=1 \
+ --gradient_accumulation_steps=4 \
+ --gradient_checkpointing \
+ --max_train_steps=15000 \
+ --learning_rate=1e-05 \
+ --max_grad_norm=1 \
+ --lr_scheduler="constant" --lr_warmup_steps=0 \
+ --output_dir="sd-pokemon-model"
+```
+
 </pt>
 <jax>
 With Flax, it's possible to train a Stable Diffusion model faster on TPUs and GPUs thanks to [@duongna211](https://github.com/duongna21). This is very efficient on TPU hardware but works great on GPUs too. The Flax training script doesn't support features like gradient checkpointing or gradient accumulation yet, so you'll need a GPU with at least 30GB of memory or a TPU v3.
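One thing worth keeping in mind for the multi-GPU `train_text_to_image.py` command above: under data-parallel training the effective batch size scales with the number of processes as well as with gradient accumulation. A small sketch of the arithmetic, where the 8-GPU count is an assumption for illustration, not something this commit specifies:

```python
# Effective batch size for `accelerate launch --multi_gpu` data parallelism.
train_batch_size = 1              # per GPU, from --train_batch_size=1
gradient_accumulation_steps = 4   # from --gradient_accumulation_steps=4
num_gpus = 8                      # assumed; accelerate launches one process per GPU

effective_batch_size = train_batch_size * num_gpus * gradient_accumulation_steps
print(effective_batch_size)  # 32 samples per optimizer step
```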

docs/source/en/training/unconditional_training.mdx

Lines changed: 20 additions & 0 deletions
@@ -122,6 +122,26 @@ accelerate launch train_unconditional.py \
 <img src="https://user-images.githubusercontent.com/26864830/180248200-928953b4-db38-48db-b0c6-8b740fe6786f.png"/>
 </div>
 
+### Training with multiple GPUs
+
+`accelerate` allows for seamless multi-GPU training. Follow the instructions [here](https://huggingface.co/docs/accelerate/basic_tutorials/launch)
+for running distributed training with `accelerate`. Here is an example command:
+
+```bash
+accelerate launch --mixed_precision="fp16" --multi_gpu train_unconditional.py \
+ --dataset_name="huggan/pokemon" \
+ --resolution=64 --center_crop --random_flip \
+ --output_dir="ddpm-ema-pokemon-64" \
+ --train_batch_size=16 \
+ --num_epochs=100 \
+ --gradient_accumulation_steps=1 \
+ --use_ema \
+ --learning_rate=1e-4 \
+ --lr_warmup_steps=500 \
+ --mixed_precision="fp16" \
+ --logger="wandb"
+```
+
 ## Finetuning with your own data
 
 There are two ways to finetune a model on your own dataset:

examples/controlnet/README.md

Lines changed: 23 additions & 0 deletions
@@ -96,6 +96,29 @@ accelerate launch train_controlnet.py \
  --gradient_accumulation_steps=4
 ```
 
+## Training with multiple GPUs
+
+`accelerate` allows for seamless multi-GPU training. Follow the instructions [here](https://huggingface.co/docs/accelerate/basic_tutorials/launch)
+for running distributed training with `accelerate`. Here is an example command:
+
+```bash
+export MODEL_DIR="runwayml/stable-diffusion-v1-5"
+export OUTPUT_DIR="path to save model"
+
+accelerate launch --mixed_precision="fp16" --multi_gpu train_controlnet.py \
+ --pretrained_model_name_or_path=$MODEL_DIR \
+ --output_dir=$OUTPUT_DIR \
+ --dataset_name=fusing/fill50k \
+ --resolution=512 \
+ --learning_rate=1e-5 \
+ --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \
+ --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \
+ --train_batch_size=4 \
+ --mixed_precision="fp16" \
+ --tracker_project_name="controlnet-demo" \
+ --report_to=wandb
+```
+
 ## Example results
 
 #### After 300 steps with batch size 8

examples/instruct_pix2pix/README.md

Lines changed: 21 additions & 0 deletions
@@ -113,6 +113,27 @@ accelerate launch --mixed_precision="fp16" train_instruct_pix2pix.py \
 
 ***Note: In the original paper, the authors observed that even when the model is trained with an image resolution of 256x256, it generalizes well to bigger resolutions such as 512x512. This is likely because of the larger dataset they used during training.***
 
+## Training with multiple GPUs
+
+`accelerate` allows for seamless multi-GPU training. Follow the instructions [here](https://huggingface.co/docs/accelerate/basic_tutorials/launch)
+for running distributed training with `accelerate`. Here is an example command:
+
+```bash
+accelerate launch --mixed_precision="fp16" --multi_gpu train_instruct_pix2pix.py \
+ --pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5 \
+ --dataset_name=sayakpaul/instructpix2pix-1000-samples \
+ --use_ema \
+ --enable_xformers_memory_efficient_attention \
+ --resolution=512 --random_flip \
+ --train_batch_size=4 --gradient_accumulation_steps=4 --gradient_checkpointing \
+ --max_train_steps=15000 \
+ --checkpointing_steps=5000 --checkpoints_total_limit=1 \
+ --learning_rate=5e-05 --lr_warmup_steps=0 \
+ --conditioning_dropout_prob=0.05 \
+ --mixed_precision=fp16 \
+ --seed=42
+```
+
 ## Inference
 
 Once training is complete, we can perform inference:

examples/text_to_image/README.md

Lines changed: 25 additions & 0 deletions
@@ -111,6 +111,31 @@ image = pipe(prompt="yoda").images[0]
 image.save("yoda-pokemon.png")
 ```
 
+#### Training with multiple GPUs
+
+`accelerate` allows for seamless multi-GPU training. Follow the instructions [here](https://huggingface.co/docs/accelerate/basic_tutorials/launch)
+for running distributed training with `accelerate`. Here is an example command:
+
+```bash
+export MODEL_NAME="CompVis/stable-diffusion-v1-4"
+export dataset_name="lambdalabs/pokemon-blip-captions"
+
+accelerate launch --mixed_precision="fp16" --multi_gpu train_text_to_image.py \
+ --pretrained_model_name_or_path=$MODEL_NAME \
+ --dataset_name=$dataset_name \
+ --use_ema \
+ --resolution=512 --center_crop --random_flip \
+ --train_batch_size=1 \
+ --gradient_accumulation_steps=4 \
+ --gradient_checkpointing \
+ --max_train_steps=15000 \
+ --learning_rate=1e-05 \
+ --max_grad_norm=1 \
+ --lr_scheduler="constant" --lr_warmup_steps=0 \
+ --output_dir="sd-pokemon-model"
+```
+
+
 #### Training with Min-SNR weighting
 
 We support training with the Min-SNR weighting strategy proposed in [Efficient Diffusion Training via Min-SNR Weighting Strategy](https://arxiv.org/abs/2303.09556) which helps to achieve faster convergence

examples/text_to_image/train_text_to_image.py

Lines changed: 2 additions & 2 deletions
@@ -64,8 +64,8 @@ def log_validation(vae, text_encoder, tokenizer, unet, args, accelerator, weight
 
     pipeline = StableDiffusionPipeline.from_pretrained(
        args.pretrained_model_name_or_path,
-       vae=vae,
-       text_encoder=text_encoder,
+       vae=accelerator.unwrap_model(vae),
+       text_encoder=accelerator.unwrap_model(text_encoder),
        tokenizer=tokenizer,
        unet=accelerator.unwrap_model(unet),
        safety_checker=None,
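The only code change in this commit swaps the raw `vae` and `text_encoder` for `accelerator.unwrap_model(...)` when building the validation `StableDiffusionPipeline`. When training runs on multiple GPUs, `accelerate` may wrap prepared modules (for example in `torch.nn.parallel.DistributedDataParallel`), while the pipeline expects plain modules; `unwrap_model` strips any such wrapper and returns the model unchanged otherwise. A minimal sketch of the pattern, using a toy module instead of the training script's models:

```python
# Sketch (assumptions: torch and accelerate installed; a toy nn.Linear stands in for the real models).
import torch
from accelerate import Accelerator

accelerator = Accelerator()
model = torch.nn.Linear(4, 4)
model = accelerator.prepare(model)       # may become a DDP wrapper under `accelerate launch --multi_gpu`

plain = accelerator.unwrap_model(model)  # always the underlying nn.Module
print(type(model).__name__, "->", type(plain).__name__)
```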

examples/unconditional_image_generation/README.md

Lines changed: 22 additions & 1 deletion
@@ -1,4 +1,4 @@
-## Training examples
+## Training an unconditional diffusion model
 
 Creating a training image set is [described in a different document](https://huggingface.co/docs/datasets/image_process#image-datasets).
 
@@ -76,6 +76,27 @@ A full training run takes 2 hours on 4xV100 GPUs.
 
 <img src="https://user-images.githubusercontent.com/26864830/180248200-928953b4-db38-48db-b0c6-8b740fe6786f.png" width="700" />
 
+### Training with multiple GPUs
+
+`accelerate` allows for seamless multi-GPU training. Follow the instructions [here](https://huggingface.co/docs/accelerate/basic_tutorials/launch)
+for running distributed training with `accelerate`. Here is an example command:
+
+```bash
+accelerate launch --mixed_precision="fp16" --multi_gpu train_unconditional.py \
+ --dataset_name="huggan/pokemon" \
+ --resolution=64 --center_crop --random_flip \
+ --output_dir="ddpm-ema-pokemon-64" \
+ --train_batch_size=16 \
+ --num_epochs=100 \
+ --gradient_accumulation_steps=1 \
+ --use_ema \
+ --learning_rate=1e-4 \
+ --lr_warmup_steps=500 \
+ --mixed_precision="fp16" \
+ --logger="wandb"
+```
+
+To be able to use Weights and Biases (`wandb`) as a logger you need to install the library: `pip install wandb`.
 
 ### Using your own data
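The `--logger="wandb"` flag deserves a short note in a multi-GPU run: several processes are launched, but `accelerate`'s tracker API reports from the main process only, so a single run shows up in Weights and Biases. A minimal sketch of that pattern (not code from this commit; the project name is illustrative, and `wandb` must be installed and logged in):

```python
# Sketch: tracker calls go through Accelerator, which logs from the main process only.
from accelerate import Accelerator

accelerator = Accelerator(log_with="wandb")
accelerator.init_trackers("ddpm-ema-pokemon-64")   # illustrative project name

accelerator.log({"loss": 0.0}, step=0)             # recorded once, by the main process
accelerator.end_training()
```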
