
Commit c118491

sayakpaul and pcuenca authored
[docs] Adds a doc on LoRA support for diffusers (#2086)
* add: a doc on LoRA support in diffusers.
* Apply suggestions from code review

  Co-authored-by: Pedro Cuenca <[email protected]>
* apply PR suggestions.
* Apply suggestions from code review

  Co-authored-by: Pedro Cuenca <[email protected]>
* remove visually incoherent elements.

Co-authored-by: Pedro Cuenca <[email protected]>
1 parent 263b968 commit c118491

File tree: 4 files changed, +161 −3 lines changed


docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions
```diff
@@ -71,6 +71,8 @@
     title: Dreambooth
   - local: training/text2image
     title: Text-to-image fine-tuning
+  - local: training/lora
+    title: LoRA Support in Diffusers
   title: Training
 - sections:
   - local: conceptual/philosophy
```

docs/source/en/training/lora.mdx

Lines changed: 155 additions & 0 deletions
@@ -0,0 +1,155 @@

<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# LoRA Support in Diffusers

Diffusers supports LoRA for faster fine-tuning of Stable Diffusion, allowing greater memory efficiency and easier portability.

Low-Rank Adaptation of Large Language Models was first introduced by Microsoft in
[LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685) by *Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen*.

In a nutshell, LoRA allows adapting pretrained models by adding pairs of rank-decomposition weight matrices (called **update matrices**)
to existing weights and **only** training those newly added weights. This has a couple of advantages:

- Previous pretrained weights are kept frozen so that the model is not as prone to [catastrophic forgetting](https://www.pnas.org/doi/10.1073/pnas.1611835114).
- Rank-decomposition matrices have significantly fewer parameters than the original model, which means that trained LoRA weights are easily portable.
- LoRA matrices are generally added to the attention layers of the original model, and they control the extent to which the model is adapted toward new training images via a `scale` parameter.

**__Note that the usage of LoRA is not limited to attention layers. In the original LoRA work, the authors found that adapting
just the attention layers of a language model is sufficient to obtain good downstream performance with great efficiency. This is why it's common
to add the LoRA weights only to the attention layers of a model.__**
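
To make the update-matrix idea concrete, here is a minimal PyTorch sketch of a LoRA-augmented linear layer. It only illustrates the technique (the class name and initialization choices are ours for this sketch) and is not the implementation used inside Diffusers:

```py
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """A frozen base linear layer plus a trainable low-rank update."""

    def __init__(self, in_features: int, out_features: int, rank: int = 4, scale: float = 1.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # pretrained weight stays frozen
        self.base.bias.requires_grad_(False)
        # Rank-decomposition ("update") matrices A and B are the only trained parameters.
        self.lora_a = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_features, rank))  # zero init: no change at the start
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W^T + scale * x (B A)^T
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)


# Example: adapt a 320-dim attention projection with rank 4.
layer = LoRALinear(320, 320, rank=4)
out = layer(torch.randn(2, 77, 320))
```

Because only `lora_a` and `lora_b` receive gradients, the number of trainable parameters grows with the rank rather than with the size of the full weight matrix.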

[cloneofsimo](https://github.com/cloneofsimo) was the first to try out LoRA training for Stable Diffusion in the popular [lora](https://github.com/cloneofsimo/lora) GitHub repository.

<Tip>

LoRA allows us to achieve greater memory efficiency since the pretrained weights are kept frozen and only the LoRA weights are trained, thereby
allowing us to run fine-tuning on consumer GPUs like Tesla T4, RTX 3080 or even RTX 2080 Ti! One can get access to GPUs like T4 in the free
tiers of Kaggle Kernels and Google Colab Notebooks.

</Tip>

## Getting started with LoRA for fine-tuning

Stable Diffusion can be fine-tuned in different ways:

* [Textual inversion](https://huggingface.co/docs/diffusers/main/en/training/text_inversion)
* [DreamBooth](https://huggingface.co/docs/diffusers/main/en/training/dreambooth)
* [Text2Image fine-tuning](https://huggingface.co/docs/diffusers/main/en/training/text2image)

We provide two end-to-end examples that show how to run fine-tuning with LoRA:

* [DreamBooth](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth#training-with-low-rank-adaptation-of-large-language-models-lora)
* [Text2Image](https://github.com/huggingface/diffusers/tree/main/examples/text_to_image#training-with-lora)

If you want to perform DreamBooth training with LoRA, for instance, you would run:

```bash
export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export INSTANCE_DIR="path-to-instance-images"
export OUTPUT_DIR="path-to-save-model"

accelerate launch train_dreambooth_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --checkpointing_steps=100 \
  --learning_rate=1e-4 \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=50 \
  --seed="0" \
  --push_to_hub
```

A similar process can be followed to fine-tune Stable Diffusion on a custom dataset with LoRA using the
`examples/text_to_image/train_text_to_image_lora.py` script.
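
For illustration, a text-to-image LoRA run on the Pokemon captions dataset could look roughly like the following. This is a hedged sketch: the flags mirror the DreamBooth command above and the linked Text2Image example, so consult that example's README for the exact, up-to-date invocation.

```bash
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATASET_NAME="lambdalabs/pokemon-blip-captions"
export OUTPUT_DIR="sd-pokemon-model-lora"

accelerate launch train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=1e-4 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=15000 \
  --output_dir=$OUTPUT_DIR
```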

Refer to the respective examples linked above to learn more.

<Tip>

When using LoRA we can use a much higher learning rate (typically 1e-4 as opposed to ~1e-6) compared to non-LoRA DreamBooth fine-tuning.

</Tip>

But there is no free lunch. For a given dataset and expected generation quality, you'd still need to experiment with
different hyperparameters. Here are some important ones:

* Training time
    * Learning rate
    * Number of training steps
* Inference time
    * Number of steps
    * Scheduler type

Additionally, you can follow [this blog](https://huggingface.co/blog/dreambooth) that documents some of our experimental
findings for performing DreamBooth training of Stable Diffusion.

When fine-tuning, the LoRA update matrices are only added to the attention layers. To enable this, we added new weight
loading functionalities. Their details are available [here](https://huggingface.co/docs/diffusers/main/en/api/loaders).

## Inference

Assuming you used the `examples/text_to_image/train_text_to_image_lora.py` script to fine-tune Stable Diffusion on the [Pokemon
dataset](https://huggingface.co/datasets/lambdalabs/pokemon-blip-captions), you can perform inference like so:

```py
from diffusers import StableDiffusionPipeline
import torch

# Repository that holds only the trained LoRA attention weights.
model_path = "sayakpaul/sd-model-finetuned-lora-t4"

# Load the base Stable Diffusion checkpoint, then attach the LoRA update matrices to the UNet.
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe.unet.load_attn_procs(model_path)
pipe.to("cuda")

prompt = "A pokemon with blue eyes."
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("pokemon.png")
```

Here are some example images you can expect:

<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pokemon-collage.png"/>

[`sayakpaul/sd-model-finetuned-lora-t4`](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4) contains the [LoRA fine-tuned update matrices](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4/blob/main/pytorch_lora_weights.bin),
which are only 3 MB in size. During inference, the pre-trained Stable Diffusion checkpoint is loaded alongside these update
matrices, and the two are combined to run inference.
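
Since the update matrices are applied on top of the frozen attention weights, their influence can also be dialed down at inference time. In recent versions of `diffusers` this is typically exposed through the `scale` entry of `cross_attention_kwargs`; treat the snippet below as a hedged sketch and check the loaders API linked above for the exact mechanism in your version:

```py
# Assumes `pipe` was set up as in the inference example above, with the LoRA
# attention processors loaded via `pipe.unet.load_attn_procs(...)`.
# A scale of 0.0 ignores the LoRA weights, while 1.0 applies them fully.
image = pipe(
    "A pokemon with blue eyes.",
    num_inference_steps=30,
    guidance_scale=7.5,
    cross_attention_kwargs={"scale": 0.5},  # blend the base model with the LoRA update
).images[0]
image.save("pokemon_half_lora.png")
```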

You can use the [`huggingface_hub`](https://github.com/huggingface/huggingface_hub) library to retrieve the base model
from [`sayakpaul/sd-model-finetuned-lora-t4`](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4) like so:

```py
from huggingface_hub.repocard import RepoCard

# The LoRA repository's model card records which base model it was trained on.
card = RepoCard.load("sayakpaul/sd-model-finetuned-lora-t4")
base_model = card.data.to_dict()["base_model"]
# 'CompVis/stable-diffusion-v1-4'
```

And then you can use `pipe = StableDiffusionPipeline.from_pretrained(base_model, torch_dtype=torch.float16)`.

This is especially useful when you don't want to hardcode the base model identifier when initializing the `StableDiffusionPipeline`.
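
Putting the two steps together, an end-to-end sketch that resolves the base model from the LoRA repository and then loads the update matrices on top of it could look like this (same repository and prompt as above):

```py
import torch
from diffusers import StableDiffusionPipeline
from huggingface_hub.repocard import RepoCard

lora_model_id = "sayakpaul/sd-model-finetuned-lora-t4"

# Look up the base checkpoint recorded in the LoRA repository's model card.
base_model = RepoCard.load(lora_model_id).data.to_dict()["base_model"]

# Load the base model, then attach the LoRA attention weights.
pipe = StableDiffusionPipeline.from_pretrained(base_model, torch_dtype=torch.float16)
pipe.unet.load_attn_procs(lora_model_id)
pipe.to("cuda")

image = pipe("A pokemon with blue eyes.", num_inference_steps=30).images[0]
image.save("pokemon.png")
```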

Inference for DreamBooth training remains the same. Check
[this section](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth#inference-1) for more details.

## Known limitations

* Currently, we only support LoRA for the attention layers of [`UNet2DConditionModel`](https://huggingface.co/docs/diffusers/main/en/api/models#diffusers.UNet2DConditionModel).

docs/source/en/training/overview.mdx

Lines changed: 1 addition & 0 deletions
```diff
@@ -37,6 +37,7 @@ Training examples show how to pretrain or fine-tune diffusion models for a varie
 - [Text-to-Image Training](./text2image)
 - [Text Inversion](./text_inversion)
 - [Dreambooth](./dreambooth)
+- [LoRA Support](./lora)
 
 If possible, please [install xFormers](../optimization/xformers) for memory efficient attention. This could help make your training faster and less memory intensive.
 
```

examples/text_to_image/README.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -162,9 +162,9 @@ accelerate --mixed_precision="fp16" launch train_text_to_image_lora.py \
 
 The above command will also run inference as fine-tuning progresses and log the results to Weights and Biases.
 
-**___Note: When using LoRA we can use a much higher learning rate compared to non-LoRA fine-tuning. Here we use *1e-4* instead of the usual *1e-5*. Also, by using LoRA, it's possible to run `train_text_to_image_lora.py` in consumer GPUs like T4 or V100.**
+**___Note: When using LoRA we can use a much higher learning rate compared to non-LoRA fine-tuning. Here we use *1e-4* instead of the usual *1e-5*. Also, by using LoRA, it's possible to run `train_text_to_image_lora.py` in consumer GPUs like T4 or V100.___**
 
-The final LoRA embedding weights have been uploaded to [sayakpaul/sd-model-finetuned-lora-t4](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4). **___Note: [The final weights](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4/blob/main/pytorch_lora_weights.bin) are only 3 MB in size, which is orders of magnitudes smaller than the original model.**
+The final LoRA embedding weights have been uploaded to [sayakpaul/sd-model-finetuned-lora-t4](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4). **___Note: [The final weights](https://huggingface.co/sayakpaul/sd-model-finetuned-lora-t4/blob/main/pytorch_lora_weights.bin) are only 3 MB in size, which is orders of magnitudes smaller than the original model.___**
 
 You can check some inference samples that were logged during the course of the fine-tuning process [here](https://wandb.ai/sayakpaul/text2image-fine-tune/runs/q4lc0xsw).
 
@@ -191,7 +191,7 @@ image.save("pokemon.png")
 
 For faster training on TPUs and GPUs you can leverage the flax training example. Follow the instructions above to get the model and dataset before running the script.
 
-____Note: The flax example don't yet support features like gradient checkpoint, gradient accumulation etc, so to use flax for faster training we will need >30GB cards.___
+**___Note: The flax example doesn't yet support features like gradient checkpoint, gradient accumulation etc, so to use flax for faster training we will need >30GB cards or TPU v3.___**
 
 
 Before running the scripts, make sure to install the library's training dependencies:
```
