Skip to content

Commit b4b0347

Browse files
sayakpaulpatrickvonplaten
authored andcommitted
[SDXL DreamBooth LoRA] add support for text encoder fine-tuning (huggingface#4097)
* Allow low precision sd xl * finish * finish * feat: initial draft for supporting text encoder lora finetuning for SDXL DreamBooth * fix: variable assignments. * add: autocast block. * add debugging * vae dtype hell * fix: vae dtype hell. * fix: vae dtype hell 3. * clean up * lora text encoder loader. * fix: unwrapping models. * add: tests. * docs. * handle unexpected keys. * fix vae dtype in the final inference. * fix scope problem. * fix: save_model_card args. * initialize: prefix to None. * fix: dtype issues. * apply gixes. * debgging. * debugging * debugging * debugging * debugging * debugging * add: fast tests. * pre-tokenize. * address: will's comments. * fix: loader and tests. * fix: dataloader. * simplify dataloader. * length. * simplification. * make style && make quality * simplify state_dict munging * fix: tests. * fix: state_dict packing. * Apply suggestions from code review Co-authored-by: Patrick von Platen <[email protected]> --------- Co-authored-by: Patrick von Platen <[email protected]>
1 parent c680249 commit b4b0347

File tree

6 files changed

+568
-173
lines changed

6 files changed

+568
-173
lines changed

examples/dreambooth/README_sdxl.md

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -164,6 +164,17 @@ Here's a side-by-side comparison of the with and without Refiner pipeline output
164164
|---|---|
165165
| ![](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/sd_xl/sks_dog.png) | ![](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/sd_xl/refined_sks_dog.png) |
166166

167+
### Training with text encoder(s)
168+
169+
Alongside the UNet, LoRA fine-tuning of the text encoders is also supported. To do so, just specify `--train_text_encoder` while launching training. Please keep the following points in mind:
170+
171+
* SDXL has two text encoders. So, we fine-tune both using LoRA.
172+
* When not fine-tuning the text encoders, we ALWAYS precompute the text embeddings to save memory.
173+
174+
### Specifying a better VAE
175+
176+
SDXL's VAE is known to suffer from numerical instability issues. This is why we also expose a CLI argument namely `--pretrained_vae_model_name_or_path` that lets you specify the location of a better VAE (such as [this one](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix)).
177+
167178
## Notes
168179

169180
In our experiments we found that SDXL yields very good initial results using the default settings of the script. We didn't explore further hyper-parameter tuning experiments, but we do encourage the community to explore this avenue further and share their results with us 🤗

0 commit comments

Comments
 (0)