Description
Describe the bug
Discussed in #3437 (comment). It appears that memory usage increases significantly when using LoRA, even in environments where xFormers is enabled. I investigated the cause using this script. The results suggest that once LoRA weights are loaded, memory usage is effectively the same as if xFormers had been deactivated, even in environments where xFormers is enabled.
As a solution, it seems good to use LoRAXFormersAttnProcessor instead of LoRAAttnProcessor in this part of the loader when xFormers is enabled:
diffusers/src/diffusers/loaders.py, lines 275 to 286 at a94977b
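For illustration, here is a minimal sketch of the kind of selection logic I have in mind. The helper function and variable names are my own assumptions and not the actual code in loaders.py:

```python
from diffusers.models.attention_processor import (
    LoRAAttnProcessor,
    LoRAXFormersAttnProcessor,
    XFormersAttnProcessor,
)


def pick_lora_processor(current_processor, hidden_size, cross_attention_dim, rank):
    # Hypothetical helper: if the attention module is currently running an
    # xFormers processor, keep memory-efficient attention for the LoRA
    # variant as well instead of silently falling back to the default.
    if isinstance(current_processor, XFormersAttnProcessor):
        return LoRAXFormersAttnProcessor(
            hidden_size=hidden_size,
            cross_attention_dim=cross_attention_dim,
            rank=rank,
        )
    return LoRAAttnProcessor(
        hidden_size=hidden_size,
        cross_attention_dim=cross_attention_dim,
        rank=rank,
    )
```

Something along these lines would keep memory-efficient attention active for modules that were already using an xFormers processor before the LoRA weights were loaded.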
What do you think?
Reproduction
https://gist.github.com/takuma104/e2139bda7f74cd977350e18500156683
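The gist measures peak GPU memory for each combination of resolution, batch size, xFormers, and LoRA (see the logs below). A rough, hedged sketch of that kind of loop follows; the model id, LoRA path, prompt, and step count here are placeholders, not what the gist actually uses:

```python
import json

import torch
from diffusers import StableDiffusionPipeline

MODEL_ID = "runwayml/stable-diffusion-v1-5"  # placeholder
LORA_PATH = "path/to/lora"                   # placeholder


def measure(width, height, batch, use_xformers, use_lora):
    pipe = StableDiffusionPipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16
    ).to("cuda")
    if use_xformers:
        pipe.enable_xformers_memory_efficient_attention()
    if use_lora:
        pipe.unet.load_attn_procs(LORA_PATH)

    # Track the peak allocation for this configuration only.
    torch.cuda.reset_peak_memory_stats()
    pipe(
        "a photo of an astronaut",
        width=width,
        height=height,
        num_images_per_prompt=batch,
        num_inference_steps=2,
    )
    mem_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    print(json.dumps({
        "width": width,
        "height": height,
        "batch": batch,
        "xformers": "ON" if use_xformers else "OFF",
        "lora": "ON" if use_lora else "OFF",
        "mem_MB": int(mem_mb),
    }))
```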
Logs
{"width": 512, "height": 512, "batch": 1, "xformers": "OFF", "lora": "OFF", "mem_MB": 3837}
{"width": 512, "height": 512, "batch": 1, "xformers": "OFF", "lora": "ON", "mem_MB": 3837}
{"width": 512, "height": 768, "batch": 1, "xformers": "OFF", "lora": "OFF", "mem_MB": 5878}
{"width": 512, "height": 768, "batch": 1, "xformers": "OFF", "lora": "ON", "mem_MB": 5880}
{"width": 512, "height": 512, "batch": 2, "xformers": "OFF", "lora": "OFF", "mem_MB": 5505}
{"width": 512, "height": 512, "batch": 2, "xformers": "OFF", "lora": "ON", "mem_MB": 5507}
{"width": 512, "height": 768, "batch": 2, "xformers": "OFF", "lora": "OFF", "mem_MB": 9589}
{"width": 512, "height": 768, "batch": 2, "xformers": "OFF", "lora": "ON", "mem_MB": 9591}
{"width": 512, "height": 512, "batch": 4, "xformers": "OFF", "lora": "OFF", "mem_MB": 8842}
{"width": 512, "height": 512, "batch": 4, "xformers": "OFF", "lora": "ON", "mem_MB": 8844}
{"width": 512, "height": 768, "batch": 4, "xformers": "OFF", "lora": "OFF", "mem_MB": 17011}
{"width": 512, "height": 768, "batch": 4, "xformers": "OFF", "lora": "ON", "mem_MB": 17013}
{"width": 512, "height": 512, "batch": 1, "xformers": "ON", "lora": "OFF", "mem_MB": 2806}
{"width": 512, "height": 512, "batch": 1, "xformers": "ON", "lora": "ON", "mem_MB": 3837}
{"width": 512, "height": 768, "batch": 1, "xformers": "ON", "lora": "OFF", "mem_MB": 3125}
{"width": 512, "height": 768, "batch": 1, "xformers": "ON", "lora": "ON", "mem_MB": 5880}
{"width": 512, "height": 512, "batch": 2, "xformers": "ON", "lora": "OFF", "mem_MB": 3243}
{"width": 512, "height": 512, "batch": 2, "xformers": "ON", "lora": "ON", "mem_MB": 5507}
{"width": 512, "height": 768, "batch": 2, "xformers": "ON", "lora": "OFF", "mem_MB": 3780}
{"width": 512, "height": 768, "batch": 2, "xformers": "ON", "lora": "ON", "mem_MB": 9591}
{"width": 512, "height": 512, "batch": 4, "xformers": "ON", "lora": "OFF", "mem_MB": 4317}
{"width": 512, "height": 512, "batch": 4, "xformers": "ON", "lora": "ON", "mem_MB": 8844}
{"width": 512, "height": 768, "batch": 4, "xformers": "ON", "lora": "OFF", "mem_MB": 5392}
{"width": 512, "height": 768, "batch": 4, "xformers": "ON", "lora": "ON", "mem_MB": 17013}
System Info
- diffusers version: 0.16.1
- Platform: Linux-5.19.0-41-generic-x86_64-with-glibc2.35
- Python version: 3.10.11
- PyTorch version (GPU RTX3090): 2.0.1+cu117 (True)
- Huggingface_hub version: 0.14.1
- Transformers version: 4.29.1
- Accelerate version: 0.19.0
- xFormers version: 0.0.20
- Using GPU in script?: True
- Using distributed or parallel set-up in script?: Nope