[LoRA] Discussions on ensuring robust LoRA support in Diffusers #3620
Comments
Thanks a lot for the great summary! I agree with all the points except for 7., where I'd like to wait a bit since I don't (yet) see the importance of having multiple LoRAs loaded into the model at once. Let me open a quick draft PR for 6. and link it to #3480 |
Same. |
I have been experimenting with SD 2.1 fine-tuning, and my results show that tuning the text encoder is pretty important but also dangerous: it is easily pushed over a numeric cliff into "catastrophic forgetting" territory. I have mostly started freezing all but the last 4 to 7 layers of OpenCLIP during fine-tuning, and I've supplied about 30,000 image-caption pairs of high-quality images and captions, done by hand by human volunteers. The learning rate matters far more for the text encoder than for the unet, and with the current Diffusers training code we'd need to wrap the optimizer in a class that emulates a single target while using two learning rate schedulers internally, so that the text encoder can be trained more slowly than the unet. My current workarounds are: freezing the TE; stopping its training about 25% of the way through the final run (25k steps out of 100k); and providing a large amount of "balance" images (the highest-quality human photos in the dataset) fed at a 20% rate compared to the real training data. This ensured the least forgetting at the end, with the highest-quality results. Additionally, I've added the patched betas scheduler into my training scripts @ bghira/SimpleTuner, which I derived from these examples. This enabled "enforced terminal SNR", which drastically improved the perceived quality of the outputs. |
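A minimal sketch of the setup described above, assuming SD 2.1-style components loaded via Diffusers; the model id, the "last 4 layers" choice, and the learning rates are illustrative, not the commenter's actual configuration:

```python
import torch
from diffusers import UNet2DConditionModel
from transformers import CLIPTextModel

# Illustrative components; in a real training script these come from the pipeline being fine-tuned.
unet = UNet2DConditionModel.from_pretrained("stabilityai/stable-diffusion-2-1", subfolder="unet")
text_encoder = CLIPTextModel.from_pretrained("stabilityai/stable-diffusion-2-1", subfolder="text_encoder")

# Freeze the whole text encoder, then unfreeze only its last 4 transformer layers.
text_encoder.requires_grad_(False)
for layer in text_encoder.text_model.encoder.layers[-4:]:
    layer.requires_grad_(True)

# One optimizer with two parameter groups so the text encoder trains more slowly than the UNet;
# a single LR scheduler then scales both group learning rates proportionally.
optimizer = torch.optim.AdamW(
    [
        {"params": unet.parameters(), "lr": 1e-4},
        {"params": [p for p in text_encoder.parameters() if p.requires_grad], "lr": 1e-5},
    ]
)
```

Two fully independent schedules, as the comment suggests, would still need a wrapper or two optimizers, but per-group learning rates already get most of the way there.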
Thanks for sharing these insights! Feel free to also share a link to your repo and some visual results. |
We are training custom LoRAs for different characters in a story. So while many LoRA users will just be applying a global style, this is super important for us. |
Thanks for the answer @jelling, so I guess the important part is to be able to quickly set/unset different LoRAs, no? (See diffusers/src/diffusers/loaders.py, line 781 at cd9d091.)
Would the following solution be ok for your case @jelling?

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import _get_model_file

lora_repo_ids = []  # list all LoRA repo ids here

pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)

def load_state_dict(repo_id):
    model_file = _get_model_file(repo_id, "pytorch_lora_weights.bin")
    state_dict = torch.load(model_file, map_location="cpu")
    return state_dict

# make sure all LoRA weights are in RAM
lora_state_dicts = {k: load_state_dict(k) for k in lora_repo_ids}

pipe.load_lora_weights(lora_state_dicts["<first-character>"])
pipe(...)
pipe.load_lora_weights(lora_state_dicts["<second-character>"])
```

This way should be pretty efficient. Would this work for you? |
I want to advocate for step 7. Many people are mixing LoRAs in A1111 with very interesting results. It would be nice to have many loaded at one time and pass in weights during inference. You could set the weights to zero for LoRAs that you did not want activated. |
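A rough illustration of the weighted mixing being asked for, sketched with plain tensors rather than any existing Diffusers API (all names here are hypothetical):

```python
import torch

def combine_lora_deltas(base_weight, loras, weights):
    """Add several low-rank LoRA updates to one base weight.

    loras:   list of (down, up) factor pairs, one per LoRA.
    weights: per-LoRA scalars; a weight of 0.0 effectively deactivates that LoRA.
    """
    merged = base_weight.clone()
    for (down, up), w in zip(loras, weights):
        merged += w * (up @ down)
    return merged

# Toy example: one 320x320 projection, two rank-4 LoRAs mixed 0.7 / 0.3.
base = torch.randn(320, 320)
character_a = (torch.randn(4, 320), torch.randn(320, 4))
character_b = (torch.randn(4, 320), torch.randn(320, 4))
mixed = combine_lora_deltas(base, [character_a, character_b], weights=[0.7, 0.3])
```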
@patrickvonplaten this is what I'm trying to accomplish but with diffusers. In your example above, it looks like you are showing how to quickly switch between LoRAs. This is a good feature - and one I was curious about - but we need to run multiple LoRAs at the same time on the same inference. I.e. two character LoRAs would be used to generate a single image. |
This is something we're actively watching. Upon sufficient request, we'll start brainstorming about it or might even rely on |
Congrats for supporting A1111 LoRA format 👏 |
@sayakpaul could you tell me anything about what's entailed in adding support? It's important enough for us that we might try adding support ourselves, if it comes to that. I haven't gone through the multi-LoRA section of automatic1111 yet, but is there something about how they do it that's incompatible with the general diffusers way of doing this? |
I think the main bottleneck is around the design, i.e., IIUC, they merge the LoRA weights into the UNet. This is not how we do it in Diffusers. We make use of specific attention processor classes so that we can unload a LoRA and carry on. With the weight-merging design, it's relatively simpler, but with our attention processor design, we need to be careful. |
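To make that distinction concrete, here is a small framework-agnostic sketch (hypothetical tensors, not Diffusers internals): merging bakes the LoRA delta into the layer weight, so unloading means subtracting it back or restoring a saved copy, whereas keeping the delta in a separate module, as the attention processors do, lets you simply stop applying it:

```python
import torch

dim, rank, scale = 320, 4, 1.0
base = torch.randn(dim, dim)                      # original layer weight
down, up = torch.randn(rank, dim), torch.randn(dim, rank)

# A1111-style: merge the delta into the weight itself.
merged = base + scale * (up @ down)
restored = merged - scale * (up @ down)           # unloading requires undoing the merge

# Processor-style: keep the delta separate and add it only while the LoRA is active.
def project(x, use_lora=True):
    weight = base + scale * (up @ down) if use_lora else base
    return x @ weight.T
```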
Here is some pseudo code representing the way I wish LoRAs would work. Just an idea. I thought code would be better for explaining my thoughts.
In my imagination they are also very fast, slowing down inference by at most 10%. |
This should be possible with |
Closing this since we introspected quite a bit. Thread on supporting multiple LoRAs will be a different one. |
What issue should we watch? Is there a different thread currently regarding multiple LoRA support? |
Not yet. Will begin soon. |
For the last few months, we have been collaborating with our contributors to ensure we support LoRA effectively and efficiently from Diffusers:
1. Training support
✅ DreamBooth (letting users perform LoRA fine-tuning of both UNet and text-encoder). There were some issues in the text encoder part which are now being fixed in #3437. Thanks to @takuma104.
✅ Vanilla text-to-image fine-tuning. We support only the fine-tuning of the UNet with LoRA purposefully, since here we'd assume that the number of image-caption pairs is higher than what is typically used for DreamBooth and therefore, text encoder fine-tuning is probably overkill.
2. Interoperability
With #3437, we're introducing limited support for loading A1111 CivitAI checkpoints with `pipeline.load_lora_weights()`. This has been a widely requested feature (see #3064 as an example). We do provide a `convert_lora_safetensor_to_diffusers.py` script as well that allows for converting A1111 LoRA checkpoints (potentially non-exhaustive) and merging them into the text encoder and the UNet of a `DiffusionPipeline`. However, this doesn't allow switching the attention processor back to the default one, unlike how it's currently done in Diffusers. Check out https://huggingface.co/docs/diffusers/main/en/training/lora for more details. For inference-only and definitive workflows (where one doesn't need to switch attention processors), it caters to many use cases.
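For instance, loading an A1111-format checkpoint from a local file could look roughly like this (the file name below is a placeholder):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Hypothetical A1111/CivitAI-style LoRA file sitting in the current directory.
pipe.load_lora_weights(".", weight_name="my_a1111_lora.safetensors")
image = pipe("a character in the LoRA's style").images[0]
```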
3. xformers support for efficient inference
Once LoRA parameters are loaded into a pipeline, xformers should work seamlessly. There was apparently a problem with that, and it's fixed in #3556.
4. PT 2.0 SDPA optimization
See: #3594
5. `torch.compile()` compatibility with LoRA
Once 4. is settled, we should be able to take advantage of `torch.compile()`.
6. Introduction of `scale` for controlling the contribution from the text encoder LoRA
See #3480. We already support passing `scale` as a part of `cross_attention_kwargs` for the UNet LoRA (see the short combined usage sketch after this list, which also covers items 3 and 5).
7. Supporting multiple LoRAs
@takuma104 proposed a hook-based design here: #3064 (comment)
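Pulling the inference-side items (3, 5, and 6) together, here is a short sketch; the LoRA repo id and the scale value are only examples, and in practice one would typically pick either xformers or the PT 2.0 SDPA path from item 4:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("sayakpaul/sd-model-finetuned-lora-t4")  # example LoRA repo

# 3. xformers memory-efficient attention
pipe.enable_xformers_memory_efficient_attention()

# 5. compile the UNet with PyTorch 2.0
pipe.unet = torch.compile(pipe.unet)

# 6. scale the LoRA contribution at inference time via cross_attention_kwargs
image = pipe(
    "a pokemon with blue eyes",
    cross_attention_kwargs={"scale": 0.5},
).images[0]
```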
I hope this helps to provide a consolidated view of where we're at regarding supporting LoRA from Diffusers.
Cc: @pcuenca @patrickvonplaten