Support FLUX nf4 & pf8 for GPUs with 6GB/8GB VRAM (method and checkpoints by lllyasviel) #9149
Comments
I think if NF4 is much better than FP8, maybe we can make it usable for all models (including SDXL).
@chuck-ma NF4 is better in some cases and worse in others compared to FP8 (it's hard to tell when in advance). NF4 is essentially FP4 (4 bits per weight) with some additional data/changes that calibrate it closer to the original model than plain FP4. Here, that's done by mixing precision: higher-precision bf16 is kept where it matters, and lower-precision 4-bit is used where it matters a lot less. FP8 is essentially casting down to 8 bits per weight without any sort of calibration, so it's faster to produce but larger, and it behaves differently from NF4. In addition, there's no real reason other models shouldn't support this, since it doesn't rely on any FLUX-specific quirk (FLUX just motivated the development of quantization techniques like this for image models).
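To make the distinction concrete, here is a minimal sketch (not from the thread) comparing a plain FP8 cast with bitsandbytes NF4 quantization on a single weight tensor. It assumes torch >= 2.1 (for `float8_e4m3fn`) and bitsandbytes with a CUDA device available; the tensor shape is arbitrary.

```python
# Sketch: plain FP8 downcast vs. bitsandbytes NF4 quantization of one tensor.
# Assumes torch >= 2.1 and bitsandbytes installed, with CUDA available.
import torch
import bitsandbytes.functional as bnbF

w = torch.randn(4096, 4096, dtype=torch.bfloat16, device="cuda")

# FP8: a straight cast to 8 bits per weight, no calibration data kept.
w_fp8 = w.to(torch.float8_e4m3fn)
fp8_err = (w - w_fp8.to(torch.bfloat16)).abs().mean()

# NF4: 4 bits per weight plus per-block quantization state used at
# dequantization time to reconstruct values closer to the original.
w_nf4, quant_state = bnbF.quantize_nf4(w)
w_nf4_deq = bnbF.dequantize_nf4(w_nf4, quant_state)
nf4_err = (w - w_nf4_deq).abs().mean()

print(f"mean abs error  fp8: {fp8_err.item():.5f}  nf4: {nf4_err.item():.5f}")
```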
It seems I can't get it to work with diffusers. Do you have any very simple example code to make it work? Many thanks.
@Swarzox This is an issue, not a PR. The code to make this work in diffusers has not been contributed or created yet.
Both NF4 and llm.int8() can be done ad-hoc with some code changes. Serialization and direct loading support will be done through the plan proposed in #9174.

Directly loading the said checkpoint can lead to some problematic results for the reasons explained in #9165 (comment). If you want to obtain the text encoders and VAE from that checkpoint, you can use the snippet from #9165 (comment) and then use something like #9177 so that computations run in a higher-precision data type while the params are kept in a lower-precision data type such as FP8.

You can also do a direct llm.int8() or NF4-style loading of the bulky T5-XXL and use it within a diffusers pipeline. See: https://gist.github.com/sayakpaul/82acb5976509851f2db1a83456e504f1

There are many options to run things in a memory-efficient way. So, with a programmatic approach, we let you choose what's best for you :)
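A minimal sketch of that last option, roughly in the spirit of the linked gist: load only the bulky T5-XXL text encoder in NF4 via bitsandbytes and pass it into a diffusers FluxPipeline. The model id, prompt, and generation settings here are illustrative assumptions, not values from the thread.

```python
# Sketch: NF4-quantize FLUX's T5-XXL text encoder and use it in FluxPipeline.
# Assumes diffusers, transformers, bitsandbytes, and accelerate are installed.
import torch
from transformers import T5EncoderModel, BitsAndBytesConfig
from diffusers import FluxPipeline

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Quantize only the text encoder; the rest of the pipeline stays in bf16.
text_encoder_2 = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # illustrative model id
    subfolder="text_encoder_2",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder_2=text_encoder_2,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # keep idle components on CPU to reduce VRAM

image = pipe(
    "a photo of a cat wearing a tiny wizard hat",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("flux_nf4_t5.png")
```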
How to use this with diffusers? Any code?
I can't be sure of what you are doing, but I get 9 GiB with CPU offload and ~18 GiB without. This may also get even better when #9174 is complete/has a working demo.
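For anyone who wants to check the numbers on their own setup, here is a minimal sketch for measuring peak VRAM, assuming a `pipe` like the FluxPipeline constructed in the earlier snippet; the prompt and step count are placeholders.

```python
# Sketch: measure peak VRAM of an already-constructed diffusers pipeline.
# Assumes `pipe` is a FluxPipeline (or similar) loaded as in the snippet above.
import torch

torch.cuda.reset_peak_memory_stats()

# Compare the two modes by toggling which line you use:
pipe.enable_model_cpu_offload()   # offloaded run
# pipe.to("cuda")                 # everything resident on the GPU

_ = pipe("a quick test prompt", num_inference_steps=4).images[0]

peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"peak VRAM: {peak_gib:.1f} GiB")
```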
I am very much looking forward to it, hoping for an extremely simple implementation with just a few lines of code.
Hey @Ednaordinary, I saw your comment on the other thread about the error you were encountering and wanted to know how/if you resolved it. I am talking about: 'Error in FluxImageGenerator initialization: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.' I am stuck on the same problem right now. Thank you!
NF4 support was removed from ComfyUI in its latest update.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
The flexibility of a programmatic approach is exactly my point. To illustrate, consider quantization strategies like llm.int8() or NF4, or the ones provided elsewhere. https://x.com/RisingSayak/status/1836679359521820704 gives a visual of how this might look.
lllyasviel/stable-diffusion-webui-forge#981