[enhancement]: Graphcore IPU Support #2120
Comments
(Note that I would make a crude attempt to hack this in myself and test it, but it looks like the diffusers implementation isn't done yet.)
Their pipeline code is here: https://github.com/gradient-ai/Graphcore-HuggingFace/blob/main/stable-diffusion/ipu_models.py

Seems totally plausible; I just don't know how to do it cleanly yet. So much of this stuff is very subclass-happy, including the work-in-progress pipeline in #1583, and that's not the best for composition. They also monkey with the cross-attention code, which might conflict with some of Invoke's own desire to monkey with cross-attention code.

If you want to work on this, I think a good next step would be to take a look at this new diffusers API for cross-attention: https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/cross_attention.py

See if you can use that to replace Graphcore's current cross-attention override.
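For anyone picking this up, here is a minimal sketch of what that could look like, assuming the processor-style API in the cross_attention module linked above. The class name and the matmul-based math are illustrative, not Graphcore's or Invoke's actual code:

```python
# Sketch only: assumes the processor-style cross-attention API from
# diffusers.models.cross_attention; exact names vary between diffusers versions.
import torch


class SimpleMatmulAttnProcessor:
    """Hypothetical processor that computes attention with plain matmul + softmax
    (no torch.baddbmm), similar in spirit to Graphcore's simpler override."""

    def __call__(self, attn, hidden_states, encoder_hidden_states=None, attention_mask=None):
        # Mask handling omitted for brevity.
        context = encoder_hidden_states if encoder_hidden_states is not None else hidden_states

        query = attn.head_to_batch_dim(attn.to_q(hidden_states))
        key = attn.head_to_batch_dim(attn.to_k(context))
        value = attn.head_to_batch_dim(attn.to_v(context))

        # Plain batched matmul instead of baddbmm (see the baddbmm issue below).
        scores = torch.matmul(query, key.transpose(-1, -2)) * attn.scale
        probs = scores.softmax(dim=-1)

        out = attn.batch_to_head_dim(torch.matmul(probs, value))

        # Output projection and dropout, mirroring the default processor.
        out = attn.to_out[0](out)
        out = attn.to_out[1](out)
        return out


# Installing it on a pipeline's UNet (illustrative):
# pipe.unet.set_attn_processor(SimpleMatmulAttnProcessor())
```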
Hmmm, looks like the function descriptions are accurate: https://github.com/gradient-ai/Graphcore-HuggingFace/blob/main/stable-diffusion/ipu_models.py#L36
I can't find it at the moment, but I read in some Graphcore blog post that torch.baddbmm is not supported on the IPU. Replacing the attention or sliced-attention override with the diffusers versions always executes it, which trips things up:

Cross Attention Error
Not sure if there is an easy way to work around that.
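To make the substitution concrete, the attention scores that the diffusers path computes with torch.baddbmm can be reproduced with an ordinary batched matmul, which is the kind of swap being discussed here (shapes are just illustrative):

```python
import torch

# Illustrative shapes: (batch*heads, tokens, head_dim)
q = torch.randn(2, 8, 64)
k = torch.randn(2, 8, 64)
scale = 64 ** -0.5

# Roughly what the diffusers attention path does
# (beta=0 means the first argument is ignored):
scores_baddbmm = torch.baddbmm(
    torch.zeros(2, 8, 8), q, k.transpose(-1, -2), beta=0, alpha=scale
)

# An equivalent formulation without baddbmm:
scores_matmul = torch.matmul(q, k.transpose(-1, -2)) * scale

assert torch.allclose(scores_baddbmm, scores_matmul, atol=1e-6)
```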
I worked with the mentioned code a few weeks ago (on optimum, but it is the same attention overwrite). The code targets the older attention implementation (before diffusers refactored much of the attention code). It should be possible to get it to work without too much hassle, since it's just replacing attention with simpler/less efficient code.

There is another major problem that requires solving, though. You will need to compile the model, and cache the compiled model somewhere, for every possible resolution. The first time the model is run, it compiles itself; this takes around 15 minutes on the free Pod16 machine offered a while ago. After it's compiled, feeding it a different resolution triggers an error, and the only way to avoid that error is to delete the model and recompile. The recompile can be skipped if you load an already-compiled model. So unless you want the user to wait 15 minutes for an image each time they pick a different image resolution, caching all possible resolutions is more or less the only option.
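A minimal sketch of that caching idea, assuming a hypothetical `load_and_compile_pipe` helper that stands in for whatever actually builds and compiles an IPU pipeline for one fixed shape:

```python
from functools import lru_cache


def load_and_compile_pipe(model_id: str, width: int, height: int, batch_size: int):
    # Placeholder: in a real integration this would load the Graphcore pipeline
    # from ipu_models.py and run one warm-up call so it compiles for this exact
    # shape (the ~15 minute step described above).
    raise NotImplementedError


@lru_cache(maxsize=None)
def get_pipe(model_id: str, width: int, height: int, batch_size: int):
    # Compile at most once per (model, width, height, batch) combination and
    # reuse the result afterwards; switching back to a cached resolution then
    # never triggers a recompile.
    return load_and_compile_pipe(model_id, width, height, batch_size)


def warm_cache(model_id: str, shapes, batch_size: int = 1):
    # Pre-compile a fixed menu of resolutions at startup so users never hit
    # the cold-compile path mid-session.
    for width, height in shapes:
        get_pipe(model_id, width, height, batch_size)
```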
Yes, I noticed this on the Paperspace Pod4 demo more recently. I also noticed that it only seems to compile on a single thread.

The free Pod4 instance is a 56-thread VM with gobs of RAM, so in some kind of theoretical Paperspace notebook, some resolution/model/batch combinations could be selected via the UI at the start, then compiled in parallel at startup and stored. But if this is too difficult, I am content with a single resolution/model combination for each existing pipe.

On the topic of an InvokeAI Paperspace notebook: instead of recreating the UI, I think a workaround like this would allow the user to access the notebook directly: https://nni.readthedocs.io/en/stable/sharings/nni_colab_support.html
There has been no activity in this issue for 14 days. If this issue is still being experienced, please reply with an updated confirmation that the issue is still being experienced with the latest release.
Is there an existing issue for this?
Contact Details
InvokeAI Discord user of the same name.
What should this feature add?
Hello, since y'all are migrating to diffusers anyway (#1583), would you consider adding Graphcore IPU support as seen in the repo here?
https://www.graphcore.ai/posts/how-to-run-stable-diffusion-inference-on-ipus-with-paperspace
See ipu_models.py in the text-to-image demo; it looks like a fairly simple extension of StableDiffusionPipeline.
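Roughly, the pattern appears to be a subclass that keeps the regular diffusers interface and swaps the heavy submodules for IPU-compiled wrappers. The sketch below is an assumption about that general shape, not a copy of the demo code, and the poptorch usage is indicated only in comments:

```python
# Rough illustration only: an assumed outline of the pattern in ipu_models.py.
from diffusers import StableDiffusionPipeline


class IPUStableDiffusionPipeline(StableDiffusionPipeline):
    """Keeps the regular diffusers pipeline interface, but swaps heavy
    submodules (e.g. the UNet) for IPU-compiled wrappers."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Hypothetical: wrap the UNet so its forward passes run on the IPU;
        # the real demo uses Graphcore's poptorch for this.
        # import poptorch
        # self.unet = poptorch.inferenceModel(self.unet.eval(), options=...)
```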
I mention this because Paperspace is offering some particularly beefy free IPU instances now. For comparison, note that Nvidia claims the RTX 4090 does 165 FP16 tensor TFLOPS:
