[Alpha-VLLM Team] Feat: added fused qkv and chunked ffn #8815
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
So I think you will actually need to wrap your feedforward layers with the chunking helper in src/diffusers/models/attention.py (line 182 at 673eb60). (Sorry, I think it is our fault; the example PR we gave you did not do this either.)
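For reference, the wrapping pattern looks roughly like this. This is a minimal standalone sketch, not the actual helper in src/diffusers/models/attention.py (which is the source of truth); the function name here is illustrative:

```python
import torch
import torch.nn as nn


def chunked_feed_forward(
    ff: nn.Module, hidden_states: torch.Tensor, chunk_dim: int, chunk_size: int
) -> torch.Tensor:
    # Split the sequence into chunks, run the feed-forward on each chunk,
    # and re-concatenate. Peak activation memory then scales with the chunk
    # size rather than the full sequence length.
    if hidden_states.shape[chunk_dim] % chunk_size != 0:
        raise ValueError(
            f"Sequence length {hidden_states.shape[chunk_dim]} must be "
            f"divisible by chunk size {chunk_size}."
        )
    num_chunks = hidden_states.shape[chunk_dim] // chunk_size
    return torch.cat(
        [ff(chunk) for chunk in hidden_states.chunk(num_chunks, dim=chunk_dim)],
        dim=chunk_dim,
    )
```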
I see very little memory saving with this method, though. cc @sayakpaul
Yeah, that was expected. It is probably because of the parameter-FLOP patterns. For other models like SDXL, Stable Video, etc., where transformer blocks are used alongside conv blocks, we see quite nice memory savings with chunked feedforward. What is even more interesting is that during my initial studies with the 800M SD3 variant, the savings were evident, but they diminished with the 2B variant. @PommesPeter have you played around with the chunk size? For the QKV fusion, let's maybe wait for #8829 to get merged. Apologies for the fault on our part.
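A quick way to study the chunk size is to sweep it and record peak memory. The following is a sketch, not a confirmed recipe: it assumes the model exposes its blocks as `model.transformer_blocks` and that each block implements `set_chunk_feed_forward` the way diffusers' `BasicTransformerBlock` does; `model` and `sample_kwargs` are placeholders:

```python
import torch

# Hypothetical sweep: `model` and `sample_kwargs` are placeholders, and the
# `transformer_blocks` / `set_chunk_feed_forward` attributes are assumptions
# about the model structure.
for chunk_size in (None, 32, 64, 128):  # None disables chunking
    for block in model.transformer_blocks:
        block.set_chunk_feed_forward(chunk_size, dim=0)
    torch.cuda.reset_peak_memory_stats()
    with torch.no_grad():
        _ = model(**sample_kwargs)
    peak_gib = torch.cuda.max_memory_allocated() / 2**30
    print(f"chunk_size={chunk_size}: peak memory {peak_gib:.2f} GiB")
```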
I haven't tried chunked ffn before; I should play with it using different parameters. For the fused qkv:
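For context, the QKV fusion under discussion amounts to concatenating the three projection matrices so attention does one larger matmul instead of three smaller ones. A minimal sketch follows; the helper name `fuse_qkv` is illustrative and not the diffusers API (diffusers exposes this via `fuse_qkv_projections()` on supported models):

```python
import torch
import torch.nn as nn


def fuse_qkv(to_q: nn.Linear, to_k: nn.Linear, to_v: nn.Linear) -> nn.Linear:
    # Concatenate the three projection weights along the output dimension so
    # Q, K, and V are produced by a single linear layer.
    in_features = to_q.in_features
    out_features = to_q.out_features + to_k.out_features + to_v.out_features
    has_bias = to_q.bias is not None
    fused = nn.Linear(in_features, out_features, bias=has_bias)
    fused.weight.data = torch.cat(
        [to_q.weight.data, to_k.weight.data, to_v.weight.data], dim=0
    )
    if has_bias:
        fused.bias.data = torch.cat(
            [to_q.bias.data, to_k.bias.data, to_v.bias.data], dim=0
        )
    return fused


# At inference, split the fused output back into Q, K, V
# (this assumes the three projections have equal output dims):
# q, k, v = fused(hidden_states).chunk(3, dim=-1)
```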
@PommesPeter hey there :) #8829 is now merged. Would you like to revive this PR?
Okay, I will revive this PR. |
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Closing this since these techniques are not particularly effective here.
What does this PR do?
Fixes #8652
Before submitting
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.