
[Alpha-VLLM Team] Feat: added fused qkv and chunked ffn #8815


Closed
PommesPeter wants to merge 3 commits

Conversation

@PommesPeter (Contributor) commented Jul 9, 2024

What does this PR do?

Fixes #8652

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@a-r-r-o-w requested review from DN6 and yiyixuxu (July 9, 2024 10:13)
@yiyixuxu requested a review from sayakpaul (July 10, 2024 23:57)
@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yiyixuxu (Collaborator) commented:

So I think you will actually need to wrap your feed-forward layers inside _chunked_feed_forward for it to work:

if self._chunk_size is not None:
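
A self-contained sketch of that pattern, condensed from diffusers' BasicTransformerBlock (the attribute names self.feed_forward and self._chunk_dim below are assumptions; the actual Lumina block may differ):

```python
import torch
from diffusers.models.attention import FeedForward, _chunked_feed_forward


class LuminaBlockSketch(torch.nn.Module):
    """Illustrative only; not the actual Lumina transformer block."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.feed_forward = FeedForward(dim)
        # Normally set by the model's enable_forward_chunking() machinery.
        self._chunk_size = None
        self._chunk_dim = 0

    def forward(self, norm_hidden_states: torch.Tensor) -> torch.Tensor:
        if self._chunk_size is not None:
            # Run the feed-forward chunk by chunk along _chunk_dim so its
            # intermediate activations are never materialized for the whole
            # sequence at once.
            return _chunked_feed_forward(
                self.feed_forward, norm_hidden_states, self._chunk_dim, self._chunk_size
            )
        return self.feed_forward(norm_hidden_states)
```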

(Sorry, I think it is our fault; the example PR we gave you did not do this either.)

I see very little memory saving with this method, though. cc @sayakpaul

@sayakpaul (Member) commented Jul 11, 2024

> I see very little memory saving with this method, though.

Yeah, it was very much expected, probably because of the parameter/FLOP pattern. For other models like SDXL, Stable Video, etc., where transformer blocks are used alongside conv blocks, we see quite nice memory savings with chunked feed-forward. What is even more interesting is that during my initial studies with the 800M SD3 variant the savings were evident, but they diminished with the 2B variant.

@PommesPeter have you played around with the dim and chunk_size parameters of enable_forward_chunking()? Usually, setting dim to 1 is necessary to achieve more memory savings.
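
For reference, a minimal usage sketch, under the assumption that this PR wires the standard enable_forward_chunking() helper into the Lumina transformer (the pipeline class and checkpoint name below are only illustrative):

```python
import torch
from diffusers import LuminaText2ImgPipeline

pipe = LuminaText2ImgPipeline.from_pretrained(
    "Alpha-VLLM/Lumina-Next-SFT-diffusers", torch_dtype=torch.bfloat16
).to("cuda")

# Chunk the feed-forward along the sequence dimension; a smaller chunk_size
# lowers peak activation memory at the cost of extra kernel launches.
pipe.transformer.enable_forward_chunking(chunk_size=1, dim=1)

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```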

For the QKV fusion, let's maybe wait for #8829 to get merged?

Apologies for the oversight on our part.

@PommesPeter (Contributor, Author) commented:

> Have you played around with the dim and chunk_size parameters of enable_forward_chunking()? Usually, setting dim to 1 is necessary to achieve more memory savings.
>
> For the QKV fusion, let's maybe wait for #8829 to get merged?

I haven't tried chunked FFN before. I will try it with different parameters.

For the QKV fusion: I would like to wait until #8829 has been merged.
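
Once #8829 lands, the user-facing call would presumably mirror the fuse_qkv_projections() helper that other diffusers models already expose (hypothetical sketch; the method may end up living on the pipeline instead):

```python
import torch
from diffusers import LuminaText2ImgPipeline

pipe = LuminaText2ImgPipeline.from_pretrained(
    "Alpha-VLLM/Lumina-Next-SFT-diffusers", torch_dtype=torch.bfloat16
).to("cuda")

# Fold the separate Q/K/V linear layers into a single projection so the
# attention input is read once and one larger matmul is launched.
pipe.transformer.fuse_qkv_projections()

image = pipe("a photo of an astronaut riding a horse").images[0]

# The fusion can be reverted if needed.
pipe.transformer.unfuse_qkv_projections()
```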

@sayakpaul (Member) commented:

@PommesPeter hey there :)

#8829 is now merged. Would you like to revive this PR?

@PommesPeter (Contributor, Author) commented:

> #8829 is now merged. Would you like to revive this PR?

Okay, I will revive this PR.

@github-actions (bot) commented:

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions[bot] added the "stale" label (Issues that haven't received updates) on Sep 14, 2024
@yiyixuxu (Collaborator) commented:

Closing this since these techniques are not particularly effective here.

@yiyixuxu closed this on Nov 16, 2024
Labels: stale (Issues that haven't received updates)
4 participants