
[Alpha-VLLM Team] Feat: added fused qkv and chunked ffn #8815


Closed
PommesPeter wants to merge 3 commits

Conversation

@PommesPeter (Contributor) commented Jul 9, 2024

What does this PR do?

Fixes #8652

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@a-r-r-o-w requested review from DN6 and yiyixuxu (July 9, 2024 10:13)
@yiyixuxu requested a review from sayakpaul (July 10, 2024 23:57)
@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yiyixuxu (Collaborator) commented:

So I think you will actually need to wrap your feed-forward layers inside _chunked_feed_forward for it to work:

if self._chunk_size is not None:
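
A self-contained sketch of that pattern, condensed from diffusers' BasicTransformerBlock (the attribute names self.feed_forward and self._chunk_dim below are assumptions; the actual Lumina block may differ):

```python
import torch
from diffusers.models.attention import FeedForward, _chunked_feed_forward


class LuminaBlockSketch(torch.nn.Module):
    """Illustrative only; not the actual Lumina transformer block."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.feed_forward = FeedForward(dim)
        # Normally set by the model's enable_forward_chunking() machinery.
        self._chunk_size = None
        self._chunk_dim = 0

    def forward(self, norm_hidden_states: torch.Tensor) -> torch.Tensor:
        if self._chunk_size is not None:
            # Run the feed-forward chunk by chunk along _chunk_dim so its
            # intermediate activations are never materialized for the whole
            # sequence at once.
            return _chunked_feed_forward(
                self.feed_forward, norm_hidden_states, self._chunk_dim, self._chunk_size
            )
        return self.feed_forward(norm_hidden_states)
```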

(Sorry, I think it is our fault; the example PR we gave you did not do this either.)

I see very little memory saving with this method, though. cc @sayakpaul

@sayakpaul (Member) commented Jul 11, 2024

> I see very little memory saving with this method, though.

Yeah, it was very much expected, probably because of the parameter/FLOP pattern. For other models like SDXL, Stable Video, etc., where transformer blocks are used alongside conv blocks, we see quite nice memory savings with chunked feed-forward. What is even more interesting is that during my initial studies with the 800M SD3 variant the savings were evident, but they diminished with the 2B variant.

@PommesPeter have you played around with the dim and chunk_size parameters of enable_forward_chunking()? Usually, setting dim to 1 is necessary to achieve more memory savings.
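
For reference, a minimal usage sketch, under the assumption that this PR wires the standard enable_forward_chunking() helper into the Lumina transformer (the pipeline class and checkpoint name below are only illustrative):

```python
import torch
from diffusers import LuminaText2ImgPipeline

pipe = LuminaText2ImgPipeline.from_pretrained(
    "Alpha-VLLM/Lumina-Next-SFT-diffusers", torch_dtype=torch.bfloat16
).to("cuda")

# Chunk the feed-forward along the sequence dimension; a smaller chunk_size
# lowers peak activation memory at the cost of extra kernel launches.
pipe.transformer.enable_forward_chunking(chunk_size=1, dim=1)

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```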

For the QKV fusion, let's maybe wait for #8829 to get merged?

Apologies for the oversight on our part.

@PommesPeter (Contributor, Author) commented:

> Have you played around with the dim and chunk_size parameters of enable_forward_chunking()? Usually, setting dim to 1 is necessary to achieve more memory savings.
>
> For the QKV fusion, let's maybe wait for #8829 to get merged?

I haven't tried chunked FFN before. I will try it with different parameters.

For the QKV fusion: I would like to wait until #8829 has been merged.
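
Once #8829 lands, the user-facing call would presumably mirror the fuse_qkv_projections() helper that other diffusers models already expose (hypothetical sketch; the method may end up living on the pipeline instead):

```python
import torch
from diffusers import LuminaText2ImgPipeline

pipe = LuminaText2ImgPipeline.from_pretrained(
    "Alpha-VLLM/Lumina-Next-SFT-diffusers", torch_dtype=torch.bfloat16
).to("cuda")

# Fold the separate Q/K/V linear layers into a single projection so the
# attention input is read once and one larger matmul is launched.
pipe.transformer.fuse_qkv_projections()

image = pipe("a photo of an astronaut riding a horse").images[0]

# The fusion can be reverted if needed.
pipe.transformer.unfuse_qkv_projections()
```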

@sayakpaul (Member) commented:

@PommesPeter hey there :)

#8829 is now merged. Would you like to revive this PR?

@PommesPeter (Contributor, Author) commented:

> #8829 is now merged. Would you like to revive this PR?

Okay, I will revive this PR.

@github-actions (bot) commented:

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions[bot] added the "stale" label (Issues that haven't received updates) on Sep 14, 2024
@yiyixuxu (Collaborator) commented:

Closing this since these techniques are not particularly effective here.

@yiyixuxu closed this on Nov 16, 2024
Labels: stale (Issues that haven't received updates)
4 participants