`AttentionProcessor.group_norm` num_channels should be `query_dim` #3046
Merged: williamberman merged 2 commits into huggingface:main from williamberman:group_norm_correct_num_channels on Apr 11, 2023
Conversation
The group_norm on the attention processor should norm the number of channels in the query, not the inner dim. This wasn't caught before because the group_norm is only used by the added-KV attention processors, those processors are only used by the karlo models, and karlo is configured so that the inner dim equals the query dim.
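A minimal sketch of why this matters (made-up dims, not the exact diffusers source): the processor's group_norm is applied to hidden states that still have query_dim channels, so constructing it with num_channels=inner_dim only works when the two dims happen to coincide, as they do in karlo/unclip.

```python
import torch
import torch.nn as nn

# Hypothetical dims chosen so that query_dim != inner_dim; karlo/unclip configure
# them to be equal, which is why the bug never surfaced there.
query_dim, inner_dim = 320, 512

hidden_states = torch.randn(2, query_dim, 64)  # (batch, channels, sequence)

pre_fix = nn.GroupNorm(num_groups=32, num_channels=inner_dim)   # old construction
post_fix = nn.GroupNorm(num_groups=32, num_channels=query_dim)  # fixed construction

print(post_fix(hidden_states).shape)  # torch.Size([2, 320, 64])

try:
    pre_fix(hidden_states)  # affine weights sized for inner_dim, but input has query_dim channels
except RuntimeError as err:
    print(err)
```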
sayakpaul approved these changes on Apr 11, 2023
yiyixuxu approved these changes on Apr 11, 2023
Thanks for catching this!
Ah nice I see! Cool, nice catch!
patrickvonplaten approved these changes on Apr 11, 2023
group norm channels fix
As described above, the group_norm should norm the number of channels in the query, not the inner dim; this only went unnoticed because karlo configures the inner dim to equal the query dim. I separately ran the integration tests on the unclip/karlo models to confirm they all still pass.
See diffusers/src/diffusers/models/attention_processor.py, line 406 (at 67c3518), where hidden_states is normed before the projection to inner_dim.
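For illustration, a simplified, self-contained sketch of that ordering (assumed shapes and module names, not the actual attention_processor.py code): the norm runs while the tensor still has query_dim channels, and the projection to inner_dim only happens afterwards.

```python
import torch
import torch.nn as nn

query_dim, inner_dim = 320, 512  # hypothetical dims
group_norm = nn.GroupNorm(num_groups=32, num_channels=query_dim)
to_q = nn.Linear(query_dim, inner_dim, bias=False)

hidden_states = torch.randn(2, 64, query_dim)  # (batch, seq_len, query_dim)

# GroupNorm expects channels-first input, hence the transposes around it.
hidden_states = group_norm(hidden_states.transpose(1, 2)).transpose(1, 2)
query = to_q(hidden_states)  # the projection to inner_dim happens only here

print(query.shape)  # torch.Size([2, 64, 512])
```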
add_{k,v}_proj projection to inner dim fix
Similarly, add_{k,v}_proj should be projecting to inner_dim. This is likewise not caught in karlo/unclip because the cross attention dimension is the same as the inner dim. See diffusers/src/diffusers/models/attention_processor.py, lines 411 to 422 (at 67c3518): in order to concat the projections along dimension 1, they must have the same hidden dimension.
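A hedged sketch of that constraint (illustrative dims and projections, not the exact diffusers code): the added key projection is concatenated with the regular key projection along dim 1, so its output feature dimension has to be inner_dim as well.

```python
import torch
import torch.nn as nn

# Hypothetical dims; in karlo/unclip cross_attention_dim == inner_dim, which hid the bug.
query_dim, cross_attention_dim, inner_dim = 320, 768, 512

to_k = nn.Linear(query_dim, inner_dim, bias=False)
add_k_proj = nn.Linear(cross_attention_dim, inner_dim)  # the fix: project to inner_dim

hidden_states = torch.randn(2, 64, query_dim)                    # self-attention tokens
encoder_hidden_states = torch.randn(2, 77, cross_attention_dim)  # added kv tokens

key = to_k(hidden_states)                      # (2, 64, inner_dim)
added_key = add_k_proj(encoder_hidden_states)  # (2, 77, inner_dim)

# Concatenating along the sequence dim (dim=1) requires matching feature dims,
# which only holds because add_k_proj outputs inner_dim.
key = torch.cat([added_key, key], dim=1)       # (2, 141, inner_dim)
print(key.shape)
```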