AttentionProcessor.group_norm num_channels should be query_dim #3046


Merged

Conversation


@williamberman williamberman commented Apr 10, 2023

group norm channels fix

The group_norm on the attention processor should really norm the number of channels in the query, not the inner dim. This wasn't caught before because the group_norm is only used by the added KV attention processors, and those processors are only used by the Karlo models, which are configured such that the inner dim is the same as the query dim.

I separately ran the integration tests on the unCLIP/Karlo models to confirm they all still pass.

See here that the hidden states are normed before the projection to the inner dim:

hidden_states = attn.group_norm(hidden_states.transpose(1, 2)).transpose(1, 2)
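
As a rough sketch of why the channel count matters (this is not the diffusers implementation; the batch, sequence length, and dimension values below are hypothetical), the norm runs on the hidden states while they still have query_dim channels, so the GroupNorm has to be constructed with num_channels=query_dim:

import torch
import torch.nn as nn

# Hypothetical dimensions, chosen so that inner_dim != query_dim.
batch, seq_len, query_dim, inner_dim = 2, 8, 320, 640

hidden_states = torch.randn(batch, seq_len, query_dim)

# The norm is applied before to_q projects to inner_dim, so its channel
# count must be query_dim.
group_norm = nn.GroupNorm(num_groups=32, num_channels=query_dim)
to_q = nn.Linear(query_dim, inner_dim)

# GroupNorm expects (batch, channels, length), hence the transposes.
hidden_states = group_norm(hidden_states.transpose(1, 2)).transpose(1, 2)
query = to_q(hidden_states)  # only after the norm do we reach inner_dim

If num_channels were inner_dim instead, the group_norm call would raise a shape error whenever inner_dim != query_dim; the Karlo configuration just happens to make the two equal, which is why this slipped through.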

add_{k,v}_proj projection to inner dim fix

Similarly, add_{k,v}_proj should project to inner_dim. This also wasn't caught in Karlo/unCLIP because the cross attention dimension is the same as the inner dim.

See here that, in order to concatenate the projections along dimension 1, they must have the same hidden dimension:

key = attn.to_k(hidden_states)
value = attn.to_v(hidden_states)
key = attn.head_to_batch_dim(key)
value = attn.head_to_batch_dim(value)
encoder_hidden_states_key_proj = attn.add_k_proj(encoder_hidden_states)
encoder_hidden_states_value_proj = attn.add_v_proj(encoder_hidden_states)
encoder_hidden_states_key_proj = attn.head_to_batch_dim(encoder_hidden_states_key_proj)
encoder_hidden_states_value_proj = attn.head_to_batch_dim(encoder_hidden_states_value_proj)
key = torch.cat([encoder_hidden_states_key_proj, key], dim=1)
value = torch.cat([encoder_hidden_states_value_proj, value], dim=1)
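
As a rough sketch of the shape requirement (not the actual processor code; heads and head_to_batch_dim are omitted, and the dimensions are hypothetical, chosen so that cross_attention_dim != inner_dim):

import torch
import torch.nn as nn

# Hypothetical dimensions; in Karlo/unCLIP cross_attention_dim happens to
# equal inner_dim, which is why the bug went unnoticed.
batch, img_len, txt_len = 2, 16, 8
query_dim, inner_dim, cross_attention_dim = 320, 640, 768

to_k = nn.Linear(query_dim, inner_dim)
to_v = nn.Linear(query_dim, inner_dim)

# The fix: project the added encoder states to inner_dim so they match the
# last dimension of key/value.
add_k_proj = nn.Linear(cross_attention_dim, inner_dim)
add_v_proj = nn.Linear(cross_attention_dim, inner_dim)

hidden_states = torch.randn(batch, img_len, query_dim)
encoder_hidden_states = torch.randn(batch, txt_len, cross_attention_dim)

key = to_k(hidden_states)                    # (batch, img_len, inner_dim)
value = to_v(hidden_states)                  # (batch, img_len, inner_dim)
enc_key = add_k_proj(encoder_hidden_states)  # (batch, txt_len, inner_dim)
enc_value = add_v_proj(encoder_hidden_states)

# Concatenating along dim=1 requires matching last dimensions; if the added
# projections still targeted cross_attention_dim, this would fail whenever
# cross_attention_dim != inner_dim.
key = torch.cat([enc_key, key], dim=1)       # (batch, txt_len + img_len, inner_dim)
value = torch.cat([enc_value, value], dim=1)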

HuggingFaceDocBuilderDev commented Apr 10, 2023

The documentation is not available anymore as the PR was closed or merged.

@yiyixuxu yiyixuxu left a comment


Thanks for catching this!

@patrickvonplaten

Ah nice I see! Cool nice catch!

@williamberman williamberman merged commit 8c6b47c into huggingface:main Apr 11, 2023
@williamberman williamberman deleted the group_norm_correct_num_channels branch April 11, 2023 17:33
w4ffl35 pushed a commit to w4ffl35/diffusers that referenced this pull request Apr 14, 2023
dg845 pushed a commit to dg845/diffusers that referenced this pull request May 6, 2023
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024