AttentionProcessor.group_norm num_channels should be query_dim #3046


Merged

Conversation


@williamberman williamberman commented Apr 10, 2023

group norm channels fix

The group_norm on the attention processor should really norm the number of channels in the query, not the inner dim. This wasn't caught before because the group_norm is only used by the added KV attention processors, and those processors are only used by the Karlo models, which are configured such that the inner dim is the same as the query dim.

I separately ran the integration tests on the unCLIP/Karlo models to confirm they all still pass.

See here that the hidden states are normed before the projection to the inner dim:

hidden_states = attn.group_norm(hidden_states.transpose(1, 2)).transpose(1, 2)
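
As a rough sketch of why the channel count matters (this is not the diffusers implementation; the batch, sequence length, and dimension values below are hypothetical), the norm runs on the hidden states while they still have query_dim channels, so the GroupNorm has to be constructed with num_channels=query_dim:

import torch
import torch.nn as nn

# Hypothetical dimensions, chosen so that inner_dim != query_dim.
batch, seq_len, query_dim, inner_dim = 2, 8, 320, 640

hidden_states = torch.randn(batch, seq_len, query_dim)

# The norm is applied before to_q projects to inner_dim, so its channel
# count must be query_dim.
group_norm = nn.GroupNorm(num_groups=32, num_channels=query_dim)
to_q = nn.Linear(query_dim, inner_dim)

# GroupNorm expects (batch, channels, length), hence the transposes.
hidden_states = group_norm(hidden_states.transpose(1, 2)).transpose(1, 2)
query = to_q(hidden_states)  # only after the norm do we reach inner_dim

If num_channels were inner_dim instead, the group_norm call would raise a shape error whenever inner_dim != query_dim; the Karlo configuration just happens to make the two equal, which is why this slipped through.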

add_{k,v}_proj projection to inner dim fix

Similarly, add_{k,v}_proj should project to inner_dim. This also wasn't caught in Karlo/unCLIP because the cross attention dimension is the same as the inner dim.

See here that, in order to concatenate the projections along dimension 1, they must have the same hidden dimension:

key = attn.to_k(hidden_states)
value = attn.to_v(hidden_states)
key = attn.head_to_batch_dim(key)
value = attn.head_to_batch_dim(value)
encoder_hidden_states_key_proj = attn.add_k_proj(encoder_hidden_states)
encoder_hidden_states_value_proj = attn.add_v_proj(encoder_hidden_states)
encoder_hidden_states_key_proj = attn.head_to_batch_dim(encoder_hidden_states_key_proj)
encoder_hidden_states_value_proj = attn.head_to_batch_dim(encoder_hidden_states_value_proj)
key = torch.cat([encoder_hidden_states_key_proj, key], dim=1)
value = torch.cat([encoder_hidden_states_value_proj, value], dim=1)
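
As a rough sketch of the shape requirement (not the actual processor code; heads and head_to_batch_dim are omitted, and the dimensions are hypothetical, chosen so that cross_attention_dim != inner_dim):

import torch
import torch.nn as nn

# Hypothetical dimensions; in Karlo/unCLIP cross_attention_dim happens to
# equal inner_dim, which is why the bug went unnoticed.
batch, img_len, txt_len = 2, 16, 8
query_dim, inner_dim, cross_attention_dim = 320, 640, 768

to_k = nn.Linear(query_dim, inner_dim)
to_v = nn.Linear(query_dim, inner_dim)

# The fix: project the added encoder states to inner_dim so they match the
# last dimension of key/value.
add_k_proj = nn.Linear(cross_attention_dim, inner_dim)
add_v_proj = nn.Linear(cross_attention_dim, inner_dim)

hidden_states = torch.randn(batch, img_len, query_dim)
encoder_hidden_states = torch.randn(batch, txt_len, cross_attention_dim)

key = to_k(hidden_states)                    # (batch, img_len, inner_dim)
value = to_v(hidden_states)                  # (batch, img_len, inner_dim)
enc_key = add_k_proj(encoder_hidden_states)  # (batch, txt_len, inner_dim)
enc_value = add_v_proj(encoder_hidden_states)

# Concatenating along dim=1 requires matching last dimensions; if the added
# projections still targeted cross_attention_dim, this would fail whenever
# cross_attention_dim != inner_dim.
key = torch.cat([enc_key, key], dim=1)       # (batch, txt_len + img_len, inner_dim)
value = torch.cat([enc_value, value], dim=1)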

HuggingFaceDocBuilderDev commented Apr 10, 2023

The documentation is not available anymore as the PR was closed or merged.

@yiyixuxu yiyixuxu left a comment


Thanks for catching this!

@patrickvonplaten

Ah nice I see! Cool nice catch!

@williamberman williamberman merged commit 8c6b47c into huggingface:main Apr 11, 2023
@williamberman williamberman deleted the group_norm_correct_num_channels branch April 11, 2023 17:33
w4ffl35 pushed a commit to w4ffl35/diffusers that referenced this pull request Apr 14, 2023
dg845 pushed a commit to dg845/diffusers that referenced this pull request May 6, 2023
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024