
Conversation

@OliBomby (Contributor) commented Feb 28, 2025

What does this PR do?

Fixes an incorrect attention computation when training Whisper with Flash Attention 2 and passing a decoder_attention_mask with some values set to False.

The error was likely introduced when the truncation code was copied from the other attention implementations: in WhisperFlashAttention2 the tensor dimensions are transposed, so the copied slice targets the wrong axis.
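The axis mismatch can be illustrated with a minimal sketch. All shapes, variable names, and slices below are illustrative assumptions, not the actual Whisper source:

```python
import numpy as np

# Illustrative shapes only; names do not match the actual Whisper code.
batch, heads, seq_len, head_dim = 2, 4, 10, 8
target_len = 6

# Most attention implementations keep states as (batch, heads, seq_len, head_dim),
# so truncating to the last target_len positions slices axis 2:
standard = np.zeros((batch, heads, seq_len, head_dim))
truncated_standard = standard[:, :, -target_len:]
print(truncated_standard.shape)  # (2, 4, 6, 8)

# Flash Attention 2 keeps states transposed as (batch, seq_len, heads, head_dim).
fa2 = np.zeros((batch, seq_len, heads, head_dim))

# Copying the slice verbatim hits the heads axis instead of seq_len.
# Here heads (4) < target_len (6), so the slice silently keeps everything:
wrong = fa2[:, :, -target_len:]
print(wrong.shape)  # (2, 10, 4, 8) -- seq_len was never truncated

# The correct slice for the transposed layout targets axis 1:
correct = fa2[:, -target_len:]
print(correct.shape)  # (2, 6, 4, 8)
```

Because the heads axis is usually smaller than the truncation target, the bad slice is a silent no-op rather than a crash, which is why the bug only shows up as wrong attention values.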

cc @sanchit-gandhi @ylacombe

@github-actions github-actions bot marked this pull request as draft February 28, 2025 14:24
@github-actions bot commented
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).

@OliBomby OliBomby marked this pull request as ready for review February 28, 2025 14:25
@Rocketknight1 (Member) commented

cc @eustlb

@vasqu (Contributor) commented Apr 4, 2025

I think this is still a pretty significant bug, since it strongly affects the attention computations in FA2 and addresses #36585 (which was closed under the assumption that this fix would land).

cc @eustlb

@ronansgd commented May 9, 2025

It would be great to merge this fix! cc @eustlb

@greg2451 (Contributor) commented

Also affected by this issue, thanks for the fix @OliBomby.

@eustlb would be really great to get this merged 🙏🏼

@eustlb (Contributor) left a comment

LGTM! Great catch, thanks a lot 🤗

@eustlb eustlb enabled auto-merge (squash) May 14, 2025 20:00
@eustlb eustlb merged commit 4005e30 into huggingface:main May 14, 2025
20 checks passed
@HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


7 participants