
Conversation

@gante (Contributor) commented Mar 15, 2025

What does this PR do?

Follow-up to #36650

Removes redundant test cases in test_eager_matches_sdpa_inference, related to output_attentions.

SDPA has special attention mask preparation (e.g. see here) but, when we pass output_attentions, SDPA reverts to eager attention. That means that the mask preparation for SDPA + output_attentions must be the same as for eager. In #30652 we added a for loop to test the cases with output_attentions in (True, False), resulting in 24 new subtests.

That's testing overkill: all we need is to confirm that we're preparing the attention mask correctly when output_attentions=True. This PR removes all but one of the redundant tests.
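For readers less familiar with this fallback, here is a minimal sketch of the pattern described above (not the actual Transformers implementation; the function names and mask convention are illustrative):

```python
import torch
import torch.nn.functional as F

def eager_attention(q, k, v, attn_mask=None):
    # Manual ("eager") attention: returns the output and the attention weights.
    scores = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5
    if attn_mask is not None:
        scores = scores + attn_mask  # additive mask: 0 for keep, -inf for masked
    weights = scores.softmax(dim=-1)
    return weights @ v, weights

def attention(q, k, v, attn_mask=None, output_attentions=False):
    if output_attentions:
        # SDPA cannot return attention weights, so fall back to the eager path.
        # The mask prepared for this call must therefore match the eager one exactly,
        # which is the single case the remaining subtest covers.
        return eager_attention(q, k, v, attn_mask)
    return F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask), None
```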

@ydshieh with this PR, both the CPU and GPU CI are faster than in v4.49.0 🤗

@gante gante requested a review from ydshieh March 15, 2025 11:17
@github-actions (bot)

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).

@github-actions github-actions bot marked this pull request as draft March 15, 2025 11:17
for enable_kernels in (True, False)
]
# Extra test case: `output_attentions=True` has special attention mask handling and sdpa reverts to eager
] + [("fp32_pad_left_output_attentions", "fp32", "left", True, True, False)]
@gante (Contributor, Author) commented on the diff excerpt above:

Left padding is the hardest test case: the combined mask will have 0s on the left (from padding) and on the upper right (from the causal mask)
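To make that concrete, a toy example (not the test code itself) of the combined mask for a 4-token sequence with one left-pad token:

```python
import torch

attention_mask = torch.tensor([0, 1, 1, 1])               # 0 = left padding
causal = torch.tril(torch.ones(4, 4, dtype=torch.long))   # lower-triangular causal mask

combined = causal * attention_mask                         # broadcast over the key dimension
print(combined)
# tensor([[0, 0, 0, 0],
#         [0, 1, 0, 0],
#         [0, 1, 1, 0],
#         [0, 1, 1, 1]])
# zeros on the left come from padding, zeros on the upper right from the causal mask
```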

@gante gante marked this pull request as ready for review March 15, 2025 11:18
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ydshieh (Collaborator) left a comment

Very nice finding.

(BTW, would you like to add a reference to where this fact is shown?

output_attentions=True has special attention mask handling

It might be helpful for future readers. I have no idea where it is myself, but you have my full trust; I am thinking of other readers.)

@gante (Contributor, Author) commented Mar 17, 2025

Added a code link in the PR header for the special attention mask manipulation with SDPA + output_attentions=True, for future reference 👍

@gante gante merged commit cff4caa into huggingface:main Mar 17, 2025
23 checks passed
@gante gante deleted the test_eager_matches_sdpa_redundant_checks branch March 17, 2025 16:29