[bugfix] fix flash attention 2 unavailable error on Ascend NPU #39166
What does this PR do?
#38972 introduced flash attention 3 into transformers. However, that change also broke flash attention 2 on Ascend NPU. The root cause is a mismatch between function names:
- Functions defined in transformers.integrations.npu_flash_attention: transformers/src/transformers/modeling_flash_attention_utils.py, lines 140 to 153 (at e8e0c76)
- Functions actually used: transformers/src/transformers/modeling_flash_attention_utils.py, lines 470 to 475 (at e8e0c76)
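To make the mismatch concrete, here is a small, self-contained illustration; the stand-in module and the lookup below are assumptions for illustration, not the code at the referenced lines, and only the two function names come from this description:

```python
# Stand-in for transformers.integrations.npu_flash_attention before this PR:
# the NPU backend exports its flash attention 2 kernel under an "npu_"-prefixed name.
import types

npu_flash_attention = types.SimpleNamespace(
    npu_flash_attn_func=lambda *args, **kwargs: None,  # dummy kernel for illustration
)

# Stand-in for the FA2 dispatch in modeling_flash_attention_utils.py:
# it looks the kernel up under a name containing "_2_", which does not exist here.
kernel = getattr(npu_flash_attention, "flash_attn_2_func", None)
print(kernel)  # None -> flash attention 2 is reported as unavailable on Ascend NPU
```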
This PR fixes the problem by renaming the flash attention 2 related functions (e.g. npu_flash_attn_func) exported by transformers.integrations.npu_flash_attention to the names that are actually looked up, which contain a _2_ marker (e.g. flash_attn_2_func). A minimal sketch of the renaming is shown below.
Fixes # (issue)
Not related.
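A minimal, self-contained sketch of the renaming approach. Only the two public names, npu_flash_attn_func and flash_attn_2_func, come from this PR; the signature and the plain-PyTorch attention body are assumptions standing in for the real Ascend NPU kernel:

```python
# Sketch of the fix: keep the NPU-backed implementation, but also expose it under
# the "_2_" name that modeling_flash_attention_utils.py expects for the FA2 path.
import math
import torch


def npu_flash_attn_func(q, k, v, dropout_p=0.0, softmax_scale=None, causal=False):
    """Stand-in for the NPU flash attention 2 kernel; inputs are [batch, seq, heads, dim]."""
    if softmax_scale is None:
        softmax_scale = 1.0 / math.sqrt(q.size(-1))
    scores = torch.einsum("bqhd,bkhd->bhqk", q, k) * softmax_scale
    if causal:
        seq_q, seq_k = scores.size(-2), scores.size(-1)
        mask = torch.triu(
            torch.ones(seq_q, seq_k, dtype=torch.bool, device=q.device), diagonal=1
        )
        scores = scores.masked_fill(mask, float("-inf"))
    attn = torch.softmax(scores, dim=-1)
    attn = torch.nn.functional.dropout(attn, p=dropout_p)
    return torch.einsum("bhqk,bkhd->bqhd", attn, v)


# The rename: expose the kernel under the name the FA2 dispatch actually imports.
flash_attn_2_func = npu_flash_attn_func

if __name__ == "__main__":
    q = k = v = torch.randn(1, 8, 4, 16)
    out = flash_attn_2_func(q, k, v, causal=True)
    print(out.shape)  # torch.Size([1, 8, 4, 16])
```

With the alias in place, the lookup shown in the earlier illustration succeeds and flash attention 2 is usable again on Ascend NPU.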
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.