Add packed tensor format support for flex/sdpa/eager through the mask! #39194
Conversation
Very nice, just missing a test 😉
src/transformers/masking_utils.py (outdated)
# Packed format is always on batch of size 1 so we can early exit if not the case
if not position_ids.shape[0] == 1:
There really shouldn't be a restriction on this. It should also work with 2D packed tensors.
Hmm, alright, I can lift it easily - I thought it was always packed with all sequences along a batch of 1.
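For reference, here is a minimal sketch (not the actual masking_utils code; the helper name is made up for illustration) of how a block-diagonal causal mask can be derived from 2D position_ids without any batch-size-1 restriction:

```python
import torch

def packed_sequence_mask(position_ids: torch.Tensor) -> torch.Tensor:
    """Sketch: build a block-diagonal causal mask from 2D position_ids (batch, seq_len).
    A new packed sequence is assumed to start wherever position_ids stops increasing by 1."""
    seq_len = position_ids.shape[-1]
    # Flag the first token of every packed sequence (position 0 always starts one)
    starts = torch.ones_like(position_ids, dtype=torch.bool)
    starts[:, 1:] = position_ids[:, 1:] != position_ids[:, :-1] + 1
    # Cumulative sum turns the start flags into per-token sequence ids, e.g. [1, 1, 1, 2, 2]
    seq_ids = starts.cumsum(dim=-1)
    # A token may attend only to earlier tokens of the same packed sequence
    same_sequence = seq_ids[:, :, None] == seq_ids[:, None, :]
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool, device=position_ids.device))
    return same_sequence & causal  # (batch, seq_len, seq_len)

# Two sequences of lengths 3 and 2 packed into one row; the same code works with batch > 1
pos = torch.tensor([[0, 1, 2, 0, 1]])
print(packed_sequence_mask(pos).int())
```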
Perfect! This fixes the regression, and flex + packing works again for us now.
Commits in #39194:
* Add the necessary logic to mask_utils
* add it everywhere
* Update masking_utils.py
* style
* Update masking_utils.py
* Update modeling_mimi.py
* Update masking_utils.py
* add support for more than batch size 1
* Update masking_utils.py
* add test
* style
* Update test_masking_utils.py
* Update masking_utils.py
* add require_token
* fix tests
* fix
Hey @Cyrilvallez, the docstring of create_mask_for_generate describes position_ids as optional, but it's actually a required argument, breaking existing code that calls this function. Is it intentional that it's required, or should it have a default of None?
We use create_mask_for_generate from transformers. It was introduced in v4.53.0, but in v4.53.1 the function signature was changed to include position_ids as a mandatory argument: huggingface/transformers#39194. This breaks our function call in PEFT. This PR fixes the function call by passing position_ids. This in turn would break the function call with transformers v4.53.0, so a strict version check is used for >= v4.53.1. Moreover, the check has been moved inside the if-branch that actually needs it instead of performing it at the start of the function. That way, no error is raised if we don't visit this branch.
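A minimal sketch of the version-gating pattern described above, assuming the new signature is only needed inside one branch (the helper name and the commented call are illustrative, not the exact PEFT code):

```python
from packaging import version
import transformers

def supports_position_ids_kwarg() -> bool:
    """True if transformers is new enough for the changed create_mask_for_generate signature."""
    return version.parse(transformers.__version__) >= version.parse("4.53.1")

# Inside the branch that actually needs the mask (so older versions that never reach it
# are unaffected), one would then check the version and pass position_ids explicitly, e.g.:
#
#     if not supports_position_ids_kwarg():
#         raise ImportError("this code path requires transformers >= 4.53.1")
#     mask = create_mask_for_generate(..., position_ids=position_ids)
```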
Hi, I would like to ask whether you could implement the attention_mask with the pattern [1,1,2,2,2,3,3,3], and support packed tensors with FlashAttention for scenarios requiring a sparse mask. This approach would let us use a universal method (2D attention mask / position IDs) to handle variable-length attention via masking. Additionally, we could extend support to 4D masks for more complex cases, building on SDPA (scaled dot-product attention) and eager attention.
Hey @BenjaminBossan! It's optional in the sense that it can be None, but indeed I did not provide a default of None, to force the models to pass the argument and always allow packed format (same as the past_key_values).
@Snowdar for your usage of FA2, you should not pass any mask but forward the seqlens directly 🤗
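As an illustration of that suggestion, here is a rough sketch (the helper is hypothetical, not a transformers API) of turning a segment-id pattern like [1,1,2,2,2,3,3,3] into the cumulative sequence lengths that FA2's varlen path consumes instead of a dense mask:

```python
import torch

def segment_ids_to_cu_seqlens(segment_ids: torch.Tensor) -> tuple[torch.Tensor, int]:
    """Hypothetical helper: convert a packed segment-id mask like [1,1,2,2,2,3,3,3]
    into cumulative sequence lengths (cu_seqlens) and the max length, which is the
    format varlen FlashAttention kernels expect instead of a dense attention mask."""
    # Count tokens per packed sequence, preserving their order of appearance
    _, counts = torch.unique_consecutive(segment_ids, return_counts=True)
    seqlens = counts.to(torch.int32)
    # Prepend 0 so entry i / i+1 delimit sequence i
    cu_seqlens = torch.nn.functional.pad(seqlens.cumsum(0, dtype=torch.int32), (1, 0))
    return cu_seqlens, int(seqlens.max())

mask = torch.tensor([1, 1, 2, 2, 2, 3, 3, 3])
print(segment_ids_to_cu_seqlens(mask))  # (tensor([0, 2, 5, 8], dtype=torch.int32), 3)
```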
Thanks for explaining. I think in this case it would have been better to provide a default, given that the signature was changed in a backwards-incompatible way and the change was released as a patch release, where the expectation as a user is that I can always upgrade without fear of breakage. I'm not sure if this function is considered "private", but even so, I think providing a default when there is a reasonable one would have been better. Now that the patch release is out, it's too late, so I don't have a strong opinion either way.
Yep, I think we need to patch again to have a default @Cyrilvallez
What does this PR do?
As per the title.
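For context, a rough sketch of what the packed format looks like from the caller's side, assuming a causal LM whose attention path goes through masking_utils with sdpa, eager, or flex attention (the token ids and the commented checkpoint name are placeholders):

```python
import torch

# Two sequences of lengths 3 and 2 packed into a single row (no padding, no explicit mask).
# The position_ids restart at 0 for each packed sequence, which is what the mask
# construction uses to prevent cross-sequence attention.
input_ids = torch.tensor([[101, 7592, 102, 101, 102]])
position_ids = torch.tensor([[0, 1, 2, 0, 1]])

# model = AutoModelForCausalLM.from_pretrained("some-checkpoint")  # checkpoint name is a placeholder
# out = model(input_ids=input_ids, position_ids=position_ids)
```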