Add packed tensor format support for flex/sdpa/eager through the mask! #39194


Merged: 16 commits merged into main on Jul 4, 2025

Conversation

@Cyrilvallez (Member) commented Jul 3, 2025

What does this PR do?

As per the title.

import torch
from transformers import AutoModelForCausalLM
from transformers.masking_utils import create_causal_mask

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B", torch_dtype=torch.float16)


batch_size = 1
sequence_length = 10
cache_position = torch.arange(sequence_length)
position_ids = torch.tensor([[0,1,2,3,0,1,0,1,2,3]])  # This corresponds to 3 packed sequences

attention_mask = create_causal_mask(
    config=model.config,
    # we only need batch size, seq_length and dtype here - we don't care about the values of the embeddings
    input_embeds=torch.empty((batch_size, sequence_length), dtype=model.dtype),
    attention_mask=None,
    cache_position=cache_position,
    past_key_values=None,
    position_ids=position_ids,
)
attention_mask

>>> tensor([[[[ True, False, False, False, False, False, False, False, False, False],
          [ True,  True, False, False, False, False, False, False, False, False],
          [ True,  True,  True, False, False, False, False, False, False, False],
          [ True,  True,  True,  True, False, False, False, False, False, False],
          [False, False, False, False,  True, False, False, False, False, False],
          [False, False, False, False,  True,  True, False, False, False, False],
          [False, False, False, False, False, False,  True, False, False, False],
          [False, False, False, False, False, False,  True,  True, False, False],
          [False, False, False, False, False, False,  True,  True,  True, False],
          [False, False, False, False, False, False,  True,  True,  True,  True]]]])
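For reference, here is a minimal standalone sketch (my own, not the transformers implementation) of the logic behind this mask: a position that resets to 0 marks the start of a new packed sequence, and a token may only attend to earlier tokens of its own sequence.

import torch

def packed_causal_mask(position_ids: torch.Tensor) -> torch.Tensor:
    """Block-diagonal causal mask of shape (batch, 1, seq, seq) from packed position_ids of shape (batch, seq)."""
    # A position of 0 marks the start of a new packed sequence, so the cumulative sum
    # of those resets assigns a sequence id to every token:
    # [0,1,2,3,0,1,0,1,2,3] -> [1,1,1,1,2,2,3,3,3,3]
    seq_ids = (position_ids == 0).long().cumsum(dim=-1)
    same_sequence = seq_ids[:, :, None] == seq_ids[:, None, :]  # (batch, q_len, kv_len)
    seq_len = position_ids.shape[-1]
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool, device=position_ids.device))
    return (same_sequence & causal)[:, None, :, :]  # add the head dimension

position_ids = torch.tensor([[0, 1, 2, 3, 0, 1, 0, 1, 2, 3]])
print(packed_causal_mask(position_ids))  # matches the mask printed above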


@ArthurZucker (Collaborator) left a comment:

Very nice, just missing a test 😉

Comment on lines 624 to 625:

# Packed format is always on batch of size 1 so we can early exit if not the case
if not position_ids.shape[0] == 1:

Contributor: There really shouldn't be a restriction on this. It should also work with 2D packed tensors.

@Cyrilvallez (Member, Author): Hmm, alright, I can lift it easily. I thought it was always packed with all sequences along a batch of 1.
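For illustration (not from the thread): with the restriction lifted, the same create_causal_mask call from the PR description works with a 2D position_ids of batch size greater than 1, reusing the model and imports from that snippet.

# Two rows, each packing its own sub-sequences; real inputs with padding would also pass an attention_mask
position_ids = torch.tensor([
    [0, 1, 2, 0, 1, 2, 3, 0, 1, 2],
    [0, 1, 0, 1, 2, 3, 4, 0, 1, 2],
])
attention_mask = create_causal_mask(
    config=model.config,
    input_embeds=torch.empty((2, 10), dtype=model.dtype),
    attention_mask=None,
    cache_position=torch.arange(10),
    past_key_values=None,
    position_ids=position_ids,
)
print(attention_mask.shape)  # expected: torch.Size([2, 1, 10, 10]), block-diagonal causal per row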

github-actions bot (Contributor) commented Jul 3, 2025:

[For maintainers] Suggested jobs to run (before merge)

run-slow: arcee, aria, bitnet, cohere, cohere2, csm, deepseek_v3, dia, diffllama, dots1, emu3, gemma, gemma2, gemma3, gemma3n, glm

@Cyrilvallez added the "for patch" label (issues / labels that should be included in the next patch) on Jul 3, 2025
@winglian (Contributor) left a comment:

perfect! fixes the regression and flex + packing works again for us now

@Cyrilvallez merged commit 0cf2791 into main on Jul 4, 2025
27 checks passed
@Cyrilvallez deleted the packing-mask branch on July 4, 2025 at 07:01
Cyrilvallez added a commit that referenced this pull request Jul 4, 2025
Add packed tensor format support for flex/sdpa/eager through the mask! (#39194)

* Add the necesary logic to mask_utils

* add it everywhere

* Update masking_utils.py

* style

* Update masking_utils.py

* Update modeling_mimi.py

* Update masking_utils.py

* add support for more than batch size 1

* Update masking_utils.py

* add test

* style

* Update test_masking_utils.py

* Update masking_utils.py

* add require_token

* fix tests

* fix
@BenjaminBossan (Member):

Hey @Cyrilvallez, the docstring of create_masks_for_generate says the argument is optional:

    position_ids (`torch.Tensor`, optional)
       A 2D tensor of shape (batch_size, query_length) indicating the positions of each token in the sequences.

but it's actually a required argument, which breaks existing code that calls this function. Is it intentional that it's required, or should it have a default of None?

BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request Jul 4, 2025
We use create_mask_for_generate from transformers. It was introduced in
v4.53.0 but in v4.53.1, the function signature was changed to include
position_ids as mandatory argument:

huggingface/transformers#39194

This breaks our function call in PEFT. This PR fixes the function call
by passing position_ids. This in turn would break the function call with
transformers v4.53.0, thus a strict version check is being used for >=
v4.53.1.

Moreover, the check has been moved inside the if-branch that actually
needs it instead of performing it at the start of the function. That
way, no error is raised if we don't visit this branch.
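A rough sketch of the calling pattern that commit message describes; the wrapper function and variable names below are placeholders of mine, not the actual PEFT code, and the keyword arguments are assumed to mirror the create_causal_mask call shown earlier rather than a verified signature.

from packaging import version

import transformers


def build_generation_masks(model, inputs_embeds, attention_mask, cache_position, past_key_values, position_ids):
    # Strict version check: position_ids became a required argument of
    # create_masks_for_generate in transformers v4.53.1 (this PR).
    if version.parse(transformers.__version__) < version.parse("4.53.1"):
        raise RuntimeError("transformers >= 4.53.1 is required to pass position_ids here")
    from transformers.masking_utils import create_masks_for_generate

    return create_masks_for_generate(
        config=model.config,
        input_embeds=inputs_embeds,
        attention_mask=attention_mask,
        cache_position=cache_position,
        past_key_values=past_key_values,
        position_ids=position_ids,  # None is accepted for the non-packed case
    )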
@Snowdar commented Jul 7, 2025:

Hi, I would like to ask whether you could implement the attention_mask with the pattern [1,1,2,2,2,3,3,3], and support packed tensors with FlashAttention for scenarios requiring a sparse mask.

This approach would let us leverage a universal method (a 2D attention mask / position IDs) to handle variable-length attention via masking. Additionally, we could extend support to 4D masks for more complex cases, building on SDPA (scaled dot-product attention) and eager attention.

@Cyrilvallez (Member, Author):

Hey @BenjaminBossan! It's optional in the sense that it can be None, but indeed I did not provide a default of None, in order to force the models to pass the argument so that packed format is always allowed (same as for past_key_values).
We could however rethink the default values (e.g. it could make sense to allow cache_position to be None as well when the kv cache is, and construct it on the fly for external usage of the mask functions). Let me know your thoughts!
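As a rough sketch of that last idea (a hypothetical helper, not part of the library): when cache_position is omitted, it could be derived from the past length and the current sequence length.

import torch


def default_cache_position(input_embeds, past_key_values=None):
    # Hypothetical helper: derive cache_position when it is not provided
    past_len = past_key_values.get_seq_length() if past_key_values is not None else 0
    return torch.arange(past_len, past_len + input_embeds.shape[1], device=input_embeds.device)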

@Cyrilvallez (Member, Author):

@Snowdar for your usage of FA2, you should not pass any mask but forward the seqlens directly 🤗
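For context, a segment-id pattern like [1,1,2,2,2,3,3,3] can be reduced to the per-sequence lengths (or cumulative seqlens) that the FlashAttention varlen kernels consume, without materializing any mask. A minimal sketch, with names of my own choosing:

import torch
import torch.nn.functional as F

# Segment ids marking which packed sequence each token belongs to
segment_ids = torch.tensor([1, 1, 2, 2, 2, 3, 3, 3])

# Length of each packed sequence: tensor([2, 3, 3])
_, seqlens = torch.unique_consecutive(segment_ids, return_counts=True)

# Cumulative sequence lengths in the format expected by the varlen kernels: tensor([0, 2, 5, 8], dtype=torch.int32)
cu_seqlens = F.pad(seqlens.cumsum(0), (1, 0)).to(torch.int32)
print(seqlens, cu_seqlens)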

@BenjaminBossan (Member):

    It's optional in the sense that it can be None, but indeed I did not provide a default of None, in order to force the models to pass the argument so that packed format is always allowed (same as for past_key_values).
    We could however rethink the default values (e.g. it could make sense to allow cache_position to be None as well when the kv cache is, and construct it on the fly for external usage of the mask functions). Let me know your thoughts!

Thanks for explaining. I think in this case, it would have been better to provide a default, given that the signature was changed in a backwards incompatible way and then the change was released as a patch release, where the expectation as a user is that I can always upgrade without fear of breakage. I'm not sure if this function is considered "private", but even so, I think providing a default when there is a reasonable one would have been better. Now that the patch release is out, it's too late so I don't have any strong opinion either way.

BenjaminBossan added a commit to huggingface/peft that referenced this pull request on Jul 7, 2025
@ArthurZucker
Copy link
Collaborator

Yep I think we need to patch again to have a default @Cyrilvallez

efraimdahl pushed a commit to efraimdahl/peft that referenced this pull request on Jul 12, 2025
rjgleaton pushed a commit to rjgleaton/transformers that referenced this pull request on Jul 17, 2025
Labels: for patch
6 participants