Fix pixel attention mask padding in smolvlm #37497
Merged
What does this PR do?
The (binary) attention mask is initialized with zeros in float64, while in `make_pixel_mask` the ones and zeros are in int64.
As a result, the mask values end up either int64 or float64 depending on whether the inputs are padded. This breaks on MPS devices, which do not support float64.
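For illustration, a minimal sketch of the dtype mismatch (the array names and shapes here are hypothetical, not the actual smolvlm code):

```python
import numpy as np

mask_shape = (2, 3)

# Unpadded path: np.zeros defaults to float64, which fails on MPS devices.
mask_unpadded = np.zeros(mask_shape)
# Padded path (as in make_pixel_mask): explicit int64.
mask_padded = np.zeros(mask_shape, dtype=np.int64)
mask_padded[:1, :2] = 1

print(mask_unpadded.dtype, mask_padded.dtype)  # float64 int64

# The fix: initialize the unpadded mask in int64 as well, so the dtype
# is consistent regardless of whether the inputs are padded.
mask_fixed = np.zeros(mask_shape, dtype=np.int64)
print(mask_fixed.dtype)  # int64
```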
By the way @andimarafioti, we could just cast to `np.bool_` since the masks are binary, but I guess there is a reason it was done in `np.int64`?
Who can review?
@orrzohar @andimarafioti @zucchini-nlp