
Conversation


@ManuelFay ManuelFay commented Apr 14, 2025

What does this PR do?

The binary attention mask is initialized with np.zeros in float64, while make_pixel_mask builds its ones and zeros in int64:

def make_pixel_mask(
    image: np.ndarray, output_size: Tuple[int, int], input_data_format: Optional[Union[str, ChannelDimension]] = None
) -> np.ndarray:
    """
    Make a pixel mask for the image, where 1 indicates a valid pixel and 0 indicates padding.
    Args:
        image (`np.ndarray`):
            Image to make the pixel mask for.
        output_size (`Tuple[int, int]`):
            Output size of the mask.
    """
    input_height, input_width = get_image_size(image, channel_dim=input_data_format)
    mask = np.zeros(output_size, dtype=np.int64)
    mask[:input_height, :input_width] = 1
    return mask
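
For context, here is a minimal snippet (not from the library, just an illustration) showing why the two masks end up with different dtypes — np.zeros defaults to float64 unless a dtype is passed:

import numpy as np

# Explicit dtype, as in make_pixel_mask above: the mask is int64.
pixel_mask = np.zeros((4, 4), dtype=np.int64)

# No dtype given: NumPy defaults to float64, which is how the
# attention mask ends up as float64 when padding is applied.
attention_mask = np.zeros((4, 4))

print(pixel_mask.dtype, attention_mask.dtype)  # int64 float64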

Fixes # (issue)

This causes the mask values to be either int64 or float64 depending on whether the inputs are padded, which breaks on MPS devices since they don't support float64.
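
A minimal sketch of the kind of fix this PR applies, assuming the attention mask is created with np.zeros (output_size and the variable names here are illustrative, not the actual diff):

import numpy as np

output_size = (32, 32)  # example shape only

# Before: dtype defaults to float64, which MPS devices cannot handle.
attention_mask = np.zeros(output_size)

# After: pass an explicit dtype so the mask matches make_pixel_mask (int64).
attention_mask = np.zeros(output_size, dtype=np.int64)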

By the way @andimarafioti, we could just cast to np.bool if masks are binary, but I guess there is a reason it was done in np.int64?

Who can review?

@orrzohar @andimarafioti @zucchini-nlp

@github-actions github-actions bot marked this pull request as draft April 14, 2025 15:38
@github-actions

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@ManuelFay ManuelFay marked this pull request as ready for review April 14, 2025 15:55
@github-actions github-actions bot requested a review from qubvel April 14, 2025 15:55

@qubvel qubvel left a comment


LGTM, thanks!


@zucchini-nlp zucchini-nlp left a comment


Great, let's merge!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp zucchini-nlp merged commit c94c59f into huggingface:main Apr 16, 2025
9 checks passed
cyr0930 pushed a commit to cyr0930/transformers that referenced this pull request Apr 18, 2025
* fix bad init

* also modif smolvlm

---------

Co-authored-by: Raushan Turganbay <[email protected]>
zucchini-nlp added a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
* fix bad init

* also modif smolvlm

---------

Co-authored-by: Raushan Turganbay <[email protected]>
