@tjohnson31415
Contributor

@tjohnson31415 tjohnson31415 commented Feb 13, 2025

What does this PR do?

I found that using a 1x1 PIL image with the MllamaImageProcessor surfaced a couple of bugs:

  1. A 1x1 PIL image results in an ambiguous channel dimension in infer_channel_dimension_format. The default in the ambiguous case is to use the first dimension, which is incorrect here and results in ValueError: mean must have 1 elements if it is an iterable, got 3.
  2. Explicitly setting the input channel format results in a similar error: ValueError: mean must have 224 elements if it is an iterable, got 3.
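
To illustrate the ambiguity, here is a minimal sketch of a shape-based inference rule like the one described above. It is a simplified stand-in, not the library's code: the helper names `infer_channel_axis` and `channel_count`, and the candidate channel counts `(1, 3)`, are assumptions made for this example.

```python
# Simplified stand-in for the inference rule described above; NOT the library's code.
# The helper names and the (1, 3) channel candidates are assumptions.
def infer_channel_axis(shape, num_channels=(1, 3)):
    first, last = shape[0], shape[-1]
    if first in num_channels and last in num_channels:
        # Ambiguous: both ends look like a channel count, so default to the first.
        return "first"
    if first in num_channels:
        return "first"
    if last in num_channels:
        return "last"
    raise ValueError("Unable to infer the channel dimension")


def channel_count(shape, axis):
    # Number of channels the pipeline would assume for the given axis choice.
    return shape[0] if axis == "first" else shape[-1]


shape_1x1 = (1, 1, 3)      # a 1x1 RGB image as (height, width, channels)
shape_big = (224, 224, 3)  # an ordinary RGB image

ambiguous_axis = infer_channel_axis(shape_1x1)
print(ambiguous_axis, channel_count(shape_1x1, ambiguous_axis))
print(infer_channel_axis(shape_big), channel_count(shape_big, "last"))
```

For the 1x1 shape the sketch picks the first axis and therefore assumes a single channel, which is why a 3-element mean no longer matches.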

This PR resolves the bugs by:

  1. fixing support for calling the processor with input_data_format="channels_last" by using data_format instead of input_data_format for rescaling and normalizing after the call to to_channel_dimension_format
  2. checking if the image is a PIL image and inferring that the channel dim is last if not explicitly set
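
The two fixes can be sketched on plain shape tuples. This is an illustration under stated assumptions, not the code in this PR: `resolve_input_format`, `to_channels_first`, and `channel_count` are hypothetical helpers, and the format strings simply mirror the "channels_first" / "channels_last" naming used above.

```python
# Hypothetical helpers operating on shape tuples only; not the PR's actual code.
def resolve_input_format(is_pil_image, input_data_format=None):
    # An explicit setting always wins.
    if input_data_format is not None:
        return input_data_format
    # A PIL image converted to an array is (height, width, channels).
    if is_pil_image:
        return "channels_last"
    return None  # caller falls back to shape-based inference


def to_channels_first(shape, input_format):
    # Mimics converting the array layout to channels-first.
    if input_format == "channels_last":
        height, width, channels = shape
        return (channels, height, width)
    return shape


def channel_count(shape, fmt):
    return shape[0] if fmt == "channels_first" else shape[-1]


input_format = resolve_input_format(is_pil_image=True)
data_format = "channels_first"
converted = to_channels_first((224, 224, 3), input_format)

# Bug: reading the converted array with the *input* format treats width as channels.
print(channel_count(converted, input_format))
# Fix: after conversion, later steps must use the *output* format.
print(channel_count(converted, data_format))
```

Reading the converted shape with the stale input format yields 224 channels, matching the "mean must have 224 elements" error, while the output format yields 3.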

There is some discussion in an issue reporting the same bug that was closed without a fix: #34029

This PR also fixes another edge case: an image with an impractical aspect ratio is now handled instead of PIL raising ValueError: height and width must be > 0. The fix is to force each resized image dimension to be at least 1.
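
A rough sketch of the clamping idea, assuming a simple aspect-ratio-preserving resize. The function `scaled_size` and its `target_long_side` parameter are hypothetical and do not reproduce the processor's actual resizing logic.

```python
# Hypothetical aspect-ratio-preserving resize; not the processor's real logic.
def scaled_size(height, width, target_long_side):
    long_side = max(height, width)
    # Integer arithmetic keeps the results exact; the short side can round down to 0.
    new_height = height * target_long_side // long_side
    new_width = width * target_long_side // long_side
    # Clamp so neither dimension drops below 1, which PIL would reject.
    return max(1, new_height), max(1, new_width)


print(scaled_size(1, 1000, 224))    # extreme aspect ratio: height would be 0 unclamped
print(scaled_size(500, 1000, 224))  # ordinary case is unaffected by the clamp
```

Without the `max(1, ...)` clamp, the first call would produce a height of 0 and the subsequent resize would fail.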

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ArthurZucker
@qubvel

@Rocketknight1
Member

gentle ping @qubvel

Contributor

@qubvel qubvel left a comment

Hi @tjohnson31415, thanks for the PR, I left a few comments

Comment on lines 243 to 247
image_inputs = [[Image.new("RGB", (100, 1))]]
encoded_images = image_processing(image_inputs, return_tensors="pt").pixel_values
expected_output_image_shape = self.image_processor_tester.expected_output_image_shape(image_inputs)
self.assertEqual(tuple(encoded_images.shape), (1, *expected_output_image_shape))

Contributor

can you also add a test with 1x1 image for PIL?

@tjohnson31415 tjohnson31415 force-pushed the fix-mllama-image-processing branch from c162ced to d22cd02 Compare March 6, 2025 22:11
@tjohnson31415
Contributor Author

tjohnson31415 commented Mar 6, 2025

Thanks @Rocketknight1 for the ping and @qubvel for the review!

I have made the suggested changes.

I also added a fix for another edge-case bug with an image with an impractical aspect ratio that would result in the resized image having an invalid dimension of 0. Please review this additional fix and test as well 🙏

Collaborator

@ArthurZucker ArthurZucker left a comment

LGTM 🤗

@ArthurZucker ArthurZucker merged commit d8663cb into huggingface:main Mar 11, 2025
9 checks passed
@tjohnson31415 tjohnson31415 deleted the fix-mllama-image-processing branch March 11, 2025 16:47
Development

Successfully merging this pull request may close these issues.

Image processing for mllama is broken for Wx1 (i.e. height == 1) image sizes

4 participants