Fix bugs in mllama image processing #36156

tjohnson31415 · 2025-02-13T07:26:39Z

What does this PR do?

I found that using a 1x1 PIL image with the MllamaImageProcessor surfaced a couple of bugs:
A 1x1 PIL image results in an ambiguous channel dimension in infer_channel_dimension_format. The default in the ambiguous case is to use the first dimension, which is incorrect here and results in ValueError: mean must have 1 elements if it is an iterable, got 3. Another bug was that explicitly setting the input channel format results in a similar error: ValueError: mean must have 224 elements if it is an iterable, got 3.
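To make the ambiguity concrete, here is a minimal sketch of a shape-based channel inference heuristic. It is a simplified stand-in written for illustration, not the library's actual infer_channel_dimension_format; the function names and the (1, 3) set of plausible channel counts are assumptions.

```python
def infer_channel_format(shape, plausible_channels=(1, 3)):
    """Hypothetical, simplified shape-based heuristic (not the library code)."""
    first, last = shape[0], shape[-1]
    if first in plausible_channels and last in plausible_channels:
        # Ambiguous: both ends look like a channel axis. Defaulting to the
        # first dimension mirrors the behaviour described above.
        return "channels_first"
    if first in plausible_channels:
        return "channels_first"
    if last in plausible_channels:
        return "channels_last"
    raise ValueError(f"Unable to infer channel dimension from shape {shape}")


def channel_count(shape, channel_format):
    """Number of channels implied by a shape under a given format."""
    return shape[0] if channel_format == "channels_first" else shape[-1]


# A 1x1 RGB PIL image becomes a (height, width, channels) = (1, 1, 3) array.
shape = (1, 1, 3)
guessed = infer_channel_format(shape)
print(guessed, channel_count(shape, guessed))
```

With the wrong guess, the implied channel count is 1 while the normalization mean has 3 entries, which is consistent with the first error message quoted above.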

This PR resolves the bugs by:

fixing support for calling the processor with input_data_format="channels_last" by using data_format instead of input_data_format for rescaling and normalizing after the call to to_channel_dimension_format
checking whether the image is a PIL image and, if the channel format is not explicitly set, inferring that the channel dimension is last
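The two fixes above can be sketched on plain shape tuples. This is an illustrative model under stated assumptions, not the processor's code: to_channels_first, channel_count, and resolve_input_format are hypothetical helpers, and the image is represented only by its shape.

```python
def to_channels_first(shape, input_format):
    """Reorder a (H, W, C) shape to (C, H, W); leave channels-first shapes alone."""
    if input_format == "channels_last":
        height, width, channels = shape
        return (channels, height, width)
    return shape


def channel_count(shape, channel_format):
    """Number of channels implied by a shape under a given format."""
    return shape[0] if channel_format == "channels_first" else shape[-1]


def resolve_input_format(is_pil, explicit_format, inferred_format):
    """Prefer an explicit format; otherwise treat PIL images as channels-last."""
    if explicit_format is not None:
        return explicit_format
    if is_pil:
        return "channels_last"
    return inferred_format


input_format = "channels_last"   # what the caller passed in
data_format = "channels_first"   # layout after the conversion step
converted = to_channels_first((224, 224, 3), input_format)

# Reusing the stale input format after conversion reads the wrong axis.
print(channel_count(converted, input_format))  # 224
# Using the post-conversion format reads the real channel axis.
print(channel_count(converted, data_format))   # 3
```

The stale lookup yields 224 channels, which lines up with the "mean must have 224 elements" error; switching to the post-conversion format yields 3.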

There is some discussion in an issue reporting the same bug that was closed without a fix: #34029

Another edge-case bug this PR fixes is the handling of an image with an impractical aspect ratio, which previously caused PIL to raise ValueError: height and width must be > 0. The fix is to force the resized image dimensions to be at least 1.
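A rough sketch of the clamping idea, assuming an aspect-preserving resize that fits the longer side to a target length. The function name, the integer rounding, and the target of 224 are assumptions for illustration and may differ from the actual resize logic in the processor.

```python
def fit_within(height, width, target, clamp=True):
    """Scale (height, width) so the longer side equals `target`.

    Hypothetical helper: integer floor division stands in for whatever
    rounding the real resize uses.
    """
    longer = max(height, width)
    new_height = height * target // longer
    new_width = width * target // longer
    if clamp:
        # An extreme aspect ratio can round the shorter side down to 0,
        # which PIL rejects; force both sides to be at least 1.
        new_height = max(1, new_height)
        new_width = max(1, new_width)
    return new_height, new_width


print(fit_within(1, 1000, 224, clamp=False))  # (0, 224)
print(fit_within(1, 1000, 224))               # (1, 224)
```

Clamping slightly distorts the aspect ratio of such images, which seems an acceptable trade-off against raising an error for an otherwise valid input.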

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ArthurZucker
@qubvel

Signed-off-by: Travis Johnson <[email protected]>

Rocketknight1 · 2025-03-06T09:40:39Z

gentle ping @qubvel

qubvel

Hi @tjohnson31415, thanks for the PR, I left a few comments

tests/models/mllama/test_image_processing_mllama.py

src/transformers/models/mllama/image_processing_mllama.py

qubvel · 2025-03-06T09:51:54Z

tests/models/mllama/test_image_processing_mllama.py

+        image_inputs = [[Image.new("RGB", (100, 1))]]
+        encoded_images = image_processing(image_inputs, return_tensors="pt").pixel_values
+        expected_output_image_shape = self.image_processor_tester.expected_output_image_shape(image_inputs)
+        self.assertEqual(tuple(encoded_images.shape), (1, *expected_output_image_shape))
+


can you also add a test with 1x1 image for PIL?

Co-authored-by: Pavel Iakubovskii <[email protected]>

Signed-off-by: Travis Johnson <[email protected]>

tjohnson31415 · 2025-03-06T22:12:31Z

Thanks @Rocketknight1 for the ping and @qubvel for the review!

I have made the suggested changes.

I also added a fix for another edge-case bug with an image with an impractical aspect ratio that would result in the resized image having an invalid dimension of 0. Please review this additional fix and test as well 🙏

ArthurZucker

LGTM 🤗

tjohnson31415 and others added 6 commits February 12, 2025 23:30

fix: handle input_channel_dim == channels_last

4da92c5

Signed-off-by: Travis Johnson <[email protected]>

fix: default PIL images to channels_last

68189b4

Signed-off-by: Travis Johnson <[email protected]>

Merge branch 'main' into fix-mllama-image-processing

c02dfc5

Merge branch 'main' into fix-mllama-image-processing

aa805b3

Merge branch 'main' into fix-mllama-image-processing

51abd16

Merge branch 'main' into fix-mllama-image-processing

bc15671

qubvel reviewed Mar 6, 2025


tjohnson31415 and others added 4 commits March 6, 2025 13:16

Apply suggestions from code review

bdaa296

Co-authored-by: Pavel Iakubovskii <[email protected]>

fixup from review batch

1194361

Signed-off-by: Travis Johnson <[email protected]>

test: add 1x1 PIL image to ambiguous channel test

859aaab

Signed-off-by: Travis Johnson <[email protected]>

fix(mllama): avoid 0 dimension for image with impractical aspect ratio

d22cd02

Signed-off-by: Travis Johnson <[email protected]>

tjohnson31415 force-pushed the fix-mllama-image-processing branch from c162ced to d22cd02 Compare March 6, 2025 22:11

Merge branch 'main' into fix-mllama-image-processing

4299fcb

ArthurZucker approved these changes Mar 11, 2025


ArthurZucker merged commit d8663cb into huggingface:main Mar 11, 2025
9 checks passed

tjohnson31415 deleted the fix-mllama-image-processing branch March 11, 2025 16:47
