-
Notifications
You must be signed in to change notification settings - Fork 31.7k
Fix bugs in mllama image processing #36156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bugs in mllama image processing #36156
Conversation
Signed-off-by: Travis Johnson <[email protected]>
Signed-off-by: Travis Johnson <[email protected]>
|
gentle ping @qubvel |
qubvel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @tjohnson31415, thanks for the PR, I left a few comments
| image_inputs = [[Image.new("RGB", (100, 1))]] | ||
| encoded_images = image_processing(image_inputs, return_tensors="pt").pixel_values | ||
| expected_output_image_shape = self.image_processor_tester.expected_output_image_shape(image_inputs) | ||
| self.assertEqual(tuple(encoded_images.shape), (1, *expected_output_image_shape)) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also add a test with 1x1 image for PIL?
Co-authored-by: Pavel Iakubovskii <[email protected]>
Signed-off-by: Travis Johnson <[email protected]>
Signed-off-by: Travis Johnson <[email protected]>
Signed-off-by: Travis Johnson <[email protected]>
c162ced to
d22cd02
Compare
|
Thanks @Rocketknight1 for the ping and @qubvel for the review! I have made the suggested changes. I also added a fix for another edge-case bug with an image with an impractical aspect ratio that would result in the resized image having an invalid dimension of 0. Please review this additional fix and test as well 🙏 |
ArthurZucker
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🤗
What does this PR do?
I found that using a 1x1 PIL image with the
MllamaImageProcessorsurfaced a couple of bugs:A 1x1 PIL image results in ambgiuous channel dimension in
infer_channel_dimension_format. The default in the ambiguous case is to use the first dimension, which is incorrect and results in aValueError: mean must have 1 elements if it is an iterable, got 3. Another bug was that explicility setting the input channel format results in a similar error:ValueError: mean must have 224 elements if it is an iterable, got 3.This PR resolves the bugs by:
input_data_format="channels_last"by usingdata_formatinstead ofinput_data_formatfor rescaling and normalizing after the call toto_channel_dimension_formatThere is some discussion in an issue reporting the same bug that was closed without a fix: #34029
Another edge case bug this PR fixes is handling an image that has an impractical aspect ratio instead of having PIL raise a
ValueError: height and width must be > 0error. The fix here is to force the minimum resized image dimensions to be 1 or greater.Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker
@qubvel