Skip to content

Conversation

@yonigozlan
Copy link
Member

@yonigozlan yonigozlan commented Feb 13, 2025

What does this PR do?

In slow processors, accepted init and call kwargs are not always the same for no apparent reason, which can make things confusing for users. (see vllm-project/vllm#13143 (comment))

DefaultFastImageProcessorInitKwargs and DefaultFastImageProcessorPreprocessKwargs were used to follow this, but with the way fast image processors handle kwargs, they can easily be merged in one DefaultFastImageProcessorKwargs, so that the init and call functions will accept the same kwargs.

Propagating this change to slow image processors will require a lot more diffs however. As discussed with @hmellor , should we try to make this change for slow image processors as well, or focus on encouraging the use of fast image processors @ArthurZucker ?

P.S. Qwen2_VL and Qwen2_5_VL are a bit more difficult to fix, so I'll open a separate PR for these

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp
Copy link
Member

Might be a bit early to review, but what do you think of using ImagesKwargs from processing utils? I tried with VideosKwargs in video processors, worked out pretty well

@yonigozlan
Copy link
Member Author

Might be a bit early to review, but what do you think of using ImagesKwargs from processing utils? I tried with VideosKwargs in video processors, worked out pretty well

Yes that would be nice, although the kwargs are not exactly the same right now, like size_divisor, do_pad, pad_size are not supported by the base fast image processor. It was something I was considering adding support for padding in the base fast image processor though, so it might work out in the end. Not sure how to handle common kwargs like return_tensors though, which is not explicitely included in processing_utils ImagesKwargs.

Will experiment with that and see if I breaks anything, but it might be too much scope for this PR? cc @ArthurZucker

@yonigozlan yonigozlan changed the title [WIP] Remove differences between init and preprocess kwargs for fast image processors Remove differences between init and preprocess kwargs for fast image processors Feb 14, 2025
@zucchini-nlp
Copy link
Member

@yonigozlan Yeah, some kwargs are not there. For do_pad arg it is used only in llava-next style image processors afair, and not used in video processing yet. Will be super nice to have all kwargs defined in one TypedDict, but doesn't have to be in this PR sure :)

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THanks, IMO let's focus on fast image processor first, making they are as efficient and fast + standardized as possible! 🤗

@yonigozlan
Copy link
Member Author

@yonigozlan Yeah, some kwargs are not there. For do_pad arg it is used only in llava-next style image processors afair, and not used in video processing yet. Will be super nice to have all kwargs defined in one TypedDict, but doesn't have to be in this PR sure :)

Definitely agree on defining all kwargs in one typed dict, it would also make things simpler and more coherent in processors, where the custom ImagesKwargs could just be imported from fast image processors/image processors, and so we'll have one source of truth, because I also noticed that a lot of processors should have custom ImagesKwargs but don't. Will address that in a another PR as well :)

THanks, IMO let's focus on fast image processor first, making they are as efficient and fast + standardized as possible! 🤗

Great! Then this should be ready for your review @ArthurZucker
Also this PR is extended by the one for Qwen2VL: #36207

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very much needed thanks!

model_input_names = ["pixel_values"]
valid_init_kwargs = DefaultFastImageProcessorInitKwargs
valid_preprocess_kwargs = DefaultFastImageProcessorPreprocessKwargs
valid_kwargs = DefaultFastImageProcessorKwargs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a fan of having this here but not worries

@yonigozlan yonigozlan merged commit ea219ed into huggingface:main Mar 12, 2025
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants