

@yonigozlan
Member

What does this PR do?

Uniformizes the LlavaNextVideoProcessor kwargs, following #31911.

Fixes #35602
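For context on #31911: as I understand it, the uniformization effort groups a processor's call-time kwargs by modality (text, images, videos, and so on), each group with its own defaults. The sketch below is a minimal, hypothetical illustration of that routing using plain dicts. `DEFAULTS`, `MODALITY_KEYS`, and `merge_kwargs` are illustrative names, not the transformers API, and the library's real precedence and conflict rules may differ.

```python
from copy import deepcopy

# Hypothetical per-modality defaults (illustrative values only).
DEFAULTS = {
    "text_kwargs": {"padding": False},
    "images_kwargs": {},
    "videos_kwargs": {},
}

# Hypothetical map from a flat kwarg name to the modality group that owns it.
MODALITY_KEYS = {
    "padding": "text_kwargs",
    "do_resize": "images_kwargs",
    "num_frames": "videos_kwargs",
}

GROUPS = ("text_kwargs", "images_kwargs", "videos_kwargs")


def merge_kwargs(defaults, **kwargs):
    """Route flat call-time kwargs into per-modality dicts layered over defaults."""
    merged = deepcopy(defaults)  # never mutate the shared defaults
    for group in GROUPS:
        merged.setdefault(group, {})
        # A nested group passed explicitly, e.g. videos_kwargs={...}, is applied first.
        merged[group].update(kwargs.pop(group, None) or {})
    for key, value in kwargs.items():
        group = MODALITY_KEYS.get(key)
        if group is None:
            raise TypeError(f"unexpected keyword argument {key!r}")
        # Flat kwargs are applied last, so in this sketch they win over nested values.
        merged[group][key] = value
    return merged


print(merge_kwargs(DEFAULTS, padding=True, num_frames=8))
# {'text_kwargs': {'padding': True}, 'images_kwargs': {}, 'videos_kwargs': {'num_frames': 8}}
```

Grouping by modality is what lets a video-specific option reach the video processor without the tokenizer or image processor ever seeing it.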

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@zucchini-nlp zucchini-nlp left a comment


Sorry for late review, forgot about this PR.

Cool that we're standardizing video LLMs. Overall LGTM; we just need a few tests with video processors to make sure nothing breaks.

Member


Tiny concern about squeezing audio in before videos in the signature, but I don't think any user passes videos as a positional arg, so maybe we're okay.

I don't want to add more complexity by trying to validate the order of audio and videos now, so let's leave it as is and just take note of this.
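The positional-argument concern can be made concrete with two toy signatures. Both are hypothetical simplifications, not the actual `LlavaNextVideoProcessor.__call__`: the "before" one has `videos` third, the "after" one inserts `audio` ahead of `videos`. A third positional argument silently changes meaning, while keyword callers are unaffected, which is why the risk is considered small.

```python
def call_before(images=None, text=None, videos=None):
    """Hypothetical old order: videos is the third positional parameter."""
    return {"images": images, "text": text, "videos": videos}


def call_after(images=None, text=None, audio=None, videos=None):
    """Hypothetical uniformized order: audio now sits ahead of videos."""
    return {"images": images, "text": text, "audio": audio, "videos": videos}


clip = ["frame0", "frame1"]  # stand-in for a decoded video

# Same positional call, different binding: the clip ends up in `audio`.
print(call_before(None, "describe the clip", clip)["videos"])  # ['frame0', 'frame1']
print(call_after(None, "describe the clip", clip)["videos"])   # None

# Passing videos by keyword is unaffected by the reordering.
print(call_after(text="describe the clip", videos=clip)["videos"])  # ['frame0', 'frame1']
```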

Member


nice!

Comment on lines +348 to +350
Member


am I right that we don't need legacy=False anymore?

Member Author


I ended up keeping it because I had set v5.0.0 as the deprecation version.
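For context, the linked issue reports `TypeError: LlavaNextVideoProcessor.__call__() got an unexpected keyword argument 'legacy'`. One common pattern for keeping such an argument accepted until a stated removal version is to pop it and emit a `FutureWarning`. The sketch below illustrates that pattern only; `pop_deprecated_legacy` and `process` are hypothetical names, and this is not necessarily how the PR handles `legacy`.

```python
import warnings

# Hypothetical removal version, matching the v5.0.0 mentioned in the thread.
LEGACY_REMOVAL_VERSION = "v5.0.0"


def pop_deprecated_legacy(kwargs):
    """Drop a deprecated `legacy` kwarg with a warning instead of failing on it."""
    if "legacy" in kwargs:
        kwargs.pop("legacy")
        warnings.warn(
            f"`legacy` is deprecated and will be removed in {LEGACY_REMOVAL_VERSION}.",
            FutureWarning,
            stacklevel=2,
        )
    return kwargs


def process(text=None, videos=None, **kwargs):
    """Hypothetical processor entry point that tolerates `legacy` during deprecation."""
    kwargs = pop_deprecated_legacy(kwargs)
    if kwargs:
        # Any other unknown kwarg is still rejected, as a plain signature would.
        raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")
    return {"text": text, "videos": videos}
```

Once the removal version ships, the pop-and-warn helper can be deleted and `legacy` goes back to raising `TypeError`.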

Member


print 😄

Member Author


Oops thanks for catching that :)

Member


afaik ProcessorTesterMixin doesn't test videos_kwargs yet. I think we need to add video tests to make sure the llava-next-video processor works as expected.

Member Author


Indeed! I added tests for videos_kwargs, very similar to those for images_kwargs :)
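As a rough, hypothetical sketch of what such videos_kwargs tests check (toy classes and a made-up `num_frames` kwarg, not the actual ProcessorTesterMixin code): defaults apply when nothing is passed, a flat kwarg overrides them, and a nested `videos_kwargs` dict is forwarded to the video processor.

```python
import unittest


class ToyVideoProcessor:
    """Hypothetical stand-in: keeps the first `num_frames` frames of each video."""

    default_num_frames = 4

    def __call__(self, videos, num_frames=None):
        n = self.default_num_frames if num_frames is None else num_frames
        return {"pixel_values_videos": [video[:n] for video in videos]}


class ToyProcessor:
    """Hypothetical multimodal processor that forwards video kwargs to its video processor."""

    def __init__(self, video_processor):
        self.video_processor = video_processor

    def __call__(self, text=None, videos=None, videos_kwargs=None, **kwargs):
        # In this sketch, flat kwargs override entries of the nested videos_kwargs dict.
        merged = {**(videos_kwargs or {}), **kwargs}
        return self.video_processor(videos, **merged)


class VideosKwargsTest(unittest.TestCase):
    def setUp(self):
        self.processor = ToyProcessor(ToyVideoProcessor())
        self.video = list(range(10))  # ten dummy "frames"

    def test_defaults_used(self):
        out = self.processor(text="hi", videos=[self.video])
        self.assertEqual(len(out["pixel_values_videos"][0]), 4)

    def test_flat_kwarg_overrides_default(self):
        out = self.processor(text="hi", videos=[self.video], num_frames=2)
        self.assertEqual(len(out["pixel_values_videos"][0]), 2)

    def test_nested_videos_kwargs(self):
        out = self.processor(text="hi", videos=[self.video], videos_kwargs={"num_frames": 6})
        self.assertEqual(len(out["pixel_values_videos"][0]), 6)
```

The three cases mirror the usual images_kwargs checks, with the image-specific option swapped for a video-specific one.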

@yonigozlan yonigozlan force-pushed the uniformize-llava-next-video-processor branch from a75909c to fd69d5c on January 14, 2025 19:54
@yonigozlan
Member Author

Thanks for the feedback @zucchini-nlp!
I also just noticed an issue with the LlavaOneVision processor tests: more specifically, test_chat_template_dict fails with this error:

src/transformers/models/llava_onevision/processing_llava_onevision.py:167: in __call__
    one_video = to_numpy_array(video_inputs.get("pixel_values_videos")[0])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

    def to_numpy_array(img) -> np.ndarray:
        if not is_valid_image(img):
>           raise ValueError(f"Invalid image type: {type(img)}")
E           ValueError: Invalid image type: <class 'list'>

src/transformers/image_utils.py:231: ValueError

This also happens on main, so it seems unrelated to this PR.

@zucchini-nlp
Member

Yep, will be fixed by #35660

@qubvel qubvel removed their request for review January 20, 2025 18:23
Collaborator

@ArthurZucker ArthurZucker left a comment


cool thanks!

@yonigozlan
Member Author

I had to make some modifications to the llava_next_video processor tests following PR #35953.
@zucchini-nlp Could you confirm these modifications are fine? I had to add return_tensors=None in the chat_template tests to get consistent output types when comparing.

Member

@zucchini-nlp zucchini-nlp left a comment


Yep, LGTM, thanks!

Contributor

@molbap molbap left a comment


LGTM as well! For the common processing tests for videos, there are a couple of models that were recently merged or will be merged soon; it would be cool to check that the tests work for them!

Comment on lines 40 to 41
"image_kwargs": {},
"videos_kwargs": {},
Contributor


Not sure you have to specify an empty dictionary here!
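The suggestion holds whenever the merging code treats a missing group like an empty one. A minimal sketch under that assumption; `resolve`, `WITH_EMPTY`, and `WITHOUT_EMPTY` are hypothetical names, not the transformers implementation.

```python
# Hypothetical defaults, with and without an explicit empty videos group.
WITH_EMPTY = {"text_kwargs": {"padding": False}, "videos_kwargs": {}}
WITHOUT_EMPTY = {"text_kwargs": {"padding": False}}


def resolve(defaults, group, **overrides):
    """Layer call-time overrides over a group's defaults; a missing group acts as {}."""
    return {**defaults.get(group, {}), **overrides}


# Both layouts resolve identically, so the explicit {} is redundant here.
print(resolve(WITH_EMPTY, "videos_kwargs") == resolve(WITHOUT_EMPTY, "videos_kwargs"))  # True
print(resolve(WITHOUT_EMPTY, "videos_kwargs", num_frames=8))  # {'num_frames': 8}
```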

@yonigozlan yonigozlan merged commit 9b479a2 into huggingface:main Feb 18, 2025
25 checks passed


Successfully merging this pull request may close these issues.

LlavaNextVideoProcessor -> TypeError: LlavaNextVideoProcessor.__call__() got an unexpected keyword argument 'legacy' (I have the fix)

5 participants