Fix temporal padding in Qwen2VLImageProcessor when the number of frames is not divisible by temporal_patch_size #38076

ritwickchaudhry · 2025-05-12T07:01:10Z

This PR fixes an issue in Qwen2VLImageProcessor where the current implementation does not correctly handle cases when the number of video frames is not divisible by temporal_patch_size.

Problem:

The existing logic always repeats the last frame temporal_patch_size - 1 times. This is correct when temporal_patch_size equals 2, because the remainder can then only be 1, but for larger values it adds the wrong number of frames whenever the remainder is not 1.

Solution:

The fix replaces:

repeats = np.repeat(patches[-1][np.newaxis], temporal_patch_size - 1, axis=0)

with:

repeats = np.repeat(patches[-1][np.newaxis], temporal_patch_size - (patches.shape[0] % temporal_patch_size), axis=0)

This ensures that the correct number of padding frames is added when the frame count is not divisible by temporal_patch_size.
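
To make the arithmetic concrete, here is a minimal NumPy sketch that compares the two padding rules. The helper names pad_old and pad_new and the toy array shape are illustrative assumptions, not code from the repository. The sketch also assumes that padding is only applied when the frame count is not already a multiple of temporal_patch_size.

```python
import numpy as np


def pad_old(patches: np.ndarray, temporal_patch_size: int) -> np.ndarray:
    """Old rule: always append temporal_patch_size - 1 copies of the last frame."""
    if patches.shape[0] % temporal_patch_size == 0:
        return patches  # assumed guard: already a multiple, nothing to pad
    repeats = np.repeat(patches[-1][np.newaxis], temporal_patch_size - 1, axis=0)
    return np.concatenate([patches, repeats], axis=0)


def pad_new(patches: np.ndarray, temporal_patch_size: int) -> np.ndarray:
    """New rule: append only as many copies as are missing to reach a multiple."""
    remainder = patches.shape[0] % temporal_patch_size
    if remainder == 0:
        return patches  # assumed guard: already a multiple, nothing to pad
    repeats = np.repeat(patches[-1][np.newaxis], temporal_patch_size - remainder, axis=0)
    return np.concatenate([patches, repeats], axis=0)


# Toy video with a hypothetical shape: 6 frames, 3 channels, 4x4 pixels.
frames = np.arange(6 * 3 * 4 * 4, dtype=np.float32).reshape(6, 3, 4, 4)

old_frames = pad_old(frames, temporal_patch_size=4).shape[0]  # 6 + 3 = 9
new_frames = pad_new(frames, temporal_patch_size=4).shape[0]  # 6 + 2 = 8
print(old_frames % 4, new_frames % 4)
```

With 6 frames and temporal_patch_size = 4, the old rule yields 9 frames, which still cannot be split into groups of 4, while the new rule yields 8. With temporal_patch_size = 2 both rules agree, which is why the bug went unnoticed at the default setting.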

Additional Changes:

Added a unit test to verify the padding logic for edge cases where the number of frames is not divisible by the patch size.

Issue Reference:

Fixes #38003

github-actions · 2025-05-12T07:01:21Z

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

ritwickchaudhry · 2025-05-12T07:03:41Z

@zucchini-nlp Could you please review this PR?

zucchini-nlp

Thanks!

zucchini-nlp · 2025-05-12T09:06:56Z

tests/models/qwen2_vl/test_image_processing_qwen2_vl.py

+
+            # Check the shape after padding
+            expected_output_video_shape = (102900, 1176)  # Adjusted based on padding
+            self.assertEqual(tuple(encoded_video.shape), expected_output_video_shape)


ultra nit: using assertListEqual can give more informative error output when tests fail :)

HuggingFaceDocBuilderDev · 2025-05-12T09:20:45Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

…es is not divisible by temporal_patch_size (huggingface#38076) Qwen2VL: Fix temporal padding in Qwen2VLImageProcessor when frames are not divisible by temporal_patch_size

yaogang2060 · 2025-11-01T12:52:37Z

qwen3vl_video_processor has the same problem.

transformers/src/transformers/models/qwen3_vl/video_processing_qwen3_vl.py

Line 236 in 76fc50a

repeats = patches[:, -1:].repeat(1, temporal_patch_size - 1, 1, 1, 1)
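
For illustration, the same remainder-based rule can be sketched for a batched layout. This is only a sketch under stated assumptions: it uses NumPy arrays in place of torch tensors so that it runs without torch, it assumes the layout (batch, frames, channels, height, width) suggested by the five-argument repeat call above, and the helper name pad_temporal is hypothetical. It has not been checked against the actual Qwen3-VL video processor.

```python
import numpy as np


def pad_temporal(patches: np.ndarray, temporal_patch_size: int) -> np.ndarray:
    """Pad axis 1 (frames) with copies of the last frame up to the next multiple."""
    remainder = patches.shape[1] % temporal_patch_size
    if remainder == 0:
        return patches  # already a multiple, nothing to pad
    missing = temporal_patch_size - remainder
    # patches[:, -1:] keeps the frame axis, so the repeat happens along axis 1.
    repeats = np.repeat(patches[:, -1:], missing, axis=1)
    return np.concatenate([patches, repeats], axis=1)


# Hypothetical batch: 2 videos, 7 frames each, 3 channels, 4x4 pixels.
batch = np.arange(2 * 7 * 3 * 4 * 4, dtype=np.float32).reshape(2, 7, 3, 4, 4)
padded = pad_temporal(batch, temporal_patch_size=4)
print(padded.shape)
```

Here 7 frames with temporal_patch_size = 4 need one extra frame to reach 8, whereas repeating temporal_patch_size - 1 = 3 times would give 10.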

zucchini-nlp · 2025-11-05T14:58:26Z

@yaogang2060 can you submit a PR and tag me pls?

github-actions bot marked this pull request as draft May 12, 2025 07:01

ritwickchaudhry mentioned this pull request May 12, 2025

Potential bug in Qwen 2/2.5 VL Image Preprocessor #38003

Closed

ritwickchaudhry marked this pull request as ready for review May 12, 2025 07:07

github-actions bot requested review from qubvel and ydshieh May 12, 2025 07:07

Qwen2VL: Fix temporal padding in Qwen2VLImageProcessor when frames ar…

9bd3e16

…e not divisible by temporal_patch_size

ritwickchaudhry force-pushed the fix-qwen2vl-temporal-padding branch from f876f69 to 9bd3e16 Compare May 12, 2025 07:12

zucchini-nlp approved these changes May 12, 2025

View reviewed changes

zucchini-nlp merged commit fe918d1 into huggingface:main May 14, 2025
11 checks passed

zucchini-nlp mentioned this pull request May 14, 2025

a logic error in _preprocess function of Qwen2VLImageProcessor Class #37064

Closed

4 tasks

4 participants
