fix Glm4v batch videos forward #39172
Conversation
Failed when changing the `video_grid_thw` like this:
```python
total_frames = video_grid_thw[0][0].item()
h = video_grid_thw[0][1].item()
w = video_grid_thw[0][2].item()
video_grid_thw = [[1, h, w] for _ in range(total_frames)]
```
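For context, a minimal sketch of how this expansion could cover every video in the batch rather than only the first entry; variable names mirror the snippet above, and the exact fix in this PR may differ:

```python
# Illustrative sketch (not necessarily the exact PR fix): expand every video's
# [T, H, W] row into T per-frame [1, H, W] rows, instead of reading only
# video_grid_thw[0], so a batch with multiple videos is handled too.
expanded_grid_thw = []
for t, h, w in video_grid_thw.tolist():  # assumes a (num_videos, 3) tensor
    expanded_grid_thw.extend([1, h, w] for _ in range(t))
video_grid_thw = expanded_grid_thw
```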
I think we would also need to pad `timestamps`, as otherwise it will fail when a different number of frames is sampled per video. We've been discussing it internally with @zRzRzRzRzRzRzR; I'm not sure though if he has a PR yet.
Yes, `timestamps` is not good to return here. Can we return it like qwen2_5_vl does?
transformers/src/transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py
Lines 158 to 166 in df12d87
```python
if isinstance(fps, (int, float)):
    second_per_grid_ts = [self.video_processor.temporal_patch_size / fps] * len(video_grid_thw)
elif hasattr(fps, "__len__") and len(fps) == len(video_grid_thw):
    second_per_grid_ts = [self.video_processor.temporal_patch_size / tmp for tmp in fps]
else:
    raise ValueError(
        f"The length of fps ({len(fps) if hasattr(fps, '__len__') else fps}) must be equal to the length of video_grid_thw ({len(video_grid_thw)}) or fps should be a single number."
    )
videos_inputs.update({"second_per_grid_ts": second_per_grid_ts})
```
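For example, with `temporal_patch_size = 2` and a single `fps = 2.0` shared across two videos, this yields `second_per_grid_ts == [1.0, 1.0]`, i.e. each temporal grid step covers one second of video.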
Hm, not sure if this is equivalent to what GLM4V does, because in GLM we want to add per-frame timestamps in the prompt. We talked about this internally and decided that padding/unpadding can work, since the timestamps are only used in internal processing. So we can pad on the right and strip off the pad values in processing.
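A minimal sketch of the pad-on-the-right / strip-in-processing idea, assuming the timestamps are plain per-video lists of floats and that a sentinel value such as `-1.0` never appears as a real timestamp (the names here are hypothetical, not the actual PR code):

```python
# Hypothetical sketch: right-pad per-video timestamp lists to a common length
# so they can be batched, then strip the padding before the timestamps are
# inserted into the prompt.
PAD_VALUE = -1.0  # assumed sentinel; any value outside valid timestamps works

def pad_timestamps(timestamps_per_video):
    """Right-pad each video's timestamp list to the longest one."""
    max_len = max(len(ts) for ts in timestamps_per_video)
    return [list(ts) + [PAD_VALUE] * (max_len - len(ts)) for ts in timestamps_per_video]

def unpad_timestamps(padded):
    """Strip the sentinel padding before using the timestamps in processing."""
    return [[t for t in ts if t != PAD_VALUE] for ts in padded]

# Example: two videos with different numbers of sampled frames.
padded = pad_timestamps([[0.0, 0.5, 1.0], [0.0, 1.0]])
# padded == [[0.0, 0.5, 1.0], [0.0, 1.0, -1.0]]
restored = unpad_timestamps(padded)
# restored == [[0.0, 0.5, 1.0], [0.0, 1.0]]
```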
😀 Do I need to write more unit tests for this change?
We can add an integration test with batched/unbatched video inference, similar to transformers/tests/models/glm4v/test_modeling_glm4v.py (lines 325 to 326 in b0a8e0b); see the sketch below.
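A rough sketch of what such a batched-vs-unbatched check might look like; the checkpoint id, prompt handling, and generation settings are assumptions, not the actual contents of test_modeling_glm4v.py:

```python
import torch
from transformers import AutoProcessor, Glm4vForConditionalGeneration

MODEL_ID = "THUDM/GLM-4.1V-9B-Thinking"  # assumed checkpoint id

def check_batched_matches_unbatched(prompt, videos):
    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = Glm4vForConditionalGeneration.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, device_map="auto"
    )

    # Unbatched: run each video on its own.
    single_texts = []
    for video in videos:
        inputs = processor(text=[prompt], videos=[video], return_tensors="pt").to(model.device)
        out = model.generate(**inputs, max_new_tokens=30, do_sample=False)
        single_texts.append(processor.batch_decode(out, skip_special_tokens=True)[0])

    # Batched: all videos in one call; greedy decoding should give the same texts.
    inputs = processor(
        text=[prompt] * len(videos), videos=videos, padding=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=30, do_sample=False)
    batched_texts = processor.batch_decode(out, skip_special_tokens=True)

    assert batched_texts == single_texts
```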
Thanks, I have tested this.
Yeah, the checkpoints were updated in #39247. Can you rebase?
A couple tiny comments
Some tests are now failing after the run.
run-slow: glm4v
This comment contains run-slow, running the specified jobs: models: ['models/glm4v']
Yes, I think it fails due to the changes in this global var after I ran the slow tests.
cc @ydshieh, do you know anything about test files sharing the same processor class attributes?
Not really a regression, but maybe we can include it in a patch release.
Hi, may I have a bit of context here 🙏? I guess you are talking about the slow CI failures, but at this moment I don't know what issues you faced. A bit more detail would be much appreciated. Thank you.
Oh yeah, sorry @ydshieh. The issue is that a test failed in slow CI when we ran it last time.
Thanks. I will take a look today.
@zucchini-nlp Those are class attributes, so if an instance changes the values, it will affect the other places! Do you know why we use class attributes? 😭
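An illustrative (made-up) example of the pitfall: when a default lives on the class, mutating it through one instance leaks into every other instance, including ones created later in other test files.

```python
# Made-up illustration of the class-attribute pitfall, not transformers code.
class VideoProcessor:
    # Shared default: every instance sees the same dict unless it shadows it.
    size = {"longest_edge": 1024}

proc_a = VideoProcessor()
proc_b = VideoProcessor()

# In-place mutation through one instance changes it for all instances.
proc_a.size["longest_edge"] = 128
print(proc_b.size)          # {'longest_edge': 128}  <- surprise
print(VideoProcessor.size)  # {'longest_edge': 128}

# Rebinding (proc_a.size = {...}) would only shadow the class attribute on
# proc_a; it is the in-place mutation that leaks across tests.
```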
This is copied from the image processors, because the video processor inherits from them. I believe it was done to have defaults for kwargs and to keep the core processing code in one place; @yonigozlan can say more on that. I didn't see that the new test changes a processor attribute; in the meanwhile we can fix and merge the PR. We can move the discussion to internal Slack later if needed :)
run-slow: glm4v
This comment contains run-slow, running the specified jobs: models: ['models/glm4v']
Hmm, failed again
Oh wait, I didn't check the changes. It's because we're still re-assigning the value of that class attribute.
Sorry about that, I finally understood Yih-Dar's point. I'll fix that dangerous code. 😭
[For maintainers] Suggested jobs to run (before merge): run-slow: glm4v
run-slow: glm4v
This comment contains run-slow, running the specified jobs: models: ['models/glm4v']
Just want to point out that if an attribute makes sense as an instance attribute (i.e. we could have instances that are allowed to hold different values for it), then making it a class attribute is not a great idea and will sometimes cause chaos. I remember my debugging nightmares with this kind of issue. Might be a good time to rethink this, @yonigozlan.
Yaay, tests passing, we can merge now 🎉
And now, on this topic, I had some local branch where I am trying to use
* changes for video
* update modular
* change get_video_features
* update video token replacement
* update modular
* add test and fix typo
* lint
* fix order
* lint
* fix
* remove dependency
* lint
* lint
* remove todo
* resize video for test
* lint..
* fix test
* new a processor for video_test
* fix test
What does this PR do?
Fixes the `video_processing` and `get_video_features` issues for GLM4V.
Tested with the following scripts.
For forward logits checking, cc @zRzRzRzRzRzRzR.
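The author's actual scripts are not included in this thread; purely as an illustration, a batched-vs-single forward logits check might look like the sketch below (the checkpoint id, right-padding assumption, and tolerances are guesses, not the PR's script):

```python
import torch
from transformers import AutoProcessor, Glm4vForConditionalGeneration

MODEL_ID = "THUDM/GLM-4.1V-9B-Thinking"  # assumed checkpoint id
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = Glm4vForConditionalGeneration.from_pretrained(MODEL_ID, torch_dtype=torch.float32)

def check_batched_forward(prompts, videos, atol=1e-4):
    """Compare each sample's last-token logits between one batched forward
    pass and per-sample forward passes (assumes right padding)."""
    batch = processor(text=prompts, videos=videos, padding=True, return_tensors="pt")
    with torch.no_grad():
        batched_logits = model(**batch).logits
    # Index of the last real (non-padded) token for every sample.
    last_positions = batch["attention_mask"].sum(dim=1) - 1

    for i, (prompt, video) in enumerate(zip(prompts, videos)):
        single = processor(text=[prompt], videos=[video], return_tensors="pt")
        with torch.no_grad():
            single_logits = model(**single).logits
        torch.testing.assert_close(
            batched_logits[i, last_positions[i]],
            single_logits[0, -1],
            atol=atol,
            rtol=atol,
        )
```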
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@zucchini-nlp cc @zRzRzRzRzRzRzR