@zRzRzRzRzRzRzR zRzRzRzRzRzRzR commented May 28, 2025

  1. This PR aims to support the use of the GLM-4-0414 model for training video understanding and image understanding models GLM-4.1V
  2. This PR has completed the refactoring of the related modules. Because the `F` alias is defined by both torch and torchvision, image_processors and videos_processors have not been placed under modular management (@zucchini-nlp's review suggestion).
  3. This PR is for code review. @ArthurZucker

@Cyrilvallez Cyrilvallez left a comment

Alright, I fixed the remaining parts, and confirmed that the model is still working as expected on real checkpoints - merging now! Thanks for the work!! 🤗🚀

@Cyrilvallez Cyrilvallez merged commit af98702 into huggingface:main Jun 25, 2025
18 checks passed

ydshieh commented Jun 26, 2025

@zRzRzRzRzRzRzR Thank you for adding this model. This model's tests are quite slow, as you can see in the following list, and they cause the job that runs them to take 12 minutes, compared with ~4 minutes for other jobs.

https://app.circleci.com/pipelines/github/huggingface/transformers/135728/workflows/913523da-80bc-4021-9e85-d5ab8653d204/jobs/1800021/timing

Would you be up for making it faster? Usually that means tweaking Glm4vVisionText2TextModelTester. It's worth checking the config created there to see whether any of its values make the created model large.
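As a rough aid for spotting oversized test configs, here is a back-of-the-envelope sketch that estimates the parameter count of a generic decoder-only text stack from a handful of config values. The `TinyTextConfig` class, its field names and default values, and the formula itself (no biases, no grouped-query attention, a gated MLP, norm weights ignored) are assumptions for illustration. They are not taken from Glm4vVisionText2TextModelTester, and the vision tower is not modeled at all.

```python
from dataclasses import dataclass


@dataclass
class TinyTextConfig:
    """Hypothetical config; the real tester may use different names and values."""

    vocab_size: int = 99
    hidden_size: int = 32
    num_hidden_layers: int = 2
    intermediate_size: int = 37


def approx_param_count(cfg: TinyTextConfig) -> int:
    """Rough parameter estimate for a generic decoder-only transformer."""
    # Token embedding matrix.
    embeddings = cfg.vocab_size * cfg.hidden_size
    # Assumed: four square projections (q, k, v, o), no biases, no GQA.
    attention = 4 * cfg.hidden_size**2
    # Assumed: gated MLP with gate, up, and down projections.
    mlp = 3 * cfg.hidden_size * cfg.intermediate_size
    return embeddings + cfg.num_hidden_layers * (attention + mlp)


if __name__ == "__main__":
    small = TinyTextConfig()
    large = TinyTextConfig(vocab_size=32000, hidden_size=1024, intermediate_size=4096)
    print("small:", approx_param_count(small))
    print("large:", approx_param_count(large))
```

Comparing the estimate for a few candidate values shows which setting dominates; in this simplified formula `hidden_size` enters quadratically, so it is usually the first value worth shrinking in a test config.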

58.03s call     tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_initialization
23.31s call     tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_model_outputs_equivalence
20.84s call     tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_attention_outputs
18.42s call     tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_resize_tokens_embeddings
17.87s call     tests/models/bark/test_modeling_bark.py::BarkModelIntegrationTests::test_model_can_generate
16.71s call     tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_can_use_safetensors
16.34s call     tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_load_save_without_tied_weights
15.13s call     tests/models/clvp/test_modeling_clvp.py::ClvpModelForConditionalGenerationTest::test_batching_equivalence
14.97s call     tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_attn_implementation_composite_models
12.92s call     tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_save_load
12.83s call     tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_hidden_states_output
12.66s call     tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_feed_forward_chunking
12.60s call     tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_can_init_all_missing_weights
10.00s call     tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_eager_matches_sdpa_inference_03_fp16_pad_left_no_attn_mask
9.79s call     tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_model_weights_reload_no_missing_tied_weights
9.54s call     tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_eager_matches_sdpa_inference_18_bf16_pad_left_no_attn_mask_sdpa_kernels
9.12s call     tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_eager_matches_sdpa_inference_04_fp16_pad_right_sdpa_kernels
9.03s call     tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_eager_matches_sdpa_inference_00_fp16_pad_left_sdpa_kernels
9.01s call     tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_batching_equivalence
8.94s call     tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_eager_matches_sdpa_inference_01_fp16_pad_left
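
To see how much of the listed time belongs to each model, the report can be summed per model directory. This is a minimal sketch that assumes each line has the shape `<seconds>s <phase> tests/models/<model>/...` as in the list above; `seconds_per_model` is a hypothetical helper name, and `REPORT` holds only the first five lines of the list as sample input.

```python
import re
from collections import defaultdict

# Sample input: the first five lines of the timing list above.
# Paste the full report here to aggregate every entry.
REPORT = """\
58.03s call     tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_initialization
23.31s call     tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_model_outputs_equivalence
20.84s call     tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_attention_outputs
18.42s call     tests/models/glm4v/test_modeling_glm4v.py::Glm4vModelTest::test_resize_tokens_embeddings
17.87s call     tests/models/bark/test_modeling_bark.py::BarkModelIntegrationTests::test_model_can_generate
"""

# Captures the duration, the phase (e.g. "call"), and the model directory.
LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)s\s+(\w+)\s+tests/models/([^/]+)/")


def seconds_per_model(report: str) -> dict[str, float]:
    """Sum the reported durations per model directory."""
    totals: defaultdict[str, float] = defaultdict(float)
    for line in report.splitlines():
        match = LINE_RE.match(line)
        if match:  # Lines that do not fit the assumed shape are skipped.
            seconds, _phase, model = match.groups()
            totals[model] += float(seconds)
    return dict(totals)


if __name__ == "__main__":
    ranked = sorted(seconds_per_model(REPORT).items(), key=lambda kv: -kv[1])
    for model, total in ranked:
        print(f"{model}: {total:.2f}s")
```

Running the same aggregation before and after shrinking the tester config gives a quick check on whether the change actually reduced the job time.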

shimizust pushed a commit to linkedin/Liger-Kernel that referenced this pull request Aug 19, 2025
## Summary
This PR adds support for GLM4.1V (GLM-4 Vision) models to the Liger
Kernel #854
https://huggingface.co/zai-org/GLM-4.1V-9B-Thinking
This model has been merged in
huggingface/transformers#38431

## Testing Done

- Hardware Type: <BLANK>
- [x] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [x] run `make test-convergence` to ensure convergence

---------

Co-authored-by: Shao Tang <[email protected]>

4 participants