Feature Request: Qwen 2.5 VL #11483
I'm currently looking into Transformers' Qwen2.5VL implementation and waiting for the paper to drop so I can better assess the differences between Qwen2VL and Qwen2.5VL. 👀 |
cool |
I support this! |
Our world definitely needs this! |
Any progress on this? Who added support for Qwen 2 VL? |
qwen2.5-vl report is up! https://huggingface.co/papers/2502.13923 edit: official codebase here: https://github.com/QwenLM/Qwen2.5-VL |
I can start working on this if no one else is already. |
OK then! First order of business would be to build the GGUF file(s). It seems there is an issue with that when using the latest official Transformers release:
This is pretty hot: it appears a temporary workaround would be to use the old Qwen2 templates. People are reporting this works, so I'll post an update in a bit. |
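For anyone following along, the language-model part is normally converted with the stock convert_hf_to_gguf.py script. A minimal sketch, with paths and output names as placeholders (the vision encoder / mmproj file requires a separate surgery step, discussed further down the thread):
python convert_hf_to_gguf.py ./Qwen2.5-VL-7B-Instruct --outfile qwen2.5-vl-7b-instruct-f16.gguf --outtype f16 |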
Right, so this one is a bit of a rabbit hole... I. Reverting the Qwen2.5 config files to:
and
Produces a (seemingly) working model! We've started testing and quantizing it here:
II. In order to get a usable experience, you need to make sure CLIP is running with hardware acceleration. This currently requires you to revert this commit:
For more information refer to:
The following PR seems to correct (at least) some of the issues that led to disabling hardware acceleration in the first place:
So, it is now up to us to prove that everything is working properly. I'll start a stress / perf eval test alongside the quantization process, so we have a better idea about what's going on. |
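As a side note, hardware-accelerated CLIP presumes a build with the corresponding backend enabled. A minimal sketch of a Vulkan build (flag name as in recent llama.cpp CMake; older trees used LLAMA_VULKAN instead of GGML_VULKAN):
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release |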
UPDATE: A few 4-bit quants have been uploaded, including two that support online auto-repacking. The latest main looks stable with Vulkan CLIP and any model thrown at it so far. Some preliminary insights:
Output quality looks very promising! We'll release all of the benchmark code when ready, so the process can be streamlined for other models. |
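For reference, 4-bit files like these are typically produced with the stock llama-quantize tool. A minimal sketch, with file names as placeholders (the vision / mmproj file is usually left unquantized):
./build/bin/llama-quantize qwen2.5-vl-7b-instruct-f16.gguf qwen2.5-vl-7b-instruct-Q4_K_M.gguf Q4_K_M |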
Hi! Excellent news, thank you very much for this! I was able to run the model using code from git main on a 4 x Radeon 7900 XTX 24 GB workstation, but with CLIP on the CPU. I tried to enable Vulkan acceleration for CLIP by uncommenting the lines in clip.cpp under examples, but in that case I get an OOM. I tried this with the FP16, Q4_K_M and IQ4_XS models. Telling the CLI to use just one Vulkan device does not help with the OOM / CLIP GPU issue either. |
Hi, could you please confirm what the resolution of your input images is? EDIT: As per the Qwen2.5 docs: An RTFM moment for me... |
Thanks. My image was 1475x1062. I was able to run inference successfully using a 1077x671 sample, without OOM. Would it be possible to run CLIP and the VL model on separate GPUs? Thanks again. |
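If resolution is the culprit, one workaround is to downscale inputs before handing them to the CLI. A sketch assuming ImageMagick is available (the 1280x1280 budget is an arbitrary example, not a value taken from the Qwen2.5 docs; the trailing > only shrinks images larger than the box):
convert input.jpg -resize '1280x1280>' input_small.jpg |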
Thank you very much for your research and sharing! I would like to ask how to get the mmproj from the Qwen2.5-VL model. The original qwen2_vl_surgery.py used for Qwen2-VL doesn't seem to work; could you share your method? Thank you very much! |
Get it from our HF: |
Thank you for the effort, a lot of people really need this. Any updates on the progress? Will this still take a few days, or is it more like a few weeks or months? Thanks a lot again, we appreciate you guys a lot! |
@vladislavdonchev Great work! Have you done the 3B version? I can also do it myself if you provide the conversion script :) |
Working on it as we speak, along with a quantization tool: |
UPDATE: Opened a draft PR here: #12119
Long story short, I'll need some help debugging the vision models and llama-qwen2vl-cli, as we're unable to produce anything reliably. In addition, this still isn't resolved:
I've also asked the Qwen folks for help: |
Thanks @vladislavdonchev for the effort and the update. I took a look at the issue you opened with the Qwen team; is it only affecting the 3B model? Can we at least expect progress to continue with 7B? Thank you! |
Unfortunately, we're unable to reliably produce a working vision model from either 7B or 3B. I am not sure how the one in the repo was exported, but it seems to be working, so it's either some weird coincidence or a mistake. I've verified the LM part, including in quants and it also appears to match what you'd expect from Qwen2.5 (parameters in .gguf seem correct, responses are OK). |
I am getting the following error while trying to use Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf on Apple Silicon:
Could somebody please help out? |
Did you figure this out? |
Nope |
Please stop spamming this thread. Qwen2.5 is still a WIP! Regarding the issue above: Please wait until the implementation has been finalized. |
Works great with Green-s' Qwen2.5-VL-7B-Instruct-Q4_K_M.gguf and HimariO's llama.cpp qwen25-vl branches. |
They just dropped 32B VL version. |
I just ran a simple test using the 32B variant of the model with a smaller sample image (500x300 pixels, to be specific). It still took around 20 minutes to generate a single caption on my setup with the CPU backend, but the result looked pretty decent. I've uploaded the GGUF files to the Hugging Face Hub so that others with better hardware can give it a try.
llama-qwen2vl-cli -m qwen25-vl-32b-instruct-vision-00001-of-00002.gguf --mmproj qwen-qwen2.5-vl-32b-instruct-vision.gguf -p "Describe this image." --image demo_small.jpg --threads 24 |
@HimariO The |
Also, consider supporting Qwen2.5-Omni? |
What is the right way of running a batch of 4 images? When I include several --image arguments, it just seems to run them sequentially. |
@vladislavdonchev when you say batching, it does not really batch, right? It seems to load the model and run inference for each image sequentially, right? Am I missing something? |
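Based on the behaviour reported above, the example CLI processes each --image one after another rather than as a true batch. If the goal is just to caption several files, a shell loop over the images is the simplest workaround; a sketch with placeholder paths (note the model is reloaded on every iteration):
for img in ./images/*.jpg; do
  ./llama-qwen2vl-cli -m model-Q4_K_M.gguf --mmproj mmproj-f16.gguf -p "Describe this image." --image "$img"
done |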
@green-s according to my testing, the 7B and 32B models' vision encoder GGUFs are working fine with |
Ah sorry I only tried running it on Windows. Tried it on Linux and it worked fine. |
Please add Qwen 2.5 Omni support. |
When will llama.cpp officially support Qwen 2.5 VL? |
@HimariO Will existing conversions/quants work after those changes or do they have to be redone? |
I got an error with this build: https://github.com/HimariO/llama.cpp.qwen2.5vl/releases/tag/b5043
command:
output:
|
@green-s Previously converted gguf should work fine with recent changes; nothing has changed in the vision encoder gguf format. @soldivelot Those "errors" are normal, since they are raised by non-essential parameters that Qwen2.5VL doesn't use. |
How are you guys using Qwen VL models to have a conversation about images? I am only finding the llama.cpp binaries to provide zero-shot prompting and not a true conversation or OAI endpoint that I can use with Open-WebUI to incorporate images in my text conversations. Appreciate the insight! |
@HimariO Then how do you quantize the vision encoder of the 3B variant? I also failed to quantize the vision encoder of 7B and 32B. May I know which version of the code you are using? |
Koboldcpp is the only way I know of at this moment if you want to do it locally. |
Is there any plan to merge code from whria78/llama-qwen-vl? |
@ColumbusAI You can take a look at llama-box if you need pure API server or GPUStack if you need UI, clustering, distributed inference and more. |
You can try https://github.com/HimariO/llama.cpp.qwen2.5vl/tree/qwen25-vl-20250404
./llama-qwen2vl-cli
https://huggingface.co/Mungert/Qwen2.5-VL-7B-Instruct-GGUF
Old deprecated binary. Developers must be streamlining all this into llama-mtmd-cli... |
no pull request? |
I just cloned the master of this repo and it looks to me like Qwen 2.5 VL is working fine out of the box. I just ran it via
All I had to do is run
It flawlessly downloaded the q4 and f16 versions from the
It's also all documented here: https://github.com/ggml-org/llama.cpp/tree/master/tools/mtmd
I can load and ask questions about images no problem. Not sure this is what OP wanted, but it looks to me like this can be closed. |
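For those asking earlier about an OpenAI-compatible endpoint for Open-WebUI: with the mtmd support on master, llama-server can also serve the model. A minimal sketch, assuming a recent build where -hf also fetches the matching mmproj file (the repo name below is the ggml-org one I believe is published; double-check it on the Hub):
llama-server -hf ggml-org/Qwen2.5-VL-7B-Instruct-GGUF
Any OpenAI-style client can then be pointed at http://localhost:8080/v1 (the default port) and images attached to chat messages. |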
Can you check Qwen 2.5 Omni? |
I can confirm, the main branch here now works!!! The fork I found is obsolete.
|
I don't think it will work. There are no gguf files for omni yet in the ggml.org space at huggingface (see https://huggingface.co/models?sort=trending&search=ggml-org+qwen2.5) and it's also not in the llama.cpp docs, so I would assume that it isn't supported yet. |
It's still a work in progress, because for now Qwen 2.5 VL (I tried 3B and 7B) is not as good as Qwen2 VL or Gemma 3. |
Has anyone else found extremely high perplexity values for Qwen 2.5 VL 72B Instruct? I'm weirdly getting 20 to 70 after BF16 conversion. |
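In case it helps narrow this down, perplexity here is normally measured on the text model only, so the vision encoder isn't exercised at all. A sketch of the usual invocation, with the file name and test set as placeholders:
./build/bin/llama-perplexity -m qwen2.5-vl-72b-instruct-bf16.gguf -f wikitext-2-raw/wiki.test.raw |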
Prerequisites
Feature Description
Is anybody implementing this?
If not, I may give it a go. But it will take some time as I am new to the source side of llama.cpp/ggml.
Motivation
Well, it's not currently working. :-)
Possible Implementation
Based on the existing Qwen 2 VL implementation.