llama.cpp adds vision support: https://github.com/ggml-org/llama.cpp/blob/master/docs/multimodal.md