
Commit 02082f1

clip: Fix llama-llava-clip-quantize-cli quantization error under CUDA backend (#12566)
* [Fix] When clip-quantize-cli is built with the CUDA backend, the model is loaded into video memory and ggml_fp16_to_fp32 fails when it tries to read the weights, so quantization has to run on the CPU backend. After the fix, quantization automatically loads the model on the CPU backend and is no longer bound to CUDA.
* [Fix] Roll back the signature and implementation of clip_model_load, and change the call in clip_model_quantize to clip_init.
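For illustration only (an assumption about the failure mode, not code from this commit): quantization converts each fp16 tensor to fp32 on the host, so it must be able to dereference tensor->data from the CPU. A minimal sketch of such a conversion loop is below, using the public ggml helpers ggml_nelements and ggml_fp16_to_fp32; the helper name tensor_f16_to_f32 is hypothetical. If the model was loaded on the CUDA backend, the data pointer can refer to video memory and the host-side read fails, which is why the fix forces use_gpu = false.

    #include <cstdint>
    #include <vector>
    #include "ggml.h"

    // Sketch only: convert one fp16 tensor to fp32 on the host. The cast of
    // t->data is only valid when the tensor lives in host memory, i.e. when
    // the model was loaded with the CPU backend.
    static std::vector<float> tensor_f16_to_f32(const struct ggml_tensor * t) {
        const ggml_fp16_t * src = (const ggml_fp16_t *) t->data;
        const int64_t       n   = ggml_nelements(t);
        std::vector<float> dst(n);
        for (int64_t i = 0; i < n; ++i) {
            dst[i] = ggml_fp16_to_fp32(src[i]); // faults if src points at VRAM
        }
        return dst;
    }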
1 parent df4d20c commit 02082f1

File tree

1 file changed (+4, −1)


examples/llava/clip.cpp

Lines changed: 4 additions & 1 deletion
@@ -2989,7 +2989,10 @@ bool clip_model_quantize(const char * fname_inp, const char * fname_out, const i
     assert(itype < GGML_TYPE_COUNT);
     ggml_type type = static_cast<ggml_type>(itype);
 
-    auto * ctx_clip = clip_model_load(fname_inp, 2);
+    auto * ctx_clip = clip_init(fname_inp, clip_context_params{
+        /* use_gpu */   false,
+        /* verbosity */ 2,
+    });
 
     const auto & ctx_src = ctx_clip->ctx_gguf;
     const auto & ctx_data = ctx_clip->ctx_data;
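
For reference, a sketch of what the parameter struct at this call site presumably looks like; the field names follow the inline comments in the diff (/* use_gpu */, /* verbosity */), but the exact types are an assumption, not copied from clip.h:

    // Assumed layout of clip_context_params, inferred from the call above; the
    // real definition in examples/llava/clip.h may differ.
    struct clip_context_params {
        bool use_gpu;   // false: load and quantize entirely on the CPU backend
        int  verbosity; // 2 mirrors the old clip_model_load(fname_inp, 2) call
    };

With use_gpu set to false, clip_init keeps the weights in host memory, so the fp16-to-fp32 reads later in clip_model_quantize no longer touch video memory.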
