Eval bug: llama-llava-clip-quantize-cli crashes when llama.cpp is built with the CUDA backend #12564

Closed
Ivy233 opened this issue Mar 25, 2025 · 1 comment

Ivy233 commented Mar 25, 2025

Name and Version

ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA vGPU-32GB, compute capability 8.9, VMM: yes
version: 4954 (3cd3a39)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

CUDA

Hardware

NVIDIA RTX 4080 SUPER (vGPU, 32 GB)

Models

No response

Problem description & steps to reproduce

If you compile llama.cpp with the CUDA backend, llama-llava-clip-quantize-cli crashes when quantizing the vision part of the CLIP model. After some debugging, the faulting code was located (screenshot from the original issue not reproduced here). The crash is most likely caused by the quantizer trying to read tensor data that resides in GPU backend memory, which cannot be dereferenced directly from the CPU; the tool only runs when llama.cpp is compiled as a CPU-only build (e.g. with -DGGML_CUDA=OFF). Has anyone else encountered this problem? A sketch of the suspected access pattern is shown after the reproduction command below.

./build/bin/llama-llava-clip-quantize-cli ~/autodl-tmp/llava-v1.5-7b/mmproj-model-f16.gguf ~/autodl-tmp/llava-v1.5-7b/mmproj-model-Q4_0.gguf 2
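
For context, here is a minimal sketch of the suspected failure mode, assuming the quantizer dereferences tensor->data directly, which is only valid for host (CPU) buffers. The helper read_tensor_data is hypothetical and not the actual clip.cpp code; ggml_backend_buffer_is_host and ggml_backend_tensor_get are the existing ggml-backend APIs for detecting device buffers and copying their contents back to the host:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

#include "ggml.h"
#include "ggml-backend.h"

// Hypothetical helper illustrating the safe access pattern.
// A plain memcpy from t->data segfaults when the tensor lives in a
// CUDA buffer, which is the suspected cause of this crash.
static std::vector<uint8_t> read_tensor_data(const struct ggml_tensor * t) {
    std::vector<uint8_t> buf(ggml_nbytes(t));
    if (t->buffer != nullptr && !ggml_backend_buffer_is_host(t->buffer)) {
        // Device buffer (e.g. CUDA0): copy through the backend API,
        // which performs a device-to-host transfer.
        ggml_backend_tensor_get(t, buf.data(), 0, buf.size());
    } else {
        // Host buffer: t->data points at ordinary CPU memory.
        std::memcpy(buf.data(), t->data, buf.size());
    }
    return buf;
}
```

If this is indeed the bug, routing reads through ggml_backend_tensor_get (or loading the model onto the CPU backend for quantization) would be the natural fix; until then, a CPU-only build works around the crash.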

First Bad Commit

No response

Relevant log output

(llamacpp) root@autodl-container-1a0b499d52-72782394:~/llama.cpp# ./build/bin/llama-llava-clip-quantize-cli ~/autodl-tmp/llava-v1.5-7b/mmproj-model-f16.gguf ~/autodl-tmp/llava-v1.5-7b/mmproj-model-Q4_0.gguf 2
clip_init: model name:   BGE-VL-large
clip_init: description:  image encoder for LLaVA
clip_init: GGUF version: 3
clip_init: alignment:    32
clip_init: n_tensors:    377
clip_init: n_kv:         19
clip_init: ftype:        f16

clip_init: loaded meta data with 19 key-value pairs and 377 tensors from /root/autodl-tmp/llava-v1.5-7b/mmproj-model-f16.gguf
clip_init: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
clip_init: - kv   0:                       general.architecture str              = clip
clip_init: - kv   1:                      clip.has_text_encoder bool             = false
clip_init: - kv   2:                    clip.has_vision_encoder bool             = true
clip_init: - kv   3:                   clip.has_llava_projector bool             = true
clip_init: - kv   4:                          general.file_type u32              = 1
clip_init: - kv   5:                               general.name str              = BGE-VL-large
clip_init: - kv   6:                        general.description str              = image encoder for LLaVA
clip_init: - kv   7:                        clip.projector_type str              = mlp
clip_init: - kv   8:                     clip.vision.image_size u32              = 224
clip_init: - kv   9:                     clip.vision.patch_size u32              = 14
clip_init: - kv  10:               clip.vision.embedding_length u32              = 1024
clip_init: - kv  11:            clip.vision.feed_forward_length u32              = 4096
clip_init: - kv  12:                 clip.vision.projection_dim u32              = 768
clip_init: - kv  13:           clip.vision.attention.head_count u32              = 16
clip_init: - kv  14:   clip.vision.attention.layer_norm_epsilon f32              = 0.000010
clip_init: - kv  15:                    clip.vision.block_count u32              = 23
clip_init: - kv  16:                     clip.vision.image_mean arr[f32,3]       = [0.481455, 0.457828, 0.408211]
clip_init: - kv  17:                      clip.vision.image_std arr[f32,3]       = [0.268630, 0.261303, 0.275777]
clip_init: - kv  18:                              clip.use_gelu bool             = false
clip_init: - type  f32:  235 tensors
clip_init: - type  f16:  142 tensors
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA vGPU-32GB, compute capability 8.9, VMM: yes
clip_ctx: CLIP using CUDA0 backend
key clip.use_silu not found in file
clip_init: text_encoder:   0
clip_init: vision_encoder: 1
clip_init: llava_projector:  1
clip_init: minicpmv_projector:  0
clip_init: minicpmv_version:  2
clip_init: glm_projector:  0
clip_init: model size:     594.86 MB
clip_init: metadata size:  0.13 MB
clip_init: params backend buffer size =  594.86 MB (377 tensors)
key clip.vision.image_grid_pinpoints not found in file
key clip.vision.feature_layer not found in file
key clip.vision.mm_patch_merge_type not found in file
key clip.vision.image_crop_resolution not found in file

clip_init: vision model hparams
image_size         224
patch_size         14
v_hidden_size      1024
v_n_intermediate   4096
v_projection_dim   768
v_n_head           16
v_n_layer          23
v_eps              0.000010
v_image_mean       0.481455 0.457828 0.408211
v_image_std        0.268630 0.261303 0.275777
v_image_grid_pinpoints: 
v_vision_feature_layer: 
v_mm_patch_merge_type: flat
clip_init:      CUDA0 compute buffer size =     9.63 MiB
clip_init:        CPU compute buffer size =     1.58 MiB
Segmentation fault (core dumped)

github-actions bot commented May 9, 2025

This issue was closed because it has been inactive for 14 days since being marked as stale.
