Eval bug: llama-llava-clip-quantize-cli crashes when llama.cpp is built with the CUDA backend #12564

Closed
Ivy233 opened this issue Mar 25, 2025 · 1 comment

Ivy233 commented Mar 25, 2025

Name and Version

ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA vGPU-32GB, compute capability 8.9, VMM: yes
version: 4954 (3cd3a39)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

CUDA

Hardware

NVIDIA RTX 4080 SUPER (vGPU, 32 GB)

Models

No response

Problem description & steps to reproduce

If you compile llama.cpp with the CUDA backend, llama-llava-clip-quantize-cli crashes when quantizing the vision part of the CLIP model. After some debugging, the faulting code was located (screenshot from the original issue not reproduced here). The crash is most likely caused by the quantizer trying to read tensor data that resides in GPU backend memory, which cannot be dereferenced directly from the CPU; the tool only runs when llama.cpp is compiled as a CPU-only build (e.g. with -DGGML_CUDA=OFF). Has anyone else encountered this problem? A sketch of the suspected access pattern is shown after the reproduction command below.

./build/bin/llama-llava-clip-quantize-cli ~/autodl-tmp/llava-v1.5-7b/mmproj-model-f16.gguf ~/autodl-tmp/llava-v1.5-7b/mmproj-model-Q4_0.gguf 2
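
For context, here is a minimal sketch of the suspected failure mode, assuming the quantizer dereferences tensor->data directly, which is only valid for host (CPU) buffers. The helper read_tensor_data is hypothetical and not the actual clip.cpp code; ggml_backend_buffer_is_host and ggml_backend_tensor_get are the existing ggml-backend APIs for detecting device buffers and copying their contents back to the host:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

#include "ggml.h"
#include "ggml-backend.h"

// Hypothetical helper illustrating the safe access pattern.
// A plain memcpy from t->data segfaults when the tensor lives in a
// CUDA buffer, which is the suspected cause of this crash.
static std::vector<uint8_t> read_tensor_data(const struct ggml_tensor * t) {
    std::vector<uint8_t> buf(ggml_nbytes(t));
    if (t->buffer != nullptr && !ggml_backend_buffer_is_host(t->buffer)) {
        // Device buffer (e.g. CUDA0): copy through the backend API,
        // which performs a device-to-host transfer.
        ggml_backend_tensor_get(t, buf.data(), 0, buf.size());
    } else {
        // Host buffer: t->data points at ordinary CPU memory.
        std::memcpy(buf.data(), t->data, buf.size());
    }
    return buf;
}
```

If this is indeed the bug, routing reads through ggml_backend_tensor_get (or loading the model onto the CPU backend for quantization) would be the natural fix; until then, a CPU-only build works around the crash.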

First Bad Commit

No response

Relevant log output

(llamacpp) root@autodl-container-1a0b499d52-72782394:~/llama.cpp# ./build/bin/llama-llava-clip-quantize-cli ~/autodl-tmp/llava-v1.5-7b/mmproj-model-f16.gguf ~/autodl-tmp/llava-v1.5-7b/mmproj-model-Q4_0.gguf 2
clip_init: model name:   BGE-VL-large
clip_init: description:  image encoder for LLaVA
clip_init: GGUF version: 3
clip_init: alignment:    32
clip_init: n_tensors:    377
clip_init: n_kv:         19
clip_init: ftype:        f16

clip_init: loaded meta data with 19 key-value pairs and 377 tensors from /root/autodl-tmp/llava-v1.5-7b/mmproj-model-f16.gguf
clip_init: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
clip_init: - kv   0:                       general.architecture str              = clip
clip_init: - kv   1:                      clip.has_text_encoder bool             = false
clip_init: - kv   2:                    clip.has_vision_encoder bool             = true
clip_init: - kv   3:                   clip.has_llava_projector bool             = true
clip_init: - kv   4:                          general.file_type u32              = 1
clip_init: - kv   5:                               general.name str              = BGE-VL-large
clip_init: - kv   6:                        general.description str              = image encoder for LLaVA
clip_init: - kv   7:                        clip.projector_type str              = mlp
clip_init: - kv   8:                     clip.vision.image_size u32              = 224
clip_init: - kv   9:                     clip.vision.patch_size u32              = 14
clip_init: - kv  10:               clip.vision.embedding_length u32              = 1024
clip_init: - kv  11:            clip.vision.feed_forward_length u32              = 4096
clip_init: - kv  12:                 clip.vision.projection_dim u32              = 768
clip_init: - kv  13:           clip.vision.attention.head_count u32              = 16
clip_init: - kv  14:   clip.vision.attention.layer_norm_epsilon f32              = 0.000010
clip_init: - kv  15:                    clip.vision.block_count u32              = 23
clip_init: - kv  16:                     clip.vision.image_mean arr[f32,3]       = [0.481455, 0.457828, 0.408211]
clip_init: - kv  17:                      clip.vision.image_std arr[f32,3]       = [0.268630, 0.261303, 0.275777]
clip_init: - kv  18:                              clip.use_gelu bool             = false
clip_init: - type  f32:  235 tensors
clip_init: - type  f16:  142 tensors
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA vGPU-32GB, compute capability 8.9, VMM: yes
clip_ctx: CLIP using CUDA0 backend
key clip.use_silu not found in file
clip_init: text_encoder:   0
clip_init: vision_encoder: 1
clip_init: llava_projector:  1
clip_init: minicpmv_projector:  0
clip_init: minicpmv_version:  2
clip_init: glm_projector:  0
clip_init: model size:     594.86 MB
clip_init: metadata size:  0.13 MB
clip_init: params backend buffer size =  594.86 MB (377 tensors)
key clip.vision.image_grid_pinpoints not found in file
key clip.vision.feature_layer not found in file
key clip.vision.mm_patch_merge_type not found in file
key clip.vision.image_crop_resolution not found in file

clip_init: vision model hparams
image_size         224
patch_size         14
v_hidden_size      1024
v_n_intermediate   4096
v_projection_dim   768
v_n_head           16
v_n_layer          23
v_eps              0.000010
v_image_mean       0.481455 0.457828 0.408211
v_image_std        0.268630 0.261303 0.275777
v_image_grid_pinpoints: 
v_vision_feature_layer: 
v_mm_patch_merge_type: flat
clip_init:      CUDA0 compute buffer size =     9.63 MiB
clip_init:        CPU compute buffer size =     1.58 MiB
Segmentation fault (core dumped)

github-actions bot commented May 9, 2025

This issue was closed because it has been inactive for 14 days since being marked as stale.
