CUDA: fix logic for V100 + GGML_CUDA_FORCE_MMQ #12098
Merged
Fixes LostRuins#1390 .
The logic for the combination of V100s and `GGML_CUDA_FORCE_MMQ` seems to be wrong on master. By default, when compiling without `GGML_CUDA_FORCE_MMQ`, the MMQ kernels should only be compiled for batch sizes up to `MMQ_DP4A_MAX_BATCH_SIZE` if FP16 tensor core hardware is available but int8 tensor core hardware is not (basically only V100s); template specializations for higher batch sizes will never be used. However, the condition for this seems to have been inverted: without `GGML_CUDA_FORCE_MMQ` the unneeded template specializations were being compiled, and with it the host code could attempt to launch nonexistent kernels.
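A minimal sketch of the intended decision, using illustrative names rather than the exact ggml-cuda identifiers (`mmq_needs_large_batch_kernels`, the capability flags, and the constant value are assumptions for illustration):

```cpp
// Hedged sketch, not the actual ggml-cuda code: whether MMQ template
// specializations for large batch sizes need to be compiled/used.
constexpr int MMQ_DP4A_MAX_BATCH_SIZE = 64; // illustrative value only

bool mmq_needs_large_batch_kernels(bool fp16_mma_available, bool int8_mma_available, bool force_mmq) {
    if (force_mmq) {
        // With GGML_CUDA_FORCE_MMQ, MMQ may be used for all batch sizes,
        // so the large-batch specializations must exist.
        return true;
    }
    // FP16 tensor cores without int8 tensor cores: essentially only V100.
    // On such devices MMQ is only used up to MMQ_DP4A_MAX_BATCH_SIZE,
    // so the large-batch specializations are never needed.
    const bool v100_like = fp16_mma_available && !int8_mma_available;
    return !v100_like;
}
```

Per the PR description, the condition on master was effectively inverted relative to this: the unneeded specializations were built in the default configuration, while with `GGML_CUDA_FORCE_MMQ` the host code could try to launch kernels that had never been compiled.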