Conversation

ggerganov (Member)

cont #15687

Mark this case as unsupported until actual support is implemented.

github-actions bot added the labels "Nvidia GPU" (issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) on Sep 8, 2025
JohannesGaessler (Collaborator) commented Sep 8, 2025

The value ne12 is used in the CUDA code, but I think the indices are being calculated incorrectly. In the CPU code:

const int64_t i12 = i03%ne12;
const int64_t i11 = i02%ne11;
const int64_t i10 = i;

In the CUDA code:

const int i10 = blockIdx.x;
const int i11 = blockIdx.z / ne12; // gridDim.z == ne11*ne12
const int i12 = blockIdx.z % ne12;

In the CUDA code the same values are used for i11/i01 and i12/i02.

JohannesGaessler (Collaborator) left a comment:

If it fixes an immediate issue it would still be fine to merge this for now. But please add a FIXME comment.

ggerganov (Member, Author)

Ok, I didn't look at the implementation and assumed it was not implemented. I will update the PR to fix it.

In the CUDA code the same values are used for i11/i01 and i12/i02.

The intention of the operator is that i10 queries rows from src0, hence it corresponds to i01. Respectively:

i10 -> i01
i11 -> i02
i12 -> i03

So I think the CPU implementation is correct. Looking into this.

ggerganov (Member, Author)

The CUDA implementation is correct. The problem is that in one of the new GET_ROWS tests, the number of blocks along the 3rd grid dimension of the kernel launch exceeds the CUDA maximum of 65535:

const dim3 block_nums(ne10, MIN(block_num_y, MAX_GRIDDIM_Y), ne11*ne12);

Here ne11*ne12 > 65535 and the kernel launch fails.

For now, I updated the supports_op condition to bail out in such cases. I will leave it to you to add proper support for larger sizes.

ggerganov changed the title from "cuda : fix supports_op condition for get_rows when src1->ne2 > 1" to "cuda : fix supports_op condition for get_rows when number of blocks is too large" on Sep 8, 2025
ggerganov merged commit b0d5299 into master on Sep 8, 2025
51 of 55 checks passed
ggerganov deleted the gg/cuda-fix-supports-get-rows branch on September 8, 2025 at 10:56
njsyw1997 pushed a commit to aizip/llama.cpp that referenced this pull request on Sep 10, 2025:
…s too large (ggml-org#15868)

* cuda : fix supports_op condition for get_rows when src1->ne2 > 1

ggml-ci

* ggml : add comment about ggml_get_rows

ggml-ci

* cuda : add FIXME [no ci]

* cuda : update support condition

ggml-ci