cuda : fix supports_op condition for get_rows when number of blocks is too large #15868
Conversation
The CPU implementation computes:

```cpp
const int64_t i12 = i03 % ne12;
const int64_t i11 = i02 % ne11;
const int64_t i10 = i;
```

In the CUDA code:

```cpp
const int i10 = blockIdx.x;
const int i11 = blockIdx.z / ne12; // gridDim.z == ne11*ne12
const int i12 = blockIdx.z % ne12;
```

In the CUDA code the same values are used for …
If it fixes an immediate issue it would still be fine to merge this for now. But please add a FIXME comment.
Ok, I didn't look into the implementation and assumed it was not implemented. So I will update the PR to fix the implementation.
The intention of the operator is that `i10` queries rows from … So I think the CPU implementation is correct. Looking into this.
The CUDA implementation is correct. The problem is that in one of the new GET_ROWS tests, the number of blocks along the 3rd dimension of the kernel exceeds the CUDA limit of 65535 (`ggml/src/ggml-cuda/getrows.cu`, line 134 at commit `2aee620`):
For now, I updated the support condition.
…s too large (ggml-org#15868)

* cuda : fix supports_op condition for get_rows when src1->ne2 > 1 (ggml-ci)
* ggml : add comment about ggml_get_rows (ggml-ci)
* cuda : add FIXME [no ci]
* cuda : update support condition (ggml-ci)
cont #15687
Mark this case as unsupported until actual support is implemented.