Try to Fix #3725: cudaarithm: fix the compile failure of CUDA 12. #3726
/cc @cudawarped
I am pretty sure it was introduced in CUDA 12.4 (12040).
Is it worth including an assert before it is used, because BufferPool.getBuffer() expects an int? e.g.
CV_Assert(bufSize <= std::numeric_limits<int>::max());
What about the next call to get buffer size?
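A minimal sketch of the range check suggested above, written without OpenCV so that it compiles on its own. The helper name `checkedBufferSize` is hypothetical (it is not an OpenCV function), and it throws a standard exception where OpenCV code would use `CV_Assert`. It assumes `bufSize` is a `size_t`, as in the CUDA 12.4 signature discussed in this PR.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>

// Hypothetical helper, not part of OpenCV: narrows a size_t buffer size to
// int and fails loudly instead of silently truncating the value.
inline int checkedBufferSize(std::size_t bufSize)
{
    const std::size_t maxInt =
        static_cast<std::size_t>(std::numeric_limits<int>::max());
    if (bufSize > maxInt)
        throw std::overflow_error("buffer size does not fit in int");
    return static_cast<int>(bufSize);
}
```

The same check would apply to every call that obtains a buffer size, since each result is eventually passed to an API that takes an int.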
Have you tested this on 12.4? I think that it will still fail because of the other bug so I am not sure if it will pass any CI tests built against CUDA 12.4.
I checked the definition of CUDA_VERSION in CUDA 12, which is of the form 12.XX.XX instead of an int number, so I edited the CMakeLists.txt to add a new definition, CUDA_12_OR_HIGHER, and fixed the compilation of reductions.cpp. However, other errors appeared that still need to be solved:
C:\opencv-bld\opencv_contrib\modules\cudev\include\opencv2\cudev\grid\detail/reduce.hpp(379): error: no instance of overloaded function "cv::cudev::blockReduce" matches the argument list
argument types are: (cuda::std::__4::tuple<volatile int *, volatile int *>, cuda::std::__4::tuple<int &, int &>, int, cuda::std::__4::tuple<cv::cudev::minimum<int>, cv::cudev::maximum<int>>)
blockReduce<BLOCK_SIZE>(smem_tuple(sminval, smaxval), tie(mymin, mymax), tid, make_tuple(minOp, maxOp))
I think you are confusing CMake generation and compilation. Adding that definition, which is still for the incorrect version of CUDA, to the CMake file is unnecessary.
Version Info: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\cuda.h
Function definitions
39073a2 to 34bd538
I removed the CMake edit and found that the correct CUDA_VERSION is 12040 in CUDA 12.4. I found a way to check cuda.h without installing it: use 7-Zip to open cuda_12.4.1_551.78_windows.exe, go to cuda_12.4.1_551.78_windows.exe\cuda_cudart\cudart\include\cuda.h, and look up the macro definition there.
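For reference, CUDA_VERSION in cuda.h is a single integer that encodes the release as major * 1000 + minor * 10, which is why CUDA 12.4 shows up as 12040. A small sketch of decoding that value follows; the names `CudaRelease` and `decodeCudaVersion` are hypothetical and exist only for illustration.

```cpp
// Hypothetical helper type: holds the two components of a CUDA release number.
struct CudaRelease
{
    int majorVersion;
    int minorVersion;
};

// Decodes a CUDA_VERSION-style integer (major * 1000 + minor * 10).
constexpr CudaRelease decodeCudaVersion(int cudaVersion)
{
    return CudaRelease{ cudaVersion / 1000, (cudaVersion % 1000) / 10 };
}
```

With this encoding a preprocessor guard such as `#if CUDA_VERSION >= 12040` selects CUDA 12.4 and newer, so no extra CMake definition is needed.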
Whilst this is a completely valid way to check the header I would advise you to install CUDA 12.4 when submitting a PR which fixes something that it breaks. If you do that you will realize that
If I was authoring this PR I would install both CUDA 12.3 and 12.4 and check that this builds on both without errors.
A slight API change of NPP nppiMeanStdDevGetBufferHostSize_8u_C1R The type of bufSize is size_t instead of int in CUDA 12.4.x
@opencv-alalek this resolves the issue mentioned but still results in build errors because
I still have an issue #3728
@jiapei100 Your issue is related to
great job! thank you
The patch looks reasonable. I'm looking at #3690 to see if we can do something on the OpenCV side.
A slight API change of nppiMeanStdDevGetBufferHostSize_8u_C1R and nppiMeanStdDevGetBufferHostSize_32f_C1R in NPP of CUDA 12 has caused #3725. I will try to fix this. I found that the expected type of bufSize is size_t instead of the int used in reductions.cpp: in the NPP header file nppi_statistics_functions.h, the type of the second parameter changed from int * to size_t *.
nppi_statistics_functions.h 5392:5408
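The type change described above can be handled with a version guard, sketched below under stated assumptions. The 12040 threshold is taken from the review discussion, the alias name `NppBufSize` is hypothetical, and `fakeGetBufferHostSize` is a stand-in with the same out-parameter shape as the NPP size query so that the sketch compiles without the CUDA toolkit. It is not the code from this patch.

```cpp
#include <cstddef>

// Stand-in so the sketch builds without CUDA; in a real build CUDA_VERSION
// comes from cuda.h (12040 for CUDA 12.4, per the discussion).
#ifndef CUDA_VERSION
#define CUDA_VERSION 12040
#endif

// Hypothetical alias: picks the buffer-size type that matches the toolkit.
#if CUDA_VERSION >= 12040
using NppBufSize = std::size_t;
#else
using NppBufSize = int;
#endif

// Stand-in for an NPP-style size query (not a real NPP function): it writes
// the required scratch size through the pointer and returns 0 on success.
inline int fakeGetBufferHostSize(int width, int height, NppBufSize* hpBufferSize)
{
    *hpBufferSize = static_cast<NppBufSize>(width) * static_cast<NppBufSize>(height);
    return 0;
}
```

Declaring the local variable as `NppBufSize bufSize;` keeps the call site identical on both sides of the version boundary; the value still has to be range-checked before it is passed to an API that takes an int.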
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.