CUDA Toolkit 12.4.0 `tuple` incompatibility #3690

runer112 · 2024-03-08T18:41:53Z

System information (version)

OpenCV => 4.9.0
Operating System / Platform => Windows 64 Bit
Compiler => Visual Studio 2022

Detailed description

OpenCV with CUDA support cannot be built using CUDA Toolkit 12.4.0.

CUDA Toolkit 12.3.2 ships Thrust 2.2.0 (https://docs.nvidia.com/cuda/archive/12.3.2/cuda-toolkit-release-notes/index.html), while CUDA Toolkit 12.4.0 updates to Thrust 2.3.1 (https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html). In Thrust 2.3.0, the tuple implementation was replaced with a standard tuple implementation (NVIDIA/cccl#262). Notably, this changes the definition from a 10-parameter template to a variadic template. Previously, a tuple of n items was padded with 10 - n null types so that it always had 10 template parameters. Now it has only n template parameters. As a result, the cudev function templates that spell out 10 template parameters per tuple can no longer be deduced for tuples whose size is not 10.
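To make the deduction failure concrete, here is a minimal host-only C++ sketch. It is not Thrust or cudev code. std::tuple stands in for the new variadic CUDA tuple. The hypothetical legacy_tuple/null_type pair imitates the old 10-parameter, null-padded thrust::tuple. The overloads that spell out ten parameters deduce successfully for the padded tuple. They fail for a 2-element variadic tuple, which leaves only the fallback.

```cpp
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical imitation of the pre-2.3 thrust::tuple: always ten template
// parameters, with unused slots padded by a null type.
struct null_type {};
template <class T0 = null_type, class T1 = null_type, class T2 = null_type,
          class T3 = null_type, class T4 = null_type, class T5 = null_type,
          class T6 = null_type, class T7 = null_type, class T8 = null_type,
          class T9 = null_type>
struct legacy_tuple {};

// Overloads written in the cudev style: ten explicit parameters per tuple.
// They are only declared, because they are used solely inside decltype.
template <class P0, class P1, class P2, class P3, class P4,
          class P5, class P6, class P7, class P8, class P9>
std::true_type takes_ten(const legacy_tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9>*);

template <class P0, class P1, class P2, class P3, class P4,
          class P5, class P6, class P7, class P8, class P9>
std::true_type takes_ten(const std::tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9>*);

// Fallback, selected when deduction fails for both overloads above.
std::false_type takes_ten(...);

// true_type if T deduces as a ten-parameter tuple, false_type otherwise.
template <class T>
using deduces_as_ten = decltype(takes_ten(std::declval<const T*>()));

// legacy_tuple<int, int> is really legacy_tuple<int, int, null_type x 8>.
static_assert(deduces_as_ten<legacy_tuple<int, int>>::value, "padded tuple deduces");
// A variadic two-element tuple has only two parameters, so deduction fails.
static_assert(!deduces_as_ten<std::tuple<int, int>>::value, "variadic tuple does not");
```

The same mismatch between the ten-parameter pattern and the variadic argument list is what produces the "failed deduction" notes in the nvcc output below.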

An example of one such function template that's no longer viable, cv::cudev::blockReduce:

opencv_contrib/modules/cudev/include/opencv2/cudev/block/reduce.hpp

Lines 68 to 81 in 6b5142f

    
template <int N,
          typename P0, typename P1, typename P2, typename P3, typename P4, typename P5, typename P6, typename P7, typename P8, typename P9,
          typename R0, typename R1, typename R2, typename R3, typename R4, typename R5, typename R6, typename R7, typename R8, typename R9,
          class Op0, class Op1, class Op2, class Op3, class Op4, class Op5, class Op6, class Op7, class Op8, class Op9>
__device__ __forceinline__ void blockReduce(const tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9>& smem,
                                            const tuple<R0, R1, R2, R3, R4, R5, R6, R7, R8, R9>& val,
                                            uint tid,
                                            const tuple<Op0, Op1, Op2, Op3, Op4, Op5, Op6, Op7, Op8, Op9>& op)
{
    block_reduce_detail::Dispatcher<N>::reductor::template reduce<
            const tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9>&,
            const tuple<R0, R1, R2, R3, R4, R5, R6, R7, R8, R9>&,
            const tuple<Op0, Op1, Op2, Op3, Op4, Op5, Op6, Op7, Op8, Op9>&>(smem, val, tid, op);
}

An example of an error I encounter:

[build] Z:\dev\1\opencv_contrib\modules\cudev\include\opencv2\cudev\grid\detail/reduce.hpp(379): error : no instance of overloaded function "cv::cudev::blockReduce" matches the argument list [Z:\dev\1\opencv\out\build\user\modules\world\opencv_world.vcxproj]
[build]               argument types are: (cuda::std::__4::tuple<volatile int *, volatile int *>, cuda::std::__4::tuple<int &, int &>, int, cuda::std::__4::tuple<cv::cudev::minimum<int>, cv::cudev::maximum<int>>)
[build]                 blockReduce<BLOCK_SIZE>(smem_tuple(sminval, smaxval), tie(mymin, mymax), tid, make_tuple(minOp, maxOp));
[build]                 ^
[build]   Z:\dev\1\opencv_contrib\modules\cudev\include\opencv2\cudev/block/reduce.hpp(72): note #3327-D: candidate function template "cv::cudev::blockReduce<N,P0,P1,P2,P3,P4,P5,P6,P7,P8,P9,R0,R1,R2,R3,R4,R5,R6,R7,R8,R9,Op0,Op1,Op2,Op3,Op4,Op5,Op6,Op7,Op8,Op9>(const thrust::THRUST_200301_500_520_600_610_700_750_800_860_890_900_NS::tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9> &, const thrust::THRUST_200301_500_520_600_610_700_750_800_860_890_900_NS::tuple<R0, R1, R2, R3, R4, R5, R6, R7, R8, R9> &, uint, const thrust::THRUST_200301_500_520_600_610_700_750_800_860_890_900_NS::tuple<Op0, Op1, Op2, Op3, Op4, Op5, Op6, Op7, Op8, Op9> &)" failed deduction
[build]     __declspec(__device__) __forceinline void blockReduce(const tuple<P0, P1, P2, P3, P4, P5, P6, P7, P8, P9>& smem,
[build]                                               ^
[build]   Z:\dev\1\opencv_contrib\modules\cudev\include\opencv2\cudev/block/reduce.hpp(63): note #3327-D: candidate function template "cv::cudev::blockReduce<N,T,Op>(volatile T *, T &, uint, const Op &)" failed deduction
[build]     __declspec(__device__) __forceinline void blockReduce(volatile T* smem, T& val, uint tid, const Op& op)
[build]                                               ^
[build]             detected during:
[build]               instantiation of "void cv::cudev::grid_reduce_detail::MinMaxReductor<cv::cudev::grid_reduce_detail::both, src_type, work_type>::reduceGrid<BLOCK_SIZE>(work_type *, int) [with src_type=uchar, work_type=int, BLOCK_SIZE=256]" at line 412
[build]               instantiation of "void cv::cudev::grid_reduce_detail::reduce<Reductor,BLOCK_SIZE,PATCH_X,PATCH_Y,SrcPtr,ResType,MaskPtr>(SrcPtr, ResType *, MaskPtr, int, int) [with Reductor=cv::cudev::grid_reduce_detail::MinMaxReductor<cv::cudev::grid_reduce_detail::both, uchar, int>, BLOCK_SIZE=256, PATCH_X=4, PATCH_Y=4, SrcPtr=cv::cudev::GlobPtr<uchar>, ResType=int, MaskPtr=cv::cudev::WithOutMask]" at line 421
[build]               instantiation of "void cv::cudev::grid_reduce_detail::reduce<Reductor,Policy,SrcPtr,ResType,MaskPtr>(const SrcPtr &, ResType *, const MaskPtr &, int, int, cudaStream_t) [with Reductor=cv::cudev::grid_reduce_detail::MinMaxReductor<cv::cudev::grid_reduce_detail::both, uchar, int>, Policy=cv::cudev::DefaultGlobReducePolicy, SrcPtr=cv::cudev::GlobPtr<uchar>, ResType=int, MaskPtr=cv::cudev::WithOutMask]" at line 460
[build]               instantiation of "void cv::cudev::grid_reduce_detail::minMaxVal<Policy,SrcPtr,ResType,MaskPtr>(const SrcPtr &, ResType *, const MaskPtr &, int, int, cudaStream_t) [with Policy=cv::cudev::DefaultGlobReducePolicy, SrcPtr=cv::cudev::GlobPtr<uchar>, ResType=int, MaskPtr=cv::cudev::WithOutMask]" at line 206 of Z:\dev\1\opencv_contrib\modules\cudev\include\opencv2\cudev/grid/reduce.hpp
[build]               instantiation of "void cv::cudev::gridFindMinMaxVal_<Policy,SrcPtr,ResType>(const SrcPtr &, cv::cudev::GpuMat_<ResType> &, cv::cuda::Stream &) [with Policy=cv::cudev::DefaultGlobReducePolicy, SrcPtr=cv::cudev::GpuMat_<uchar>, ResType=int]" at line 349 of Z:\dev\1\opencv_contrib\modules\cudev\include\opencv2\cudev/grid/reduce.hpp
[build]               instantiation of "void cv::cudev::gridFindMinMaxVal(const SrcPtr &, cv::cudev::GpuMat_<ResType> &, cv::cuda::Stream &) [with SrcPtr=cv::cudev::GpuMat_<uchar>, ResType=int]" at line 68 of Z:\dev\1\opencv_contrib\modules\cudaarithm\src\cuda\minmax.cu
[build]               instantiation of "void <unnamed>::minMaxImpl<T,R>(const cv::cuda::GpuMat &, const cv::cuda::GpuMat &, cv::cuda::GpuMat &, cv::cuda::Stream &) [with T=uchar, R=int]" at line 92 of Z:\dev\1\opencv_contrib\modules\cudaarithm\src\cuda\minmax.cu

The first candidate function template in the error message, which now fails deduction, is the one linked above. It was viable and was selected with previous CUDA Toolkit versions.

I think all templates that specify 10 template parameters per tuple can be updated to work with the new tuple definition by replacing each set of 10 template parameters with a parameter pack. I think this should also remain compatible with the old tuple definition. For example, I think this would be a viable implementation of cv::cudev::blockReduce:

template <int N, typename... P, typename... R, class... Op>
__device__ __forceinline__ void blockReduce(const tuple<P...>& smem,
                                            const tuple<R...>& val,
                                            uint tid,
                                            const tuple<Op...>& op)
{
    block_reduce_detail::Dispatcher<N>::reductor::template reduce<
        const tuple<P...>&,
        const tuple<R...>&,
        const tuple<Op...>&>(smem, val, tid, op);
}
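The parameter-pack idea can be checked on the host. The sketch below is not cudev code. It uses std::tuple, hypothetical MinOp/MaxOp/SumOp functors, and a hypothetical combineTuples helper. The helper applies the i-th operator to the i-th pair of elements, which loosely mirrors what a block reduction does per tuple element. Because R... and Op... are deduced, one template accepts tuples of any size. The sketch needs C++14 for std::index_sequence.

```cpp
#include <cstddef>
#include <tuple>
#include <utility>

// Hypothetical binary operators, loosely analogous to cv::cudev::minimum/maximum.
struct MinOp { template <class T> constexpr T operator()(T a, T b) const { return b < a ? b : a; } };
struct MaxOp { template <class T> constexpr T operator()(T a, T b) const { return a < b ? b : a; } };
struct SumOp { template <class T> constexpr T operator()(T a, T b) const { return a + b; } };

namespace detail {
// Applies std::get<I>(op) to the I-th elements of a and b, for every index I.
template <class... R, class... Op, std::size_t... I>
constexpr std::tuple<R...> combineImpl(const std::tuple<R...>& a, const std::tuple<R...>& b,
                                       const std::tuple<Op...>& op, std::index_sequence<I...>)
{
    return std::tuple<R...>(std::get<I>(op)(std::get<I>(a), std::get<I>(b))...);
}
}  // namespace detail

// Parameter packs instead of ten fixed parameters: R... and Op... are deduced
// from the arguments, so tuples of any size are accepted.
template <class... R, class... Op>
constexpr std::tuple<R...> combineTuples(const std::tuple<R...>& a, const std::tuple<R...>& b,
                                         const std::tuple<Op...>& op)
{
    static_assert(sizeof...(R) == sizeof...(Op), "exactly one operator per element");
    return detail::combineImpl(a, b, op, std::index_sequence_for<R...>{});
}

// Two elements, mirroring the min/max call from the error log above.
constexpr auto minMax = combineTuples(std::make_tuple(3, 9), std::make_tuple(5, 7),
                                      std::make_tuple(MinOp{}, MaxOp{}));
static_assert(std::get<0>(minMax) == 3 && std::get<1>(minMax) == 9, "min and max");
```

The same template also handles a three-element tuple without any new overload. The old ten-parameter signature needed the padding to achieve that.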

Steps to reproduce

Attempt to build cudev using CUDA Toolkit 12.4.0. I suspect that this error will be observed with any combination of OpenCV version, OS, platform, and compiler (that are modern enough to not encounter some other error first).

Issue submission checklist

I report the issue, it's not a question
I checked the problem with documentation, FAQ, open issues,
forum.opencv.org, Stack Overflow, etc and have not found any solution
I updated to the latest OpenCV version and the issue is still there
There is reproducer code and related data files: videos, images, onnx, etc


sjuxax · 2024-03-12T23:15:24Z

Just to confirm your suspicion that this affects other platforms too, I'm getting the same errors on Linux with GCC 13:

opencv_contrib-4.9.0/modules/cudev/include/opencv2/cudev/grid/detail/reduce_to_column.hpp(73): error: no instance of overloaded function "cv::cudev::blockReduce" matches the argument list                                                 
            argument types are: (cuda::std::__4::tuple<volatile float *, volatile float *>, cuda::std::__4::tuple<float &, float &>, const unsigned int, cuda::std::__4::tuple<cv::cudev::Sum<float>, cv::cudev::Sum<float>>)                                           
              blockReduce<BLOCK_SIZE>(smem_tuple(smem[0], smem[1]), tie(myVal.x, myVal.y), threadIdx.x, make_tuple(op, op));

^ one such error

juls007 · 2024-03-24T03:07:49Z

I have the same issue when building the latest OpenCV 4 from source with CUDA 12.4, cuDNN 9 and GCC 13 on Fedora 39:

General configuration for OpenCV 4.9.0-dev =====================================
--   Version control:               4.9.0-293-g912cf2a028
-- 
--   Extra modules:
--     Location (extra):            /home/coder/projects/opencv-src/opencv_contrib/modules
--     Version control (extra):     4.9.0-51-gab821068
-- 
--   Platform:
--     Timestamp:                   2024-03-29T14:50:46Z
--     Host:                        Linux 6.7.10-200.fc39.x86_64 x86_64
--     CMake:                       3.27.7
--     CMake generator:             Unix Makefiles
--     CMake build tool:            /usr/bin/gmake
--     Configuration:               RELEASE
-- 
--   CPU/HW features:
--     Baseline:                    SSE SSE2 SSE3
--       requested:                 SSE3
--     Dispatched code generation:  SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
--       requested:                 SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
--       SSE4_1 (18 files):         + SSSE3 SSE4_1
--       SSE4_2 (2 files):          + SSSE3 SSE4_1 POPCNT SSE4_2
--       FP16 (1 files):            + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
--       AVX (9 files):             + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
--       AVX2 (38 files):           + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
--       AVX512_SKX (8 files):      + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX
-- 
--   C/C++:
--     Built as dynamic libs?:      YES
--     C++ standard:                11
--     C++ Compiler:                /usr/lib64/ccache/c++  (ver 13.2.1)
--     C++ flags (Release):         -fsigned-char -ffast-math -fno-finite-math-only -W -Wall -Wreturn-type -Wnon-virtual-dtor -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections  -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG  -DNDEBUG
--     C++ flags (Debug):           -fsigned-char -ffast-math -fno-finite-math-only -W -Wall -Wreturn-type -Wnon-virtual-dtor -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections  -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -g  -O0 -DDEBUG -D_DEBUG
--     C Compiler:                  /usr/lib64/ccache/cc
--     C flags (Release):           -fsigned-char -ffast-math -fno-finite-math-only -W -Wall -Wreturn-type -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections  -msse -msse2 -msse3 -fvisibility=hidden -O3 -DNDEBUG  -DNDEBUG
--     C flags (Debug):             -fsigned-char -ffast-math -fno-finite-math-only -W -Wall -Wreturn-type -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections  -msse -msse2 -msse3 -fvisibility=hidden -g  -O0 -DDEBUG -D_DEBUG
--     Linker flags (Release):      -Wl,--exclude-libs,libippicv.a -Wl,--exclude-libs,libippiw.a   -Wl,--gc-sections -Wl,--as-needed -Wl,--no-undefined  
--     Linker flags (Debug):        -Wl,--exclude-libs,libippicv.a -Wl,--exclude-libs,libippiw.a   -Wl,--gc-sections -Wl,--as-needed -Wl,--no-undefined  
--     ccache:                      YES
--     Precompiled headers:         NO
--     Extra dependencies:          m pthread cudart_static dl rt nppc nppial nppicc nppidei nppif nppig nppim nppist nppisu nppitc npps cublas cudnn cufft -L/usr/local/cuda/lib64 -L/lib64
--     3rdparty dependencies:
-- 
--   OpenCV modules:
--     To be built:                 alphamat aruco bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres dpm face features2d flann freetype fuzzy gapi hdf hfs highgui img_hash imgcodecs imgproc intensity_transform java line_descriptor mcc ml objdetect optflow phase_unwrapping photo plot python3 quality rapid reg rgbd saliency sfm shape signal stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab viz wechat_qrcode xfeatures2d ximgproc xobjdetect xphoto
--     Disabled:                    cudacodec world
--     Disabled by dependency:      -
--     Unavailable:                 cannops cvv julia matlab ovis python2
--     Applications:                tests perf_tests apps
--     Documentation:               NO
--     Non-free algorithms:         YES
-- 
--   GUI:                           GTK3
--     GTK+:                        YES (ver 3.24.41)
--       GThread :                  YES (ver 2.78.3)
--       GtkGlExt:                  NO
--     VTK support:                 YES (ver 9.2.6)
-- 
--   Media I/O: 
--     ZLib:                        /lib64/libz.so (ver 1.2.13)
--     JPEG:                        /lib64/libjpeg.so (ver 62)
--     WEBP:                        /lib64/libwebp.so (ver encoder: 0x020f)
--     PNG:                         /lib64/libpng.so (ver 1.6.37)
--     TIFF:                        /lib64/libtiff.so (ver 42 / 4.4.0)
--     JPEG 2000:                   OpenJPEG (ver 2.5.0)
--     OpenEXR:                     OpenEXR::OpenEXR (ver 3.1.10)
--     HDR:                         YES
--     SUNRASTER:                   YES
--     PXM:                         YES
--     PFM:                         YES
-- 
--   Video I/O:
--     DC1394:                      YES (2.2.7)
--     FFMPEG:                      YES
--       avcodec:                   YES (60.31.102)
--       avformat:                  YES (60.16.100)
--       avutil:                    YES (58.29.100)
--       swscale:                   YES (7.5.100)
--       avresample:                NO
--     GStreamer:                   YES (1.22.9)
--     v4l/v4l2:                    YES (linux/videodev2.h)
-- 
--   Parallel framework:            TBB (ver 2020.3 interface 11103)
-- 
--   Trace:                         YES (with Intel ITT)
-- 
--   Other third-party libraries:
--     Intel IPP:                   2021.10.1 [2021.10.1]
--            at:                   /home/coder/projects/opencv-src/opencv/build/3rdparty/ippicv/ippicv_lnx/icv
--     Intel IPP IW:                sources (2021.10.1)
--               at:                /home/coder/projects/opencv-src/opencv/build/3rdparty/ippicv/ippicv_lnx/iw
--     VA:                          YES
--     Lapack:                      YES (/lib64/libopenblas.so)
--     Eigen:                       YES (ver 3.4.0)
--     Custom HAL:                  NO
--     Protobuf:                    build (3.19.1)
--     Flatbuffers:                 builtin/3rdparty (23.5.9)
-- 
--   NVIDIA CUDA:                   YES (ver 12.4, CUFFT CUBLAS FAST_MATH)
--     NVIDIA GPU arch:             75
--     NVIDIA PTX archs:            75
-- 
--   cuDNN:                         YES (ver 9.0.0)
-- 
--   OpenCL:                        YES (INTELVA)
--     Include path:                /home/coder/projects/opencv-src/opencv/3rdparty/include/opencl/1.2
--     Link libraries:              Dynamic load
-- 
--   Python 3:
--     Interpreter:                 /usr/bin/python3 (ver 3.12.2)
--     Libraries:                   /lib64/libpython3.12.so (ver 3.12.2)
--     Limited API:                 NO
--     numpy:                       /usr/lib64/python3.12/site-packages/numpy/core/include (ver 1.24.4)
--     install path:                lib/python3.12/site-packages/cv2/python-3.12
-- 
--   Python (for build):            /usr/bin/python3
-- 
--   Java:                          
--     ant:                         NO
--     Java:                        YES (ver 17.0.10)
--     JNI:                         /home/coder/.sdkman/candidates/java/current/include /home/coder/.sdkman/candidates/java/current/include/linux /home/coder/.sdkman/candidates/java/current/include
--     Java wrappers:               YES (JAVA)
--     Java tests:                  NO
-- 
--   Install to:                    /usr/local
-- -----------------------------------------------------------------

moodzunl · 2024-03-24T05:05:30Z

Having the same issue when building the latest OpenCV 4 from source on Windows 11.

HellmannM · 2024-03-26T14:28:55Z

I agree, this should be fixable the way you describe it. However, tuple_size would need to be replaced as well. That is probably straightforward, but tuple_size is used inside templates where the parameter types are not directly visible. Example (last line):

template <class SrcPtr, class DstPtrTuple, class OpTuple, class MaskPtr>
__global__ void transform_tuple(const SrcPtr src, DstPtrTuple dst, const OpTuple op, const MaskPtr mask, const int rows, const int cols)
{   
    const int x = blockIdx.x * blockDim.x + threadIdx.x;
    const int y = blockIdx.y * blockDim.y + threadIdx.y;

    if (x >= cols || y >= rows || !mask(y, x)) 
        return;

    typename PtrTraits<SrcPtr>::value_type srcVal = src(y, x); 

    Unroll<tuple_size<DstPtrTuple>::value>::transform(srcVal, dst, op, y, x); 
}

Here one instance is compiled with
DstPtrTuple=cv::cudev::ZipPtr<cuda::std::__4::tuple<cv::cudev::GlobPtr<float>, cv::cudev::GlobPtr<float>>>
How can the correct template parameters be restored? Not expanding to the correct nested template would give the wrong result. Best I could come up with was:

template <typename T>
struct tuple_size {};

template <typename... P>
struct tuple_size< tuple<P...> >
{
    static const int value = sizeof...(P);
};

template <template <typename S> typename T, typename... P>
struct tuple_size< T<tuple<P...>> >
{   
    static const int value = sizeof...(P);
};

This seems to work for the case mentioned above. However, I am not sure whether it gives the correct result in all cases. Can someone give feedback, or suggest a more elegant solution?
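Here is a host-only check of the trait above, under some assumptions. std::tuple stands in for the CUDA tuple. ZipPtrLike is a hypothetical one-parameter wrapper, assumed to derive from the tuple it zips. The trait is renamed tuple_size_sketch so it does not clash with the standard one. The sketch only confirms the plain-tuple case and the wrapped case from the comment. It does not settle whether the generic T<tuple<P...>> specialization is safe for every one-parameter template, which is the open question.

```cpp
#include <tuple>

// Hypothetical stand-in for cv::cudev::ZipPtr: a one-parameter wrapper that is
// assumed to derive from the tuple of pointers it zips.
template <class PtrTuple>
struct ZipPtrLike : PtrTuple {};

// Primary template: no value for arbitrary types.
template <typename T>
struct tuple_size_sketch {};

// A plain variadic tuple: count its parameters directly.
template <typename... P>
struct tuple_size_sketch<std::tuple<P...>>
{
    static const int value = sizeof...(P);
};

// Any one-parameter template wrapping a variadic tuple: count the inner parameters.
template <template <typename> class T, typename... P>
struct tuple_size_sketch<T<std::tuple<P...>>>
{
    static const int value = sizeof...(P);
};

static_assert(tuple_size_sketch<std::tuple<int, float, char>>::value == 3, "plain tuple");
static_assert(tuple_size_sketch<ZipPtrLike<std::tuple<int*, float*>>>::value == 2, "wrapped tuple");
```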

stefanboca · 2024-03-28T18:18:11Z

Alternatively, Thrust's tuple_size can be specialized for ZipPtr<tuple<...>>.
Something like:

// placed at the end of modules/cudev/include/opencv2/cudev/ptr2d/zip.hpp, in the global namespace

_LIBCUDACXX_BEGIN_NAMESPACE_STD

template<class Ptr0, class Ptr1>
struct tuple_size<cv::cudev::ZipPtr<tuple<Ptr0, Ptr1>>> : tuple_size<tuple<Ptr0, Ptr1>> {};

template<class Ptr0, class Ptr1, class Ptr2>
struct tuple_size<cv::cudev::ZipPtr<tuple<Ptr0, Ptr1, Ptr2>>> : tuple_size<tuple<Ptr0, Ptr1, Ptr2>> {};

template<class Ptr0, class Ptr1, class Ptr2, class Ptr3>
struct tuple_size<cv::cudev::ZipPtr<tuple<Ptr0, Ptr1, Ptr2, Ptr3>>> : tuple_size<tuple<Ptr0, Ptr1, Ptr2, Ptr3>> {};


template<class Ptr0, class Ptr1>
struct tuple_size<cv::cudev::ZipPtrSz<tuple<Ptr0, Ptr1>>> : tuple_size<tuple<Ptr0, Ptr1>> {};

template<class Ptr0, class Ptr1, class Ptr2>
struct tuple_size<cv::cudev::ZipPtrSz<tuple<Ptr0, Ptr1, Ptr2>>> : tuple_size<tuple<Ptr0, Ptr1, Ptr2>> {};

template<class Ptr0, class Ptr1, class Ptr2, class Ptr3>
struct tuple_size<cv::cudev::ZipPtrSz<tuple<Ptr0, Ptr1, Ptr2, Ptr3>>> : tuple_size<tuple<Ptr0, Ptr1, Ptr2, Ptr3>> {};


template<size_t N, class Ptr0, class Ptr1>
struct tuple_element<N, cv::cudev::ZipPtr<tuple<Ptr0, Ptr1>>> : tuple_element<N, tuple<Ptr0, Ptr1>> {};

template<size_t N, class Ptr0, class Ptr1, class Ptr2>
struct tuple_element<N, cv::cudev::ZipPtr<tuple<Ptr0, Ptr1, Ptr2>>> : tuple_element<N, tuple<Ptr0, Ptr1, Ptr2>> {};

template<size_t N, class Ptr0, class Ptr1, class Ptr2, class Ptr3>
struct tuple_element<N, cv::cudev::ZipPtr<tuple<Ptr0, Ptr1, Ptr2, Ptr3>>> : tuple_element<N, tuple<Ptr0, Ptr1, Ptr2, Ptr3>> {};

_LIBCUDACXX_END_NAMESPACE_STD

Thrust does this itself for backwards compatibility with the old style of tuples. It also appears that tuple_element needs to be fixed, so I've included that for completeness.

Together with the parameter pack changes mentioned above, this approach let me compile OpenCV successfully.
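The per-arity specializations above could, in principle, be collapsed into one variadic partial specialization per trait. The sketch below is a host-only illustration with some assumptions. It specializes std::tuple_size and std::tuple_element, not the cuda::std versions opened by _LIBCUDACXX_BEGIN_NAMESPACE_STD. It does so for a hypothetical ZipPtrLike wrapper that derives from its tuple. The standard permits specializing these two traits for a program-defined type.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

// Hypothetical stand-in for cv::cudev::ZipPtr / ZipPtrSz, assumed to derive
// from the tuple it zips.
template <class PtrTuple>
struct ZipPtrLike : PtrTuple {};

namespace std {
// One variadic specialization covers every arity, instead of one per size.
template <class... Ptrs>
struct tuple_size<ZipPtrLike<tuple<Ptrs...>>> : tuple_size<tuple<Ptrs...>> {};

template <size_t N, class... Ptrs>
struct tuple_element<N, ZipPtrLike<tuple<Ptrs...>>> : tuple_element<N, tuple<Ptrs...>> {};
}  // namespace std

using Zip3 = ZipPtrLike<std::tuple<int*, float*, double*>>;
static_assert(std::tuple_size<Zip3>::value == 3, "size forwarded to the inner tuple");
static_assert(std::is_same<std::tuple_element<1, Zip3>::type, float*>::value, "element type forwarded");
```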


miscco · 2024-04-15T06:29:37Z

I am one of the maintainers of the CCCL libraries at NVIDIA.

We recently updated our old thrust::tuple implementation to be an alias for cuda::std::tuple. Unfortunately, when providing the necessary backfills for thrust::tuple_size to work with thrust::null_type, someone (me) forgot to add the final overload for a 10-element tuple. My apologies for the disruption this bug has caused here.

This was fixed after this issue was raised.

There are several ways to work around this issue in the near and mid term:

You could pull in the latest CCCL from GitHub and use it instead of the version packaged with CTK 12.4. CMake should be able to find ToT (top of tree) with find_package(CCCL).
You could provide a temporary workaround similar to the fix we employed.
(mid term) You should remove all explicit uses of tuple that rely on the old 10-parameter tuple and simply use the number of tuple elements you actually need, i.e. replace thrust::tuple<T1, T2, thrust::null_type,...> with thrust::tuple<T1, T2>.
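Some background on the backfill mentioned above. With the old padded representation, thrust::tuple<T1, T2, thrust::null_type, ...> should report a size of 2, so the null_type padding must be excluded from the count. The sketch below illustrates only that idea. It uses a hypothetical local null_type and a recursive count. It is not how CCCL implements its backfill.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for thrust::null_type.
struct null_type {};

// Counts the template arguments that are not null_type padding.
template <class... T>
struct count_non_null;

template <>
struct count_non_null<> : std::integral_constant<std::size_t, 0> {};

template <class Head, class... Tail>
struct count_non_null<Head, Tail...>
    : std::integral_constant<std::size_t,
          (std::is_same<Head, null_type>::value ? 0 : 1) + count_non_null<Tail...>::value> {};

// A two-element tuple written in the old, fully padded ten-parameter style.
static_assert(count_non_null<int, float, null_type, null_type, null_type,
                             null_type, null_type, null_type, null_type, null_type>::value == 2,
              "padding is not counted");
```

A fully populated ten-element tuple has no padding at all. Any rule that handles only padded forms therefore also needs a case that counts all ten elements.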

opperliu · 2024-04-24T02:56:08Z

Quoting miscco:

You could pull in latest cccl from github and use that instead of the version packaged with the CTK 12.4. CMake should be able to find ToT with find_package(CCCL)

How do I replace it? Pull the repository and run CMake? Which CMake parameters should I use?
When I run cmake .., it fails with the error: Could not find libcudacxx_LIT using the following names: lit

miscco · 2024-04-24T19:19:24Z

You could use CPM like:

include(cmake/CPM.cmake)
CPMAddPackage("gh:NVIDIA/cccl#main")
target_link_libraries(PROJECT CCCL::CCCL)

jiapei100 · 2024-04-29T06:09:06Z

Well... I still don't quite get it. Do we have a solution already?

Should we build cccl and use it to replace the default version installed with CUDA Toolkit 12.4?

Thanks

LiuToki · 2024-05-07T04:19:59Z

I was able to build the library using CUDA Toolkit 12.3.2 in my environment (through vcpkg). That is one workaround.

Also, the cccl fixes mentioned above seem to be going into v2.4.0.
Even the latest CUDA Toolkit at this time, 12.4.1, still seems to ship an unfixed cccl.
See the NVIDIA CUDA Toolkit Release Notes for 12.4 Update 1.

asmorkalov · 2024-05-22T08:31:50Z

CUDA Toolkit 12.5 still has the bug.

zachlowry · 2024-05-29T13:58:50Z

include(cmake/CPM.cmake)
CPMAddPackage("gh:NVIDIA/cccl#main")
target_link_libraries(PROJECT CCCL::CCCL)

Do you have any additional information as to how to use these commands correctly to patch the CMake files in OpenCV? I have tried adding them to the CMakeLists.txt without much success so far.

Added CUDA 12.4+ support #3744

Tries to fix #3690 for CUDA 12.4+. Related patch to the main repo: opencv/opencv#25658

Changes:
- Added branches to support the new variadic implementation of thrust::tuple.
- Added a branch with std::array instead of std::tuple in split-merge and grid operations. The new branch gets rid of the namespace clash between cv::cuda in OpenCV and ::cuda in the CUDA standard library (injected by Thrust). The old tuple branches are kept for compatibility with old code and CUDA versions before 12.4.

Cupcc · 2024-06-03T09:55:13Z

I have the same issue when building OpenCV 4.8.0 with CUDA 12.4.0 on Ubuntu 22.04 LTS.

cudawarped · 2024-06-03T10:21:11Z

@Cupcc either use CUDA < 12.4 or the latest commits from the 4.x branch which address this issue.

jiapei100 · 2024-06-10T05:41:56Z

CUDA Toolkit 12.5 still has the bug.

Hi @asmorkalov. I just tried CUDA 12.5 with cuDNN 9.2.0, and OpenCV 4.10.0 built successfully. ^_^

heartlove · 2024-08-07T14:26:00Z

I have the same issue when building OpenCV 5.x with CUDA 12.6.0 on Windows 11.

cudawarped · 2024-08-07T14:33:13Z

I have the same issue when building OpenCV 5.x with CUDA 12.6.0 on Windows 11.

See #3773

abysslover · 2024-10-25T01:18:35Z

blinghaim

how to replace?pull and cmake? which the cmake parameters? when i use cmake .. it turns out error : Could not find libcudacxx_LIT using the following names: lit

Unfortunately, @miscco's answer does not actually address your question, which has caused confusion since the error still persists. You need to install the lit Python package with pip install lit.

opencv-alalek added the bug, category: build/install, and category: cuda labels on Mar 9, 2024

cudawarped mentioned this issue on Mar 13, 2024:
OpenCV 4.8.0 build fails with CUDA 12.3, CUDNN 8.9.6.50 on Ubuntu 20.04, undefined references to libffi.so.7. opencv/opencv#24498 (Closed)

negril added a commit to negril/gentoo that referenced this issue on Mar 31, 2024:
media-libs/opencv: prohibit cuda-12.4 (1882c5e)
Also limit cuda interaction to ABI_X86_64. Bug: opencv/opencv_contrib#3690 Signed-off-by: Paul Zander <[email protected]>

negril mentioned this issue on Mar 31, 2024:
media-libs/opencv: prohibit cuda-12.4 gentoo/gentoo#36020 (Closed)

cudawarped mentioned this issue on Apr 2, 2024:
CUDA version check opencv/opencv#25303 (Open)

This was referenced on Apr 22, 2024:
Build with Error C2664 nppiMeanStdDevGetBufferHostSize_8u_C1R #3725 (Closed)
Try to Fix #3725: cudaarithm: fix the compile faiure of CUDA 12. #3726 (Merged)

cudawarped mentioned this issue on Apr 28, 2024:
error: no instance of overloaded function "cv::cudev::blockReduce" matches the argument list #3728 (Closed)

jiapei100 mentioned this issue on May 1, 2024:
DNN module fails to compile against cuDNN 9.0 opencv/opencv#24983 (Closed)

This was referenced on May 24, 2024:
Added CUDA 12.4+ support #3744 (Merged)
Added branch with variadic version of Trust tuple opencv/opencv#25658 (Merged)

asmorkalov closed this as completed in opencv/opencv#25658 on May 30, 2024

asmorkalov self-assigned this on Jun 3, 2024

BillyONeal mentioned this issue on Jun 18, 2024:
Patch Tuesday June 2024 microsoft/vcpkg#39325 (Merged)

yshui mentioned this issue on Sep 1, 2024:
Build failure: opencv with cuda support NixOS/nixpkgs#338315 (Closed)

HiroyukiSakai mentioned this issue on Feb 20, 2025:
Compilation trouble cg-tuwien/StatMC#1 (Closed)
