Build fails with CUDA 12.1.1 on sample_farthest_points.cu #1529

@piernov

This is a follow-up to @Choons's comments in #1509.

pytorch3d 0.7.3
cuda 12.1.1
gcc 12.2.1

The build of pytorch3d/csrc/sample_farthest_points/sample_farthest_points.cu fails with the following errors:

FAILED: /home/piernov/.cache/yay/python-pytorch3d/src/pytorch3d-0.7.3/build/temp.linux-x86_64-cpython-311/pytorch3d/csrc/sample_farthest_points/sample_farthest_points.o 
/opt/cuda/bin/nvcc  -DWITH_CUDA -DTHRUST_IGNORE_CUB_VERSION_CHECK -I/home/piernov/.cache/yay/python-pytorch3d/src/pytorch3d-0.7.3/pytorch3d/csrc -I/usr/lib/python3.11/site-packages/torch/include -I/usr/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/usr/lib/python3.11/site-packages/torch/include/TH -I/usr/lib/python3.11/site-packages/torch/include/THC -I/opt/cuda/include -I/usr/include/python3.11 -c -c /home/piernov/.cache/yay/python-pytorch3d/src/pytorch3d-0.7.3/pytorch3d/csrc/sample_farthest_points/sample_farthest_points.cu -o /home/piernov/.cache/yay/python-pytorch3d/src/pytorch3d-0.7.3/build/temp.linux-x86_64-cpython-311/pytorch3d/csrc/sample_farthest_points/sample_farthest_points.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -std=c++14 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1017"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86
/opt/cuda/include/cub/block/specializations/block_reduce_raking.cuh(77): error: class "cub::WarpReduce<cub::KeyValuePair<int64_t, float>, 1, 0>" has no member "InternalWarpReduce"
      typedef typename WarpReduce<T, BlockRakingLayout::RAKING_THREADS>::InternalWarpReduce WarpReduce;
                                                                         ^
          detected during:
            instantiation of class "cub::BlockReduceRaking<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH> [with T=cub::KeyValuePair<int64_t, float>, BLOCK_DIM_X=1, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0]" at line 100 of /opt/cuda/include/cub/block/specializations/block_reduce_raking_commutative_only.cuh
            instantiation of union "cub::BlockReduceRakingCommutativeOnly<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::_TempStorage [with T=cub::KeyValuePair<int64_t, float>, BLOCK_DIM_X=1, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0]" at line 342 of /opt/cuda/include/cub/util_type.cuh
            instantiation of class "cub::AlignBytes<T>::Pad [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::_TempStorage]" at line 349 of /opt/cuda/include/cub/util_type.cuh
            instantiation of class "cub::AlignBytes<T> [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::_TempStorage]" at line 402 of /opt/cuda/include/cub/util_type.cuh
            instantiation of class "cub::UnitWord<T> [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::_TempStorage]" at line 671 of /opt/cuda/include/cub/util_type.cuh
            [ 3 instantiation contexts not shown ]
            instantiation of class "cub::AlignBytes<T> [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::TempStorage]" at line 402 of /opt/cuda/include/cub/util_type.cuh
            instantiation of class "cub::UnitWord<T> [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::TempStorage]" at line 671 of /opt/cuda/include/cub/util_type.cuh
            instantiation of class "cub::Uninitialized<T> [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::TempStorage]" at line 279 of /opt/cuda/include/cub/block/block_reduce.cuh
            instantiation of class "cub::BlockReduce<T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::TempStorage [with T=cub::KeyValuePair<int64_t, float>, BLOCK_DIM_X=1, ALGORITHM=cub::BLOCK_REDUCE_RAKING_COMMUTATIVE_ONLY, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0]" at line 34 of /home/piernov/.cache/yay/python-pytorch3d/src/pytorch3d-0.7.3/pytorch3d/csrc/sample_farthest_points/sample_farthest_points.cu
            instantiation of "void FarthestPointSamplingKernel<block_size>(at::PackedTensorAccessor64<float, 3UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<int64_t, 1UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<int64_t, 1UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<int64_t, 2UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<float, 2UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<int64_t, 1UL, at::RestrictPtrTraits>) [with block_size=1U]" at line 219 of /home/piernov/.cache/yay/python-pytorch3d/src/pytorch3d-0.7.3/pytorch3d/csrc/sample_farthest_points/sample_farthest_points.cu

/opt/cuda/include/cub/block/specializations/block_reduce_raking.cuh(103): error: argument list for class template "cub::WarpReduce" is missing
          typename WarpReduce::TempStorage warp_storage;
                   ^
          detected during:
            instantiation of union "cub::BlockReduceRaking<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::_TempStorage [with T=cub::KeyValuePair<int64_t, float>, BLOCK_DIM_X=1, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0]" at line 342 of /opt/cuda/include/cub/util_type.cuh
            instantiation of class "cub::AlignBytes<T>::Pad [with T=cub::BlockReduceRaking<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::_TempStorage]" at line 349 of /opt/cuda/include/cub/util_type.cuh
            instantiation of class "cub::AlignBytes<T> [with T=cub::BlockReduceRaking<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::_TempStorage]" at line 402 of /opt/cuda/include/cub/util_type.cuh
            instantiation of class "cub::UnitWord<T> [with T=cub::BlockReduceRaking<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::_TempStorage]" at line 671 of /opt/cuda/include/cub/util_type.cuh
            instantiation of class "cub::Uninitialized<T> [with T=cub::BlockReduceRaking<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::_TempStorage]" at line 109
            [ 8 instantiation contexts not shown ]
            instantiation of class "cub::AlignBytes<T> [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::TempStorage]" at line 402 of /opt/cuda/include/cub/util_type.cuh
            instantiation of class "cub::UnitWord<T> [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::TempStorage]" at line 671 of /opt/cuda/include/cub/util_type.cuh
            instantiation of class "cub::Uninitialized<T> [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::TempStorage]" at line 279 of /opt/cuda/include/cub/block/block_reduce.cuh
            instantiation of class "cub::BlockReduce<T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::TempStorage [with T=cub::KeyValuePair<int64_t, float>, BLOCK_DIM_X=1, ALGORITHM=cub::BLOCK_REDUCE_RAKING_COMMUTATIVE_ONLY, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0]" at line 34 of /home/piernov/.cache/yay/python-pytorch3d/src/pytorch3d-0.7.3/pytorch3d/csrc/sample_farthest_points/sample_farthest_points.cu
            instantiation of "void FarthestPointSamplingKernel<block_size>(at::PackedTensorAccessor64<float, 3UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<int64_t, 1UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<int64_t, 1UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<int64_t, 2UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<float, 2UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<int64_t, 1UL, at::RestrictPtrTraits>) [with block_size=1U]" at line 219 of /home/piernov/.cache/yay/python-pytorch3d/src/pytorch3d-0.7.3/pytorch3d/csrc/sample_farthest_points/sample_farthest_points.cu

Full build log, if needed:
build_log.txt

The issue is indeed related to case 1 of FarthestPointSamplingCuda():

Calling FarthestPointSamplingKernel<1>() is invalid because it instantiates cub::BlockReduce with BLOCK_DIM_X=1, which is unsupported according to NVIDIA/cccl#905.
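One possible way to avoid the unsupported instantiation is to never select a block size of 1 on the host side. The following is only a sketch of that idea in plain C++, not the code in sample_farthest_points.cu: `PickBlockSize`, `kMaxThreadsPerBlock` and `kMinThreadsPerBlock` are hypothetical names, the 1024 limit is an assumption, and it assumes the kernel tolerates a block with more threads than points.

```cpp
#include <algorithm>
#include <cstddef>

// Assumed upper bound on threads per block (hypothetical constant).
constexpr std::size_t kMaxThreadsPerBlock = 1024;
// Smallest block size this sketch allows, so BLOCK_DIM_X=1 is never requested.
constexpr std::size_t kMinThreadsPerBlock = 2;

// Returns the largest power of two that is <= num_points,
// clamped to the range [kMinThreadsPerBlock, kMaxThreadsPerBlock].
inline std::size_t PickBlockSize(std::size_t num_points) {
  std::size_t threads = 1;
  while (threads * 2 <= num_points && threads * 2 <= kMaxThreadsPerBlock) {
    threads *= 2;
  }
  return std::max(threads, kMinThreadsPerBlock);
}
```

With this clamp, a switch over the returned value would have no need for a `case 1` branch, so FarthestPointSamplingKernel<1>() would never be instantiated.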

CUB commit NVIDIA/cub@c73f551 makes this bug apparent: the build fails because of the new template specialization of WarpReduce for LOGICAL_WARP_THREADS=1. That specialization does not contain the InternalWarpReduce member used by BlockReduceRaking, which is itself used by BlockReduce.
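The failure mode can be reproduced in miniature without CUDA. The sketch below uses hypothetical stand-in types (`WarpReduceLike`, `BlockReduceRakingLike`, `HasInternalWarpReduce`), not the real CUB definitions, and assumes C++17 for `std::void_t`. It shows how a specialization that omits a nested type breaks any template that names that type.

```cpp
#include <type_traits>

// Stand-in for the primary WarpReduce template: exposes the nested type.
template <typename T, int LogicalWarpThreads>
struct WarpReduceLike {
  struct InternalWarpReduce {};
};

// Stand-in for the single-thread specialization: the nested type is absent.
template <typename T>
struct WarpReduceLike<T, 1> {
  struct TempStorage {};
};

// Stand-in for BlockReduceRaking: names the nested type of the warp-level class.
// Instantiating this with RakingThreads=1 is a hard compile error.
template <typename T, int RakingThreads>
struct BlockReduceRakingLike {
  using WarpReduce =
      typename WarpReduceLike<T, RakingThreads>::InternalWarpReduce;
};

// Detection idiom: true only if W has a nested InternalWarpReduce type.
template <typename W, typename = void>
struct HasInternalWarpReduce : std::false_type {};

template <typename W>
struct HasInternalWarpReduce<W, std::void_t<typename W::InternalWarpReduce>>
    : std::true_type {};

static_assert(HasInternalWarpReduce<WarpReduceLike<float, 32>>::value,
              "primary template provides InternalWarpReduce");
static_assert(!HasInternalWarpReduce<WarpReduceLike<float, 1>>::value,
              "single-thread specialization does not");

// Fine: the primary template is used.
using OkRaking = BlockReduceRakingLike<float, 32>::WarpReduce;
// Uncommenting the next line fails with a "no member InternalWarpReduce"
// style error, analogous to the nvcc output above.
// using BadRaking = BlockReduceRakingLike<float, 1>::WarpReduce;
```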
