Description
This is a follow-up to @Choons' comments in #1509 .
pytorch3d 0.7.3
cuda 12.1.1
gcc 12.2.1
Build for pytorch3d/csrc/sample_farthest_points/sample_farthest_points.cu fails with errors:
FAILED: /home/piernov/.cache/yay/python-pytorch3d/src/pytorch3d-0.7.3/build/temp.linux-x86_64-cpython-311/pytorch3d/csrc/sample_farthest_points/sample_farthest_points.o
/opt/cuda/bin/nvcc -DWITH_CUDA -DTHRUST_IGNORE_CUB_VERSION_CHECK -I/home/piernov/.cache/yay/python-pytorch3d/src/pytorch3d-0.7.3/pytorch3d/csrc -I/usr/lib/python3.11/site-packages/torch/include -I/usr/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/usr/lib/python3.11/site-packages/torch/include/TH -I/usr/lib/python3.11/site-packages/torch/include/THC -I/opt/cuda/include -I/usr/include/python3.11 -c -c /home/piernov/.cache/yay/python-pytorch3d/src/pytorch3d-0.7.3/pytorch3d/csrc/sample_farthest_points/sample_farthest_points.cu -o /home/piernov/.cache/yay/python-pytorch3d/src/pytorch3d-0.7.3/build/temp.linux-x86_64-cpython-311/pytorch3d/csrc/sample_farthest_points/sample_farthest_points.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -std=c++14 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1017"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86
/opt/cuda/include/cub/block/specializations/block_reduce_raking.cuh(77): error: class "cub::WarpReduce<cub::KeyValuePair<int64_t, float>, 1, 0>" has no member "InternalWarpReduce"
typedef typename WarpReduce<T, BlockRakingLayout::RAKING_THREADS>::InternalWarpReduce WarpReduce;
^
detected during:
instantiation of class "cub::BlockReduceRaking<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH> [with T=cub::KeyValuePair<int64_t, float>, BLOCK_DIM_X=1, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0]" at line 100 of /opt/cuda/include/cub/block/specializations/block_reduce_raking_commutative_only.cuh
instantiation of union "cub::BlockReduceRakingCommutativeOnly<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::_TempStorage [with T=cub::KeyValuePair<int64_t, float>, BLOCK_DIM_X=1, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0]" at line 342 of /opt/cuda/include/cub/util_type.cuh
instantiation of class "cub::AlignBytes<T>::Pad [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::_TempStorage]" at line 349 of /opt/cuda/include/cub/util_type.cuh
instantiation of class "cub::AlignBytes<T> [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::_TempStorage]" at line 402 of /opt/cuda/include/cub/util_type.cuh
instantiation of class "cub::UnitWord<T> [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::_TempStorage]" at line 671 of /opt/cuda/include/cub/util_type.cuh
[ 3 instantiation contexts not shown ]
instantiation of class "cub::AlignBytes<T> [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::TempStorage]" at line 402 of /opt/cuda/include/cub/util_type.cuh
instantiation of class "cub::UnitWord<T> [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::TempStorage]" at line 671 of /opt/cuda/include/cub/util_type.cuh
instantiation of class "cub::Uninitialized<T> [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::TempStorage]" at line 279 of /opt/cuda/include/cub/block/block_reduce.cuh
instantiation of class "cub::BlockReduce<T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::TempStorage [with T=cub::KeyValuePair<int64_t, float>, BLOCK_DIM_X=1, ALGORITHM=cub::BLOCK_REDUCE_RAKING_COMMUTATIVE_ONLY, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0]" at line 34 of /home/piernov/.cache/yay/python-pytorch3d/src/pytorch3d-0.7.3/pytorch3d/csrc/sample_farthest_points/sample_farthest_points.cu
instantiation of "void FarthestPointSamplingKernel<block_size>(at::PackedTensorAccessor64<float, 3UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<int64_t, 1UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<int64_t, 1UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<int64_t, 2UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<float, 2UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<int64_t, 1UL, at::RestrictPtrTraits>) [with block_size=1U]" at line 219 of /home/piernov/.cache/yay/python-pytorch3d/src/pytorch3d-0.7.3/pytorch3d/csrc/sample_farthest_points/sample_farthest_points.cu
/opt/cuda/include/cub/block/specializations/block_reduce_raking.cuh(103): error: argument list for class template "cub::WarpReduce" is missing
typename WarpReduce::TempStorage warp_storage;
^
detected during:
instantiation of union "cub::BlockReduceRaking<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::_TempStorage [with T=cub::KeyValuePair<int64_t, float>, BLOCK_DIM_X=1, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0]" at line 342 of /opt/cuda/include/cub/util_type.cuh
instantiation of class "cub::AlignBytes<T>::Pad [with T=cub::BlockReduceRaking<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::_TempStorage]" at line 349 of /opt/cuda/include/cub/util_type.cuh
instantiation of class "cub::AlignBytes<T> [with T=cub::BlockReduceRaking<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::_TempStorage]" at line 402 of /opt/cuda/include/cub/util_type.cuh
instantiation of class "cub::UnitWord<T> [with T=cub::BlockReduceRaking<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::_TempStorage]" at line 671 of /opt/cuda/include/cub/util_type.cuh
instantiation of class "cub::Uninitialized<T> [with T=cub::BlockReduceRaking<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::_TempStorage]" at line 109
[ 8 instantiation contexts not shown ]
instantiation of class "cub::AlignBytes<T> [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::TempStorage]" at line 402 of /opt/cuda/include/cub/util_type.cuh
instantiation of class "cub::UnitWord<T> [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::TempStorage]" at line 671 of /opt/cuda/include/cub/util_type.cuh
instantiation of class "cub::Uninitialized<T> [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::TempStorage]" at line 279 of /opt/cuda/include/cub/block/block_reduce.cuh
instantiation of class "cub::BlockReduce<T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::TempStorage [with T=cub::KeyValuePair<int64_t, float>, BLOCK_DIM_X=1, ALGORITHM=cub::BLOCK_REDUCE_RAKING_COMMUTATIVE_ONLY, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0]" at line 34 of /home/piernov/.cache/yay/python-pytorch3d/src/pytorch3d-0.7.3/pytorch3d/csrc/sample_farthest_points/sample_farthest_points.cu
instantiation of "void FarthestPointSamplingKernel<block_size>(at::PackedTensorAccessor64<float, 3UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<int64_t, 1UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<int64_t, 1UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<int64_t, 2UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<float, 2UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<int64_t, 1UL, at::RestrictPtrTraits>) [with block_size=1U]" at line 219 of /home/piernov/.cache/yay/python-pytorch3d/src/pytorch3d-0.7.3/pytorch3d/csrc/sample_farthest_points/sample_farthest_points.cu
Build log here if needed:
build_log.txt
The issue is indeed related to case 1 of FarthestPointSamplingCuda():
case 1:
Calling FarthestPointSamplingKernel<1>() is invalid because it instantiates cub::BlockReduce with BLOCK_DIM_X=1, which is unsupported according to NVIDIA/cccl#905 .
CUB commit NVIDIA/cub@c73f551 merely makes this pre-existing bug apparent: it adds a new template specialization of WarpReduce for LOGICAL_WARP_THREADS=1, and that specialization does not contain the InternalWarpReduce member used by BlockReduceRaking, which is in turn used by BlockReduce.
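One possible workaround on the host side would be to never select a block size of 1 when dispatching the templated kernel. The sketch below is a hypothetical helper (not pytorch3d code): it rounds a requested CUDA block size up to the next power of two with a minimum of 2, so the `case 1:` branch, and therefore the unsupported `cub::BlockReduce<..., 1, ...>` instantiation, is never reached.

```cpp
#include <cassert>

// Hypothetical helper: round the requested block size up to the next
// power of two, clamped to [2, 1024]. A minimum of 2 avoids instantiating
// cub::BlockReduce with BLOCK_DIM_X == 1, which is unsupported per
// NVIDIA/cccl#905; 1024 is the usual per-block thread limit.
unsigned next_block_size(unsigned requested) {
  unsigned size = 2;  // never return 1
  while (size < requested && size < 1024) {
    size <<= 1;  // step through powers of two: 2, 4, 8, ..., 1024
  }
  return size;
}
```

With this helper, a request for 1 thread would launch `FarthestPointSamplingKernel<2>` instead; the kernel already handles the case where the block size exceeds the number of points, so the extra thread is harmless. Whether pytorch3d should take this route or drop the `case 1:` branch entirely is a design decision for the maintainers.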