Description
This is a follow-up to @Choons' comments in #1509 .
pytorch3d 0.7.3
cuda 12.1.1
gcc 12.2.1
Build for pytorch3d/csrc/sample_farthest_points/sample_farthest_points.cu fails with errors:
FAILED: /home/piernov/.cache/yay/python-pytorch3d/src/pytorch3d-0.7.3/build/temp.linux-x86_64-cpython-311/pytorch3d/csrc/sample_farthest_points/sample_farthest_points.o
/opt/cuda/bin/nvcc -DWITH_CUDA -DTHRUST_IGNORE_CUB_VERSION_CHECK -I/home/piernov/.cache/yay/python-pytorch3d/src/pytorch3d-0.7.3/pytorch3d/csrc -I/usr/lib/python3.11/site-packages/torch/include -I/usr/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/usr/lib/python3.11/site-packages/torch/include/TH -I/usr/lib/python3.11/site-packages/torch/include/THC -I/opt/cuda/include -I/usr/include/python3.11 -c -c /home/piernov/.cache/yay/python-pytorch3d/src/pytorch3d-0.7.3/pytorch3d/csrc/sample_farthest_points/sample_farthest_points.cu -o /home/piernov/.cache/yay/python-pytorch3d/src/pytorch3d-0.7.3/build/temp.linux-x86_64-cpython-311/pytorch3d/csrc/sample_farthest_points/sample_farthest_points.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -std=c++14 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1017"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86
/opt/cuda/include/cub/block/specializations/block_reduce_raking.cuh(77): error: class "cub::WarpReduce<cub::KeyValuePair<int64_t, float>, 1, 0>" has no member "InternalWarpReduce"
typedef typename WarpReduce<T, BlockRakingLayout::RAKING_THREADS>::InternalWarpReduce WarpReduce;
^
detected during:
instantiation of class "cub::BlockReduceRaking<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH> [with T=cub::KeyValuePair<int64_t, float>, BLOCK_DIM_X=1, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0]" at line 100 of /opt/cuda/include/cub/block/specializations/block_reduce_raking_commutative_only.cuh
instantiation of union "cub::BlockReduceRakingCommutativeOnly<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::_TempStorage [with T=cub::KeyValuePair<int64_t, float>, BLOCK_DIM_X=1, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0]" at line 342 of /opt/cuda/include/cub/util_type.cuh
instantiation of class "cub::AlignBytes<T>::Pad [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::_TempStorage]" at line 349 of /opt/cuda/include/cub/util_type.cuh
instantiation of class "cub::AlignBytes<T> [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::_TempStorage]" at line 402 of /opt/cuda/include/cub/util_type.cuh
instantiation of class "cub::UnitWord<T> [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::_TempStorage]" at line 671 of /opt/cuda/include/cub/util_type.cuh
[ 3 instantiation contexts not shown ]
instantiation of class "cub::AlignBytes<T> [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::TempStorage]" at line 402 of /opt/cuda/include/cub/util_type.cuh
instantiation of class "cub::UnitWord<T> [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::TempStorage]" at line 671 of /opt/cuda/include/cub/util_type.cuh
instantiation of class "cub::Uninitialized<T> [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::TempStorage]" at line 279 of /opt/cuda/include/cub/block/block_reduce.cuh
instantiation of class "cub::BlockReduce<T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::TempStorage [with T=cub::KeyValuePair<int64_t, float>, BLOCK_DIM_X=1, ALGORITHM=cub::BLOCK_REDUCE_RAKING_COMMUTATIVE_ONLY, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0]" at line 34 of /home/piernov/.cache/yay/python-pytorch3d/src/pytorch3d-0.7.3/pytorch3d/csrc/sample_farthest_points/sample_farthest_points.cu
instantiation of "void FarthestPointSamplingKernel<block_size>(at::PackedTensorAccessor64<float, 3UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<int64_t, 1UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<int64_t, 1UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<int64_t, 2UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<float, 2UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<int64_t, 1UL, at::RestrictPtrTraits>) [with block_size=1U]" at line 219 of /home/piernov/.cache/yay/python-pytorch3d/src/pytorch3d-0.7.3/pytorch3d/csrc/sample_farthest_points/sample_farthest_points.cu
/opt/cuda/include/cub/block/specializations/block_reduce_raking.cuh(103): error: argument list for class template "cub::WarpReduce" is missing
typename WarpReduce::TempStorage warp_storage;
^
detected during:
instantiation of union "cub::BlockReduceRaking<T, BLOCK_DIM_X, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::_TempStorage [with T=cub::KeyValuePair<int64_t, float>, BLOCK_DIM_X=1, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0]" at line 342 of /opt/cuda/include/cub/util_type.cuh
instantiation of class "cub::AlignBytes<T>::Pad [with T=cub::BlockReduceRaking<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::_TempStorage]" at line 349 of /opt/cuda/include/cub/util_type.cuh
instantiation of class "cub::AlignBytes<T> [with T=cub::BlockReduceRaking<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::_TempStorage]" at line 402 of /opt/cuda/include/cub/util_type.cuh
instantiation of class "cub::UnitWord<T> [with T=cub::BlockReduceRaking<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::_TempStorage]" at line 671 of /opt/cuda/include/cub/util_type.cuh
instantiation of class "cub::Uninitialized<T> [with T=cub::BlockReduceRaking<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::_TempStorage]" at line 109
[ 8 instantiation contexts not shown ]
instantiation of class "cub::AlignBytes<T> [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::TempStorage]" at line 402 of /opt/cuda/include/cub/util_type.cuh
instantiation of class "cub::UnitWord<T> [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::TempStorage]" at line 671 of /opt/cuda/include/cub/util_type.cuh
instantiation of class "cub::Uninitialized<T> [with T=cub::BlockReduceRakingCommutativeOnly<cub::KeyValuePair<int64_t, float>, 1, 1, 1, 0>::TempStorage]" at line 279 of /opt/cuda/include/cub/block/block_reduce.cuh
instantiation of class "cub::BlockReduce<T, BLOCK_DIM_X, ALGORITHM, BLOCK_DIM_Y, BLOCK_DIM_Z, LEGACY_PTX_ARCH>::TempStorage [with T=cub::KeyValuePair<int64_t, float>, BLOCK_DIM_X=1, ALGORITHM=cub::BLOCK_REDUCE_RAKING_COMMUTATIVE_ONLY, BLOCK_DIM_Y=1, BLOCK_DIM_Z=1, LEGACY_PTX_ARCH=0]" at line 34 of /home/piernov/.cache/yay/python-pytorch3d/src/pytorch3d-0.7.3/pytorch3d/csrc/sample_farthest_points/sample_farthest_points.cu
instantiation of "void FarthestPointSamplingKernel<block_size>(at::PackedTensorAccessor64<float, 3UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<int64_t, 1UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<int64_t, 1UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<int64_t, 2UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<float, 2UL, at::RestrictPtrTraits>, at::PackedTensorAccessor64<int64_t, 1UL, at::RestrictPtrTraits>) [with block_size=1U]" at line 219 of /home/piernov/.cache/yay/python-pytorch3d/src/pytorch3d-0.7.3/pytorch3d/csrc/sample_farthest_points/sample_farthest_points.cu
Build log here if needed:
build_log.txt
The issue is indeed related to case 1 of FarthestPointSamplingCuda():
case 1:
Calling FarthestPointSamplingKernel<1>() is invalid because it instantiates cub::BlockReduce with BLOCK_DIM_X=1, which is unsupported according to NVIDIA/cccl#905 .
CUB commit NVIDIA/cub@c73f551 merely makes this pre-existing bug apparent: it adds a new template specialization of WarpReduce for LOGICAL_WARP_THREADS=1, and that specialization does not contain the InternalWarpReduce member used by BlockReduceRaking, which is in turn used by BlockReduce.
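One possible workaround on the host side would be to never select a block size of 1 when dispatching the templated kernel. The sketch below is a hypothetical helper (not pytorch3d code): it rounds a requested CUDA block size up to the next power of two with a minimum of 2, so the `case 1:` branch, and therefore the unsupported `cub::BlockReduce<..., 1, ...>` instantiation, is never reached.

```cpp
#include <cassert>

// Hypothetical helper: round the requested block size up to the next
// power of two, clamped to [2, 1024]. A minimum of 2 avoids instantiating
// cub::BlockReduce with BLOCK_DIM_X == 1, which is unsupported per
// NVIDIA/cccl#905; 1024 is the usual per-block thread limit.
unsigned next_block_size(unsigned requested) {
  unsigned size = 2;  // never return 1
  while (size < requested && size < 1024) {
    size <<= 1;  // step through powers of two: 2, 4, 8, ..., 1024
  }
  return size;
}
```

With this helper, a request for 1 thread would launch `FarthestPointSamplingKernel<2>` instead; the kernel already handles the case where the block size exceeds the number of points, so the extra thread is harmless. Whether pytorch3d should take this route or drop the `case 1:` branch entirely is a design decision for the maintainers.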