Description
Migrate to Python
Two kinds of code should be migrated to Python:
- C++ autograd.Functions that need to be rewritten as Python autograd.Functions
- Meta functions that need to be rewritten in Python
What's missing in the current torch stable ABI
Macros
- TORCH_SELECTIVE_FN
- TORCH_SELECTIVE_NAME
- TORCH_SELECTIVE_SCHEMA
- TORCH_FN
- AT_CUDA_CHECK
- AT_DISPATCH_FLOATING_TYPES
- AT_DISPATCH_CASE
- AT_DISPATCH_SWITCH
- C10_CUDA_KERNEL_LAUNCH_CHECK
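For reference, these macros mostly appear in dtype dispatch and operator registration. Below is a minimal sketch of both patterns as they are typically written today; the op name xformers::scale_by_two, its schema, and the kernel body are illustrative, not taken from this repo.

```cpp
#include <ATen/ATen.h>
#include <ATen/Dispatch.h>
#include <torch/library.h>

// Hypothetical CPU kernel: AT_DISPATCH_FLOATING_TYPES selects scalar_t from
// the runtime dtype and instantiates the lambda for each floating type.
at::Tensor scale_by_two(const at::Tensor& self) {
  at::Tensor src = self.contiguous();
  at::Tensor out = at::empty_like(src);
  AT_DISPATCH_FLOATING_TYPES(src.scalar_type(), "scale_by_two", [&] {
    const scalar_t* in = src.data_ptr<scalar_t>();
    scalar_t* dst = out.data_ptr<scalar_t>();
    for (int64_t i = 0; i < src.numel(); ++i) {
      dst[i] = in[i] * static_cast<scalar_t>(2);
    }
  });
  return out;
}

// Schema definition and kernel registration via the selective-build macros.
// "xformers::scale_by_two" is an illustrative schema, not one from this repo.
TORCH_LIBRARY_FRAGMENT(xformers, m) {
  m.def(TORCH_SELECTIVE_SCHEMA("xformers::scale_by_two(Tensor self) -> Tensor"));
}

TORCH_LIBRARY_IMPL(xformers, CPU, m) {
  m.impl(TORCH_SELECTIVE_NAME("xformers::scale_by_two"), TORCH_FN(scale_by_two));
}
```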
Native functions
- matmul
- masked_fill
- logical_not
- tensor.coalesce
- tensor.indices
- at::sparse_coo_tensor
- select
- unflatten
- sum(dim)
- unsqueeze
- view
- permute
- expand
- index(Slice(...)) ← perhaps narrow suffices
- empty_strided
- empty(size, TensorOptions.device(...).dtype(...))
- empty_like
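Most of these are only needed for allocation and shape plumbing. As a reference point, the sketch below shows the empty(size, TensorOptions...) pattern from the tracked files next to the new_empty alternative suggested in the per-file notes; both are written against the current ATen API, and whether the stable ABI exposes an equivalent new_empty is still an open question here.

```cpp
#include <ATen/ATen.h>

// Today: explicit TensorOptions, which the stable ABI does not expose.
at::Tensor alloc_with_options(const at::Tensor& query, int64_t m, int64_t n) {
  return at::empty(
      {m, n},
      at::TensorOptions().device(query.device()).dtype(query.dtype()));
}

// Suggested alternative from the per-file notes: derive dtype/device from an
// existing tensor via new_empty, avoiding TensorOptions entirely.
at::Tensor alloc_like(const at::Tensor& query, int64_t m, int64_t n) {
  return query.new_empty({m, n});
}
```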
Tensor Methods
- is_sparse
- packed_accessor64
- packed_accessor32
- device
- sizes
- strides
- storage
- accessor<type, int>
- contiguous
- layout
- requires_grad
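These methods mostly feed the kernels' shape checks and element indexing. The snippet below is a hypothetical CPU-side example of the accessor pattern the trackers refer to; the packed_accessor32/64 variants play the same role for CUDA kernels (see the comment at the end).

```cpp
#include <ATen/ATen.h>
#include <algorithm>

// Hypothetical CPU-side use of the tracked methods: a dtype/shape check plus
// a typed accessor<type, rank> for element indexing.
double trace_of(const at::Tensor& t) {
  TORCH_CHECK(t.dim() == 2 && t.scalar_type() == at::kFloat,
              "expected a 2D float tensor");
  at::Tensor c = t.contiguous();
  auto a = c.accessor<float, 2>();
  double sum = 0.0;
  for (int64_t i = 0; i < std::min(a.size(0), a.size(1)); ++i) {
    sum += a[i][i];
  }
  return sum;
}

// On CUDA the same role is played by packed_accessor32/packed_accessor64, e.g.
//   auto pa = t.packed_accessor64<float, 2, at::RestrictPtrTraits>();
// which is then passed by value into a __global__ kernel.
```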
Others
- c10::Dispatcher::singleton().findSchemaOrThrow.typed<decltype({*})>().call()
- at::TensorAccessor
- at::parallel_for
- at::PackedTensorAccessor
- at::OptionalDeviceGuard
- at::RestrictPtrTraits
- c10::optional → perhaps std::optional suffices
- c10::Storage
- Storage.is_alias_of(Storage)
- at::get_generator_or_default
- c10::impl::ExcludeDispatchKeyGuard
- c10::Device
- c10::Layout
- c10::IntArrayRef
- at::Tag::needs_fixed_stride_order
- at::AccumulateType
- CUDAGuard.emplace --> need torch::stable::accelerator::DeviceGuard.emplace
- at::native::gpu_kernel_multiple_outputs
- at::TensorIteratorConfig
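The guard/stream entries above all reduce to the same pattern. The sketch below contrasts today's CUDA-specific code with a possible device-generic stable version; torch::stable::accelerator::DeviceGuard and getCurrentStream follow the notes in this tracker, but the header path and exact signatures are assumptions.

```cpp
#include <ATen/core/Tensor.h>
#include <ATen/cuda/CUDAContext.h>
#include <c10/cuda/CUDAGuard.h>

// Today: CUDA-specific guard and stream, not usable through the stable ABI.
void launch_today(const at::Tensor& t) {
  at::cuda::CUDAGuard guard(t.device());
  cudaStream_t stream = at::cuda::getCurrentCUDAStream();
  // ... launch the kernel on `stream` ...
}

// Possible device-generic replacement. The header path and signatures below
// are assumptions based on the notes in this tracker.
#include <torch/csrc/stable/accelerator.h>

void launch_stable(int32_t device_index) {
  torch::stable::accelerator::DeviceGuard guard(device_index);
  auto stream = torch::stable::accelerator::getCurrentStream(device_index);
  // ... launch the kernel on the device-generic `stream` ...
}
```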
Autocast
- c10::DispatchKey::Autocast
- at::autocast::get_autocast_dtype
- at::autocast::cached_cast
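These three are used together in the usual autocast wrapper for custom ops. A condensed version of that pattern is below; my_op and the registration target are illustrative placeholders.

```cpp
#include <ATen/ATen.h>
#include <ATen/autocast_mode.h>
#include <c10/core/impl/LocalDispatchKeySet.h>
#include <torch/library.h>

// The real op (illustrative placeholder).
at::Tensor my_op(const at::Tensor& a, const at::Tensor& b);

// Standard autocast wrapper: exclude the Autocast key for the redispatch and
// cast inputs to the active autocast dtype for the device.
at::Tensor my_op_autocast(const at::Tensor& a, const at::Tensor& b) {
  c10::impl::ExcludeDispatchKeyGuard no_autocast(c10::DispatchKey::Autocast);
  const auto dtype = at::autocast::get_autocast_dtype(at::kCUDA);
  return my_op(
      at::autocast::cached_cast(dtype, a),
      at::autocast::cached_cast(dtype, b));
}

TORCH_LIBRARY_IMPL(xformers, Autocast, m) {
  m.impl("my_op", TORCH_FN(my_op_autocast));
}
```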
CUDA RNG
- at::cuda::detail::getDefaultCUDAGenerator()
- at::CUDAGeneratorImpl
- mutex_
- philox_cuda_state
- at::PhiloxCudaState
- at::cuda::philox::unpack
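These all belong to the standard Philox RNG setup used by the RNG/dropout kernels. The host-side half of that pattern is sketched below; the header paths and the increment value (4) are approximate placeholders.

```cpp
// Header paths are approximate; the generator/Philox headers have moved
// between releases.
#include <ATen/Utils.h>                   // at::get_generator_or_default
#include <ATen/cuda/CUDAGeneratorImpl.h>  // at::CUDAGeneratorImpl, at::PhiloxCudaState
#include <mutex>
#include <optional>

// Host side: reserve a Philox offset range under the generator mutex.
at::PhiloxCudaState make_rng_inputs() {
  auto* gen = at::get_generator_or_default<at::CUDAGeneratorImpl>(
      std::nullopt, at::cuda::detail::getDefaultCUDAGenerator());
  std::lock_guard<std::mutex> lock(gen->mutex_);
  // 4 = number of random values consumed per thread (placeholder).
  return gen->philox_cuda_state(4);
}

// Device side (inside the kernel):
//   auto seeds = at::cuda::philox::unpack(philox_args);
//   curand_init(std::get<0>(seeds), linear_idx, std::get<1>(seeds), &state);
```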
Per-file trackers
Some APIs, such as at::Tensor and the TORCH_LIBRARY macros, are omitted from the per-file tracking below.
xformers/csrc/attention/attention.cpp
xformers/csrc/attention/matmul.cpp
- at::matmul [link]
- Tensor.masked_fill [link]
- Tensor.logical_not [link]
- c10::Dispatcher::singleton().findSchemaOrThrow.typed<decltype(matmul_with_mask)>().call() [link] (see the sketch after this list)
- TORCH_FN [link]
- TORCH_SELECTIVE_NAME [link]
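The findSchemaOrThrow entry above is the usual way these files re-dispatch through a registered schema. A minimal standalone version of the pattern is below; the schema string xformers::matmul_with_mask and its argument list are assumptions based on the tracker, not copied from the source.

```cpp
#include <ATen/ATen.h>
#include <ATen/core/dispatch/Dispatcher.h>

// Re-dispatch through the dispatcher by schema name, as matmul.cpp does.
// The schema name and argument list here are illustrative.
at::Tensor call_matmul_with_mask(
    const at::Tensor& a,
    const at::Tensor& b,
    const at::Tensor& mask) {
  static auto op =
      c10::Dispatcher::singleton()
          .findSchemaOrThrow("xformers::matmul_with_mask", "")
          .typed<at::Tensor(const at::Tensor&, const at::Tensor&, const at::Tensor&)>();
  return op.call(a, b, mask);
}
```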
xformers/csrc/attention/sddmm.cpp
- TORCH_SELECTIVE_SCHEMA [link]
xformers/csrc/attention/sparse_softmax.cpp
- TORCH_SELECTIVE_SCHEMA [link]
xformers/csrc/attention/spmm.cpp
- TORCH_SELECTIVE_SCHEMA [link]
xformers/csrc/attention/autograd/matmul.cpp [link]
This file contains a custom autograd.Function written in C++; the recommendation here is to rewrite it in Python.
xformers/csrc/attention/cpu/matmul.cpp
- at::TensorAccessor [link]
- at::parallel_for [link]
- dim, size(i), is_cuda [link]
- is_sparse [link]
- TORCH_CHECK [link] → STD_TORCH_CHECK
- Tensor.coalesce [link]
- Tensor.indices [link]
- Tensor.transpose [link]
- empty(size, TensorOptions) [link] → can use new_empty
- AT_DISPATCH_FLOATING_TYPES [link]
- tensor.scalar_type() [link]
- sparse_coo_tensor [link]
- TORCH_SELECTIVE_NAME [link]
- TORCH_FN [link]
xformers/csrc/attention/cpu/sddmm.cpp
- TORCH_CHECK [link] → STD_TORCH_CHECK (see the sketch after this list)
- dim, size(i), is_cuda, is_contiguous [link]
- is_sparse [link]
- empty(size, TensorOptions) [link] → can use new_empty
- .data_ptr() [link] → can use data_ptr and cast
- TORCH_SELECTIVE_NAME [link]
- TORCH_FN [link]
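The same three replacements recur across the CPU files, so here is one condensed sketch of what they could look like together. It assumes STD_TORCH_CHECK, torch::stable::Tensor, and a new_empty helper are available from the stable headers with roughly these signatures; treat every include path and call below as an assumption, not the final API.

```cpp
// All include paths and signatures below are assumptions about the stable
// headers, kept close to the replacements suggested in this tracker.
#include <torch/csrc/stable/library.h>  // STD_TORCH_CHECK
#include <torch/csrc/stable/ops.h>      // new_empty
#include <torch/csrc/stable/tensor.h>   // torch::stable::Tensor
#include <cstdint>

using torch::stable::Tensor;

// Condensed version of the checks/allocation done in these CPU kernels,
// assuming a contiguous float32 input for brevity.
Tensor row_sums(Tensor a) {
  // TORCH_CHECK -> STD_TORCH_CHECK
  STD_TORCH_CHECK(a.dim() == 2, "expected a 2D tensor");
  STD_TORCH_CHECK(a.is_contiguous(), "expected a contiguous tensor");

  // empty(size, TensorOptions) -> new_empty derived from an existing tensor
  Tensor out = torch::stable::new_empty(a, {a.size(0)});

  // data_ptr<T>() -> untyped data_ptr() plus an explicit cast
  const float* in = static_cast<const float*>(a.data_ptr());
  float* dst = static_cast<float*>(out.data_ptr());
  for (int64_t i = 0; i < a.size(0); ++i) {
    dst[i] = 0.f;
    for (int64_t j = 0; j < a.size(1); ++j) {
      dst[i] += in[i * a.size(1) + j];
    }
  }
  return out;
}
```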
xformers/csrc/attention/cpu/sparse_softmax.cpp
- TORCH_CHECK [link] → STD_TORCH_CHECK
- dim, size(i), is_cuda, is_contiguous [link]
- is_sparse [link]
- empty(size, TensorOptions) [link] → can use new_empty
- .data_ptr() [link] → can use data_ptr and cast
- TORCH_SELECTIVE_NAME [link]
- TORCH_FN [link]
xformers/csrc/attention/cpu/spmm.cpp
- TORCH_CHECK [link] → STD_TORCH_CHECK
- dim, size(i), is_cuda, is_contiguous [link]
- is_sparse [link]
- empty(size, TensorOptions) [link] → can use new_empty
- .data_ptr() [link] → can use data_ptr and cast
- TORCH_SELECTIVE_NAME [link]
- TORCH_FN [link]
xformers/csrc/attention/cuda/matmul.cu
- PackedTensorAccessor [link]
- TORCH_CHECK [link] → STD_TORCH_CHECK
- dim, size(i), is_cuda [link]
- is_sparse [link]
- Tensor.device [link]
- CUDAGuard [link] --> use DeviceGuard
- Tensor.coalesce [link]
- Tensor.indices [link]
- Tensor.transpose [link]
- empty(size, TensorOptions) [link]
- at::cuda::getCurrentCUDAStream [link] --> use device generic Stream
- AT_DISPATCH_FLOATING_TYPES_AND_HALF [link]
- Tensor.scalar_type [link]
- Tensor.packed_accessor64 [link]
- AT_CUDA_CHECK [link]
- at::sparse_coo_tensor [link]
- TORCH_SELECTIVE_NAME [link]
- TORCH_FN [link]
xformers/csrc/attention/cuda/sddmm2_cuda.cu
- size(i) [link]
- numel() [link]
- empty(size, TensorOptions) [link] → use empty_like
- at::cuda::getCurrentCUDAStream [link] --> use at::accelerator device generic version
- .data_ptr → can use data_ptr and cast [link]
- AT_CUDA_CHECK [link]
- .device().type() [link]
- torch::kDevice [link]
- is_contiguous [link]
- dtype → change to scalar_type [link]
- torch::kType [link] → torch::headeronly::ScalarType::Type
- TORCH_SELECTIVE_NAME [link]
- TORCH_SELECTIVE_SCHEMA [link]
- TORCH_FN [link]
xformers/csrc/attention/cuda/sddmm.cu
- TORCH_CHECK [link] → STD_TORCH_CHECK
- dim, size(i), is_cuda, is_contiguous [link]
- is_sparse [link]
- Tensor.device() [link]
- at::cuda::getCurrentCUDAStream() [link] --> use at::stable::accelerator device generic version
- empty(size, TensorOptions) [link] → use empty_like
- AT_CUDA_CHECK [link]
- .data_ptr [link] → can use data_ptr and cast
- TORCH_SELECTIVE_NAME [link]
- TORCH_SELECTIVE_FN [link]
xformers/csrc/attention/cuda/sparse_softmax.cu
- TORCH_CHECK [link] → STD_TORCH_CHECK
- dim, size(i), is_cuda, is_contiguous [link]
- is_sparse [link]
- Tensor.device() [link]
- at::cuda::getCurrentCUDAStream() [link] --> use at::stable::accelerator device generic API
- empty(size, TensorOptions) [link] → use empty_like
- AT_CUDA_CHECK [link]
- .data_ptr [link] → can use data_ptr and cast
- TORCH_SELECTIVE_NAME [link]
- TORCH_SELECTIVE_FN [link]
xformers/csrc/attention/cuda/spmm.cu
- TORCH_CHECK [link] → STD_TORCH_CHECK
- dim, size(i), is_cuda, is_contiguous [link]
- is_sparse [link]
- Tensor.device() [link]
- at::cuda::getCurrentCUDAStream() [link] --> use at::stable::accelerator device generic API
- empty(size, TensorOptions) [link] → use empty_like
- AT_CUDA_CHECK [link]
- .data_ptr [link] → can use data_ptr and cast
- TORCH_SELECTIVE_NAME [link]
- TORCH_SELECTIVE_FN [link]
xformers/csrc/attention/hip_decoder/attention_forward_splitk.cpp
- c10::Half, c10::BFloat16 [link] → torch::headeronly::ScalarType::{*}
- AT_DISPATCH_CASE, AT_DISPATCH_SWITCH [link]
- at::OptionalDeviceGuard [link]
- tensor.device() [link]
- is_cuda, size(i) [link]
- at::hip::getCurrentHIPStream().stream() [link] --> use at::stable::accelerator device generic API
- at::ScalarType::Half, BFloat16, Float [link] → torch::headeronly::ScalarType::{*}
- Tensor.scalar_type [link]
- Tensor.packed_accessor32, Tensor.packed_accessor64 [link]
- packed_accessor{}().data() [link]
- at::RestrictPtrTraits [link]
- empty_like [link]
- empty(size, TensorOptions.dtype()) [link] → can use empty_like
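For the scalar-type entries above, the tracker already points at torch::headeronly::ScalarType; a minimal sketch of that mapping is below. The include path is an assumption.

```cpp
// Hypothetical dtype helper using the header-only ScalarType enum instead of
// c10::Half / at::ScalarType. The include path is an assumption.
#include <torch/headeronly/core/ScalarType.h>

using torch::headeronly::ScalarType;

bool is_reduced_precision(ScalarType st) {
  return st == ScalarType::Half || st == ScalarType::BFloat16;
}
```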
xformers/csrc/attention/hip_fmha/attention_backward_generic_ck_tiled.cpp
- c10::optional [link]
- c10::optional.has_value [link]
- dim, size(i) [link]
- sizes [link]
- scalar_type [link]
- at::ScalarType::Int [link]
- c10::hip::getCurrentHIPStream().stream() [link] --> use at::stable::accelerator device generic API
- Tensor.storage() [link]
- Storage.is_alias_of(Storage) [link]
- at::empty(..., Tensor.options()) [link]
- select(int, int) [link]
- empty_strided(q.sizes(), q.strides(), q.options()) [link]
- empty(sizes, TensorOptions.dtype(...)) [link]
- fill_ [link]
- Tensor.strides [link]
- Tensor.requires_grad [link]
- Tensor.unflatten(int, List[int]) [link]
- Tensor.sum(dim) [link]
- TORCH_SELECTIVE_NAME [link]
- TORCH_FN [link]
xformers/csrc/attention/hip_fmha/attention_ck_rand_uniform.cpp
- c10::hip::getCurrentHIPStream().stream() [link] --> use at::stable::accelerator device generic API
- at::get_generator_or_default [link]
- at::CUDAGeneratorImpl [link]
- at::cuda::detail::getDefaultCUDAGenerator() [link]
- at::PhiloxCudaState [link]
- CUDAGeneratorImpl->mutex_ [link]
- CUDAGeneratorImpl->philox_cuda_state [link]
- at::cuda::philox::unpack [link]
- at::empty(..., TensorOptions.dtype(..)) [link]
- stride(i) [link]
- TORCH_SELECTIVE_NAME [link]
- TORCH_FN [link]
xformers/csrc/attention/hip_fmha/attention_forward_generic_ck_tiled.cpp
- c10::optional [link]
- Tensor.size(i), Tensor.dim(), Tensor.scalar_type() [link]
- at::ScalarType::Int [link]
- c10::optional.has_value() [link]
- c10::hip::getCurrentHIPStream().stream() [link] --> use at::stable::accelerator device generic API
- at::empty(..., TensorOptions) [link]
- at::PhiloxCudaState [link]
- at::CUDAGeneratorImpl [link]
- at::cuda::detail::getDefaultCUDAGenerator() [link]
- CUDAGeneratorImpl->philox_cuda_state [link]
- CUDAGeneratorImpl->mutex_ [link]
- at::cuda::philox::unpack [link]
- Tensor.stride(i) [link]
- at::empty(..., TensorOptions.dtype(..)) [link]
- at::kFloat [link]
- Efficient_attention_forward_ck_meta → rewrite meta function in Python [link]
- empty_symint
- sym_size
- TORCH_SELECTIVE_NAME [link]
- TORCH_FN [link]
xformers/csrc/attention/hip_fmha/ck_fmha_util.h
- Tensor.is_sparse, is_contiguous, is_cpu, is_cuda, stride(i), size(i) [link]
- Tensor.unsqueeze(int) [link]
- Tensor.expand(List[int]) [link]
- Tensor.view(List[int]) [link]
xformers/csrc/sparse24/meta_utils.cu
- is_contiguous, dim, size(i), .scalar_type [link]
- at::empty(..., TensorOptions.dtype(...)) [link]
- at::ScalarType::Short [link]
- Tensor.accessor<type, int> [link]
- Tensor.contiguous [link]
- at::empty(..., TensorOptions) [link]
- view(List[int]) [link]
- permute(List[int]) [link]
- TORCH_SELECTIVE_NAME [link]
- TORCH_FN [link]
xformers/csrc/sparse24/sparse24_apply_dense_output.cu
- size(i), data_ptr, dim, stride(i), scalar_type [link]
- at::ScalarType::... [link]
- at::cuda::CUDAGuard [link]
- at::cuda::getCurrentCUDAStream [link]
- C10_CUDA_KERNEL_LAUNCH_CHECK [link]
- c10::impl::ExcludeDispatchKeyGuard [link]
- c10::DispatchKey::Autocast [link]
- at::autocast::get_autocast_dtype [link]
- at::kCUDA [link]
- at::autocast::cached_cast [link]
- TORCH_SELECTIVE_NAME [link]
- TORCH_FN [link]
xformers/csrc/sparse24/sparse24_metadata.h
- at::ScalarType::{Type} [link]
- Tensor.scalar_type() [link]
- at::empty(..., TensorOptions.device(...).dtype(...)) [link]
- t.index({Slice(None, data_scalars)}) [link]
- t.view(...) [link]
- t.permute(...) [link]
xformers/csrc/sparseNM_dense.cu
- at::cuda::DeviceGuard [link]
- DeviceGuard.emplace(device) [link]
- Tensor.device() [link]
- at::cuda::getCurrentCUDAStream() [link]
- .size(i), .stride(i), .data_ptr(), .scalar_type [link]
- at::ScalarType::{Type} [link]
- C10_CUDA_KERNEL_LAUNCH_CHECK [link]
- TORCH_SELECTIVE_FN [link]
- TORCH_SELECTIVE_NAME [link]
- TORCH_SELECTIVE_SCHEMA [link]
xformers/csrc/sparse24/gemm.cu
- at::IntArrayRef::value_type [link]
- Tensor.size(i) [link]
- AT_ERROR [link]
- at::k{Type} [link]
- t.new_empty(..., TensorOptions.dtype(...)) [link]
- at::cuda::getCurrentCUDAStream [link]
- C10_CUDA_KERNEL_LAUNCH_CHECK [link]
- at::cuda::getCurrentDeviceProperties [link]
- Tensor.layout() [link]
- Layout::Strided [link]
- Tensor.strides [link]
- Tensor.scalar_type() [link]
- TORCH_SELECTIVE_NAME [link]
- TORCH_SELECTIVE_FN [link]
xformers/csrc/sparse24/sparse24_apply.cu
- t.stride(i) [link]
- t.contiguous [link]
- at::cuda::CUDAGuard [link]
- CUDAGuard.emplace(device) [link]
- t.dim, t.size(i), .stride(i), .scalar_type [link]
- at::cuda::getCurrentCUDAStream [link]
- C10_CUDA_KERNEL_LAUNCH_CHECK [link]
- TORCH_SELECTIVE_NAME [link]
- TORCH_FN [link]
xformers/csrc/sparse24/sparse24.cpp
xformers/csrc/sparse24/sparse24_largest_mask_2d.cu
- is_cuda, is_sparse, is_contiguous, dim [link]
- at::cuda::CUDAGuard [link]
- at::cuda::getCurrentCUDAStream [link]
- at::empty_like [link]
- .size(i) [link]
- i.scalar_type() [link]
- at::ScalarType::Type [link]
- TORCH_SELECTIVE_NAME [link]
- TORCH_FN [link]
xformers/csrc/sparse24/sparse24_pack.cu
- at::cuda::CUDAGuard [link]
- CUDAGuard.emplace(device) [link]
- t.device() [link]
- stride(i), size(i) [link]
- at::empty(..., TensorOptions.dtype(...)) [link] ← might not be possible to use empty_like
- at::cuda::getCurrentCUDAStream() [link]
- C10_CUDA_KERNEL_LAUNCH_CHECK [link]
- c10::impl::ExcludeDispatchKeyGuard [link]
- at::autocast::get_autocast_dtype [link]
- at::autocast::cached_cast [link]
- TORCH_SELECTIVE_NAME [link]
- TORCH_FN [link]
xformers/csrc/sparse24/sparse24_pack_test.cu
- at::PackedTensorAccessor [link]
- scalar_type, dim, size(i) [link]
- at::cuda::CUDAGuard [link]
- at::cuda::getCurrentCUDAStream [link]
- at::zeros(..., TensorOptions) [link]
- TORCH_SELECTIVE_NAME [link]
- TORCH_FN [link]
xformers/csrc/swiglu/swiglu_op.cpp
- dual_gemm_silu_identity_mul_META → rewrite in Python [link]
- sym_size [link]
- sym_stride [link]
- empty_symint [link]
- TORCH_SELECTIVE_SCHEMA [link]
- TORCH_SELECTIVE_NAME [link]
- TORCH_FN [link]
- at::Tag::needs_fixed_stride_order [link]
xformers/csrc/swiglu/swiglu_packedw.cpp
- c10::Dispatcher::singleton().findSchemaOrThrow.typed<>().call [link]
- c10::IntArrayRef [link]
- SwiGLUPackedWeights C++ autograd.Function → rewrite in python [link]
xformers/csrc/swiglu/cuda/dual_gemm_silu_identity_mul.cu
- dim(), stride(i), size(i) [link]
- at::cuda::CUDAGuard [link]
- t.device() [link]
- at::empty(..., TensorOptions) [link]
- at::cuda::getCurrentCUDAStream [link]
- at::cuda::getDeviceProperties [link]
- Device.index() [link]
- at::empty(..., TensorOptions.dtype(...)) [link]
- c10::impl::ExcludeDispatchKeyGuard [link]
- at::autocast::get_autocast_dtype [link]
- at::autocast::cached_cast [link]
- TORCH_SELECTIVE_NAME [link]
- TORCH_FN [link]
xformers/csrc/swiglu/cuda/gemm_fused_operand_sum.cu
- at::cuda::CUDAGuard [link]
- t.device() [link]
- at::cuda::getCurrentCUDAStream [link]
- at::cuda::getDeviceProperties [link]
- Device.index() [link]
- .stride(i), .dim, .scalar_type [link]
- c10::impl::ExcludeDispatchKeyGuard [link]
- at::autocast::get_autocast_dtype [link]
- at::autocast::cached_cast [link]
- TORCH_SELECTIVE_NAME [link]
- TORCH_FN [link]
xformers/csrc/swiglu/cuda/silu_bw_fused.cu
- silu_bw_fused → rewrite meta function in Python [link]
- Tensor.select [link]
- AT_DISPATCH_FLOATING_TYPES_AND2 [link]
- at::ScalarType::Type [link]
- at::TensorIteratorConfig [link]
- at::AccumulateType [link]
- at::native::gpu_kernel_multiple_outputs [link]
- c10::impl::ExcludeDispatchKeyGuard [link]
- at::autocast::get_autocast_dtype [link]
- at::autocast::cached_cast [link]
- TORCH_SELECTIVE_NAME [link]
- TORCH_FN [link]