
Tracker for migrating to torch stable ABI #1333

@mikaylagawarecki

Description


Migrate to Python

There are two categories of code that should be migrated to Python:

C++ autograd.Functions that need to be migrated to Python (see the sketch after this list)

  • MatmulWithMask [link]
  • SwiGLUPackedWeights [link]
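
For reference, a rough Python sketch of what such a rewrite could look like. The op name `torch.ops.xformers.matmul_with_mask` and the backward formula below are illustrative placeholders, not the actual xformers schema; the real gradients should follow the existing C++ implementation.

```python
import torch

class MatmulWithMask(torch.autograd.Function):
    """Illustrative Python port of a C++ autograd.Function."""

    @staticmethod
    def forward(ctx, a, b, mask):
        # Hypothetical custom-op name; use whatever schema the C++ kernel
        # is actually registered under.
        out = torch.ops.xformers.matmul_with_mask(a, b, mask)
        ctx.save_for_backward(a, b, mask)
        return out

    @staticmethod
    def backward(ctx, grad_out):
        a, b, mask = ctx.saved_tensors
        # Placeholder backward assuming a boolean mask over the output:
        # zero the gradient outside the mask, then apply matmul gradients.
        grad_out = grad_out.masked_fill(mask.logical_not(), 0.0)
        grad_a = grad_out @ b.transpose(-2, -1)
        grad_b = a.transpose(-2, -1) @ grad_out
        return grad_a, grad_b, None
```

Callers would then go through `MatmulWithMask.apply(a, b, mask)` instead of the C++ dispatcher registration.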

Meta functions that need to be migrated to Python (see the sketch after this list)

  • efficient_attention_forward_ck_meta [link]
  • dual_gemm_silu_identity_mul_META [link]
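
Meta functions can be registered from Python as "fake" implementations. A minimal sketch, assuming the operator is registered under a name like `xformers::efficient_attention_forward_ck` and a PyTorch version that provides `torch.library.register_fake`; the name, signature, and output shapes below are placeholders to be matched against the real schema:

```python
import torch

# Fake/meta impl: computes only output metadata (shapes, dtypes, device),
# never touches real data, so symbolic sizes are handled transparently.
@torch.library.register_fake("xformers::efficient_attention_forward_ck")
def _(query, key, value, *args, **kwargs):
    B, M, H, K = query.shape
    out = query.new_empty((B, M, H, K))
    logsumexp = query.new_empty((B, H, M), dtype=torch.float32)
    return out, logsumexp
```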

What's missing in the current torch stable ABI

Macros

  • TORCH_SELECTIVE_FN
  • TORCH_SELECTIVE_NAME
  • TORCH_SELECTIVE_SCHEMA
  • TORCH_FN
  • AT_CUDA_CHECK
  • AT_DISPATCH_FLOATING_TYPES
  • AT_DISPATCH_CASE
  • AT_DISPATCH_SWITCH
  • C10_CUDA_KERNEL_LAUNCH_CHECK

Native functions

  • matmul
  • masked_fill
  • logical_not
  • tensor.coalesce
  • tensor.indices
  • at::sparse_coo_tensor
  • select
  • unflatten
  • sum(dim)
  • unsqueeze
  • view
  • permute
  • expand
  • index(Slice(...)) ← perhaps narrow suffices (see the sketch after this list)
  • empty_strided
  • empty(size, TensorOptions.device(...).dtype(...))
  • empty_like
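
A quick check of the narrow-for-Slice hypothesis above, in Python (the same relationship holds for the C++ Tensor::narrow):

```python
import torch

t = torch.arange(24).reshape(2, 3, 4)

# C++: t.index({Slice(), Slice(1, 3)})  ->  Python: t[:, 1:3]
sliced = t[:, 1:3]

# narrow(dim, start, length) expresses the same contiguous slice,
# so it should suffice for the Slice-based call sites.
narrowed = t.narrow(1, 1, 2)

assert torch.equal(sliced, narrowed)
```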

Tensor Methods

  • is_sparse
  • packed_accessor64
  • packed_accessor32
  • device
  • sizes
  • strides
  • storage
  • accessor<type, int>
  • contiguous
  • layout
  • requires_grad

Others

  • c10::Dispatcher::singleton().findSchemaOrThrow.typed<decltype({*})>().call()
  • at::TensorAccessor
  • at::parallel_for
  • at::PackedTensorAccessor
  • at::OptionalDeviceGuard
  • at::RestrictPtrTraits
  • c10::optional → perhaps std::optional suffices
  • c10::Storage
  • Storage.is_alias_of(Storage)
  • at::get_generator_or_default
  • c10::impl::ExcludeDispatchKeyGuard
  • c10::Device
  • c10::Layout
  • c10::IntArrayRef
  • at::Tag::needs_fixed_stride_order
  • at::AccumulateType
  • CUDAGuard.emplace → need torch::stable::accelerator::DeviceGuard.emplace
  • at::native::gpu_kernel_multiple_outputs
  • at::TensorIteratorConfig

Autocast

  • c10::DispatchKey::Autocast
  • at::autocast::get_autocast_dtype
  • at::autocast::cached_cast

CUDA RNG

  • at::cuda::detail::getDefaultCUDAGenerator()
  • at::CUDAGeneratorImpl
    • mutex_
    • philox_cuda_state
  • at::PhiloxCudaState
  • at::cuda::philox::unpack

Per-file trackers

Some APIs, such as at::Tensor and the TORCH_LIBRARY macros, are omitted from the per-file tracking below.

xformers/csrc/attention/attention.cpp

  • TORCH_LIBRARY_FRAGMENT [link] → STABLE_TORCH_LIBRARY_FRAGMENT
  • TORCH_SELECTIVE_SCHEMA [link]

xformers/csrc/attention/matmul.cpp

  • at::matmul [link]
  • Tensor.masked_fill [link]
  • Tensor.logical_not [link]
  • c10::Dispatcher::singleton().findSchemaOrThrow.typed<decltype(matmul_with_mask)>().call() [link]
  • TORCH_FN [link]
  • TORCH_SELECTIVE_NAME [link]

xformers/csrc/attention/sddmm.cpp

  • TORCH_SELECTIVE_SCHEMA [link]

xformers/csrc/attention/sparse_softmax.cpp

  • TORCH_SELECTIVE_SCHEMA [link]

xformers/csrc/attention/spmm.cpp

  • TORCH_SELECTIVE_SCHEMA [link]

xformers/csrc/attention/autograd/matmul.cpp [link]
This file contains a custom autograd.Function written in C++; the recommendation here is to rewrite it in Python (see the sketch in the Migrate to Python section above).

xformers/csrc/attention/cpu/matmul.cpp

  • at::TensorAccessor [link]
  • at::parallel_for [link]
  • dim, size(i), is_cuda [link]
  • is_sparse [link]
  • TORCH_CHECK [link] → STD_TORCH_CHECK
  • Tensor.coalesce [link]
  • Tensor.indices [link]
  • Tensor.transpose [link]
  • empty(size, TensorOptions) [link] → can use new_empty
  • AT_DISPATCH_FLOATING_TYPES [link]
  • tensor.scalar_type() [link]
  • sparse_coo_tensor [link]
  • TORCH_SELECTIVE_NAME [link]
  • TORCH_FN [link]

xformers/csrc/attention/cpu/sddmm.cpp

  • TORCH_CHECK [link] → STD_TORCH_CHECK
  • dim, size(i), is_cuda, is_contiguous [link]
  • is_sparse [link]
  • empty(size, TensorOptions) [link] → can use new_empty
  • .data_ptr() [link] → can use data_ptr and cast
  • TORCH_SELECTIVE_NAME [link]
  • TORCH_FN [link]

xformers/csrc/attention/cpu/sparse_softmax.cpp

  • TORCH_CHECK [link] → STD_TORCH_CHECK
  • dim, size(i), is_cuda, is_contiguous [link]
  • is_sparse [link]
  • empty(size, TensorOptions) [link] → can use new_empty
  • .data_ptr() [link] → can use data_ptr and cast
  • TORCH_SELECTIVE_NAME [link]
  • TORCH_FN [link]

xformers/csrc/attention/cpu/spmm.cpp

  • TORCH_CHECK [link] → STD_TORCH_CHECK
  • dim, size(i), is_cuda, is_contiguous [link]
  • is_sparse [link]
  • empty(size, TensorOptions) [link] → can use new_empty
  • .data_ptr() [link] → can use data_ptr and cast
  • TORCH_SELECTIVE_NAME [link]
  • TORCH_FN [link]

xformers/csrc/attention/cuda/matmul.cu

  • PackedTensorAccessor [link]
  • TORCH_CHECK [link] → STD_TORCH_CHECK
  • dim, size(i), is_cuda [link]
  • is_sparse [link]
  • Tensor.device [link]
  • CUDAGuard [link] → use DeviceGuard
  • Tensor.coalesce [link]
  • Tensor.indices [link]
  • Tensor.transpose [link]
  • empty(size, TensorOptions) [link]
  • at::cuda::getCurrentCUDAStream [link] → use device-generic Stream
  • AT_DISPATCH_FLOATING_TYPES_AND_HALF [link]
  • Tensor.scalar_type [link]
  • Tensor.packed_accessor64 [link]
  • AT_CUDA_CHECK [link]
  • at::sparse_coo_tensor [link]
  • TORCH_SELECTIVE_NAME [link]
  • TORCH_FN [link]

xformers/csrc/attention/cuda/sddmm2_cuda.cu

  • size(i) [link]
  • numel() [link]
  • empty(size, TensorOptions) [link] → use empty_like
  • at::cuda::getCurrentCUDAStream [link] → use at::accelerator device-generic version
  • .data_ptr → can use data_ptr and cast [link]
  • AT_CUDA_CHECK [link]
  • .device().type() [link]
  • torch::kDevice [link]
  • is_contiguous [link]
  • dtype → change to scalar_type [link]
  • torch::k{Type} [link] → torch::headeronly::ScalarType::{Type}
  • TORCH_SELECTIVE_NAME [link]
  • TORCH_SELECTIVE_SCHEMA [link]
  • TORCH_FN [link]

xformers/csrc/attention/cuda/sddmm.cu

  • TORCH_CHECK [link] → STD_TORCH_CHECK
  • dim, size(i), is_cuda, is_contiguous [link]
  • is_sparse [link]
  • Tensor.device() [link]
  • at::cuda::getCurrentCUDAStream() [link] → use the device-generic torch::stable::accelerator version
  • empty(size, TensorOptions) [link] → use empty_like
  • AT_CUDA_CHECK [link]
  • .data_ptr [link] → can use data_ptr and cast
  • TORCH_SELECTIVE_NAME [link]
  • TORCH_SELECTIVE_FN [link]

xformers/csrc/attention/cuda/sparse_softmax.cu

  • TORCH_CHECK [link] → STD_TORCH_CHECK
  • dim, size(i), is_cuda, is_contiguous [link]
  • is_sparse [link]
  • Tensor.device() [link]
  • at::cuda::getCurrentCUDAStream() [link] → use the device-generic torch::stable::accelerator API
  • empty(size, TensorOptions) [link] → use empty_like
  • AT_CUDA_CHECK [link]
  • .data_ptr [link] → can use data_ptr and cast
  • TORCH_SELECTIVE_NAME [link]
  • TORCH_SELECTIVE_FN [link]

xformers/csrc/attention/cuda/spmm.cu

  • TORCH_CHECK [link] → STD_TORCH_CHECK
  • dim, size(i), is_cuda, is_contiguous [link]
  • is_sparse [link]
  • Tensor.device() [link]
  • at::cuda::getCurrentCUDAStream() [link] → use the device-generic torch::stable::accelerator API
  • empty(size, TensorOptions) [link] → use empty_like
  • AT_CUDA_CHECK [link]
  • .data_ptr [link] → can use data_ptr and cast
  • TORCH_SELECTIVE_NAME [link]
  • TORCH_SELECTIVE_FN [link]

xformers/csrc/attention/hip_decoder/attention_forward_splitk.cpp

  • c10::Half, c10::BFloat16 [link] → torch::headeronly::ScalarType::{*}
  • AT_DISPATCH_CASE, AT_DISPATCH_SWITCH [link]
  • at::OptionalDeviceGuard [link]
  • tensor.device() [link]
  • is_cuda, size(i) [link]
  • at::hip::getCurrentHIPStream().stream() [link] → use the device-generic torch::stable::accelerator API
  • at::ScalarType::Half, BFloat16, Float [link] → torch::headeronly::ScalarType::{*}
  • Tensor.scalar_type [link]
  • Tensor.packed_accessor32, Tensor.packed_accessor64 [link]
  • packed_accessor{}().data() [link]
  • at::RestrictPtrTraits [link]
  • empty_like [link]
  • empty(size, TensorOptions.dtype()) [link] → can use empty_like

xformers/csrc/attention/hip_fmha/attention_backward_generic_ck_tiled.cpp

  • c10::optional [link]
  • c10::optional.has_value [link]
  • dim, size(i) [link]
  • sizes [link]
  • scalar_type [link]
  • at::ScalarType::Int [link]
  • c10::hip::getCurrentHIPStream().stream() [link] → use the device-generic torch::stable::accelerator API
  • Tensor.storage() [link]
  • Storage.is_alias_of(Storage) [link]
  • at::empty(..., Tensor.options()) [link]
  • select(int, int) [link]
  • empty_strided(q.sizes(), q.strides(), q.options()) [link]
  • empty(sizes, TensorOptions.dtype(...)) [link]
  • fill_ [link]
  • Tensor.strides [link]
  • Tensor.requires_grad [link]
  • Tensor.unflatten(int, List[int]) [link]
  • Tensor.sum(dim) [link]
  • TORCH_SELECTIVE_NAME [link]
  • TORCH_FN [link]

xformers/csrc/attention/hip_fmha/attention_ck_rand_uniform.cpp

  • c10::hip::getCurrentHIPStream().stream() [link] → use the device-generic torch::stable::accelerator API
  • at::get_generator_or_default [link]
  • at::CUDAGeneratorImpl [link]
  • at::cuda::detail::getDefaultCUDAGenerator() [link]
  • at::PhiloxCudaState [link]
  • CUDAGeneratorImpl->mutex_ [link]
  • CUDAGeneratorImpl->philox_cuda_state [link]
  • at::cuda::philox::unpack [link]
  • at::empty(..., TensorOptions.dtype(..)) [link]
  • stride(i) [link]
  • TORCH_SELECTIVE_NAME [link]
  • TORCH_FN [link]

xformers/csrc/attention/hip_fmha/attention_forward_generic_ck_tiled.cpp

  • c10::optional [link]
  • Tensor.size(i), Tensor.dim(), Tensor.scalar_type() [link]
  • at::ScalarType::Int [link]
  • c10::optional.has_value() [link]
  • c10::hip::getCurrentHIPStream().stream() [link] → use the device-generic torch::stable::accelerator API
  • at::empty(..., TensorOptions) [link]
  • at::PhiloxCudaState [link]
  • at::CUDAGeneratorImpl [link]
  • at::cuda::detail::getDefaultCUDAGenerator() [link]
  • CUDAGeneratorImpl->philox_cuda_state [link]
  • CUDAGeneratorImpl->mutex_ [link]
  • at::cuda::philox::unpack [link]
  • Tensor.stride(i) [link]
  • at::empty(..., TensorOptions.dtype(..)) [link]
  • at::kFloat [link]
  • efficient_attention_forward_ck_meta → rewrite the meta function in Python [link]
    • empty_symint
    • sym_size
  • TORCH_SELECTIVE_NAME [link]
  • TORCH_FN [link]

xformers/csrc/attention/hip_fmha/ck_fmha_util.h

  • Tensor.is_sparse, is_contiguous, is_cpu, is_cuda, stride(i), size(i) [link]
  • Tensor.unsqueeze(int) [link]
  • Tensor.expand(List[int]) [link]
  • Tensor.view(List[int]) [link]

xformers/csrc/sparse24/meta_utils.cu

  • is_contiguous, dim, size(i), .scalar_type [link]
  • at::empty(..., TensorOptions.dtype(...)) [link]
  • at::ScalarType::Short [link]
  • Tensor.accessor<type, int> [link]
  • Tensor.contiguous [link]
  • at::empty(..., TensorOptions) [link]
  • view(List[int]) [link]
  • permute(List[int]) [link]
  • TORCH_SELECTIVE_NAME [link]
  • TORCH_FN [link]

xformers/csrc/sparse24/sparse24_apply_dense_output.cu

  • size(i), data_ptr, dim, stride(i), scalar_type [link]
  • at::ScalarType::... [link]
  • at::cuda::CUDAGuard [link]
  • at::cuda::getCurrentCUDAStream [link]
  • C10_CUDA_KERNEL_LAUNCH_CHECK [link]
  • c10::impl::ExcludeDispatchKeyGuard [link]
  • c10::DispatchKey::Autocast [link]
  • at::autocast::get_autocast_dtype [link]
  • at::kCUDA [link]
  • at::autocast::cached_cast [link]
  • TORCH_SELECTIVE_NAME [link]
  • TORCH_FN [link]

xformers/csrc/sparse24/sparse24_metadata.h

  • at::ScalarType::{Type} [link]
  • Tensor.scalar_type() [link]
  • at::empty(..., TensorOptions.device(...).dtype(...)) [link]
  • t.index({Slice(None, data_scalars)}) [link]
  • t.view(...) [link]
  • t.permute(...) [link]

xformers/csrc/sparseNM_dense.cu

  • at::cuda::DeviceGuard [link]
  • DeviceGuard.emplace(device) [link]
  • Tensor.device() [link]
  • at::cuda::getCurrentCUDAStream() [link]
  • .size(i), .stride(i), .data_ptr(), .scalar_type [link]
  • at::ScalarType::{Type} [link]
  • C10_CUDA_KERNEL_LAUNCH_CHECK [link]
  • TORCH_SELECTIVE_FN [link]
  • TORCH_SELECTIVE_NAME [link]
  • TORCH_SELECTIVE_SCHEMA [link]

xformers/csrc/sparse24/gemm.cu

  • at::IntArrayRef::value_type [link]
  • Tensor.size(i) [link]
  • AT_ERROR [link]
  • at::k{Type} [link]
  • t.new_empty(..., TensorOptions.dtype(...)) [link]
  • at::cuda::getCurrentCUDAStream [link]
  • C10_CUDA_KERNEL_LAUNCH_CHECK [link]
  • at::cuda::getCurrentDeviceProperties [link]
  • Tensor.layout() [link]
  • Layout::Strided [link]
  • Tensor.strides [link]
  • Tensor.scalar_type() [link]
  • TORCH_SELECTIVE_NAME [link]
  • TORCH_SELECTIVE_FN [link]

xformers/csrc/sparse24/sparse24_apply.cu

  • t.stride(i) [link]
  • t.contiguous [link]
  • at::cuda::CUDAGuard [link]
  • CUDAGuard.emplace(device) [link]
  • t.dim, t.size(i), .stride(i), .scalar_type [link]
  • at::cuda::getCurrentCUDAStream [link]
  • C10_CUDA_KERNEL_LAUNCH_CHECK [link]
  • TORCH_SELECTIVE_NAME [link]
  • TORCH_FN [link]

xformers/csrc/sparse24/sparse24.cpp

  • TORCH_LIBRARY_FRAGMENT [link]
  • TORCH_SELECTIVE_SCHEMA [link]

xformers/csrc/sparse24/sparse24_largest_mask_2d.cu

  • is_cuda, is_sparse, is_contiguous, dim [link]
  • at::cuda::CUDAGuard [link]
  • at::cuda::getCurrentCUDAStream [link]
  • at::empty_like [link]
  • .size(i) [link]
  • i.scalar_type() [link]
  • at::ScalarType::Type [link]
  • TORCH_SELECTIVE_NAME [link]
  • TORCH_FN [link]

xformers/csrc/sparse24/sparse24_pack.cu

  • at::cuda::CUDAGuard [link]
  • CUDAGuard.emplace(device) [link]
  • t.device() [link]
  • stride(i), size(i) [link]
  • at::empty(..., TensorOptions.dtype(...)) [link] ← might not be possible to use empty_like
  • at::cuda::getCurrentCUDAStream() [link]
  • C10_CUDA_KERNEL_LAUNCH_CHECK [link]
  • c10::impl::ExcludeDispatchKeyGuard [link]
  • at::autocast::get_autocast_dtype [link]
  • at::autocast::cached_cast [link]
  • TORCH_SELECTIVE_NAME [link]
  • TORCH_FN [link]

xformers/csrc/sparse24/sparse24_pack_test.cu

  • at::PackedTensorAccessor [link]
  • scalar_type, dim, size(i) [link]
  • at::cuda::CUDAGuard [link]
  • at::cuda::getCurrentCUDAStream [link]
  • at::zeros(..., TensorOptions) [link]
  • TORCH_SELECTIVE_NAME [link]
  • TORCH_FN [link]

xformers/csrc/swiglu/swiglu_op.cpp

  • dual_gemm_silu_identity_mul_META → rewrite in Python [link] (see the sketch after this list)
  • sym_size [link]
  • sym_stride [link]
  • empty_symint [link]
  • TORCH_SELECTIVE_SCHEMA [link]
  • TORCH_SELECTIVE_NAME [link]
  • TORCH_FN [link]
  • at::Tag::needs_fixed_stride_order [link]
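
As noted above, dual_gemm_silu_identity_mul_META is a candidate for a Python fake impl, where the sym_size/empty_symint plumbing becomes ordinary shape arithmetic. A rough sketch with an assumed registration name and simplified signature; the real schema in swiglu_op.cpp should be matched exactly:

```python
import torch

@torch.library.register_fake("xformers::dual_gemm_silu_identity_mul")
def _(x, w1, b1, w2, b2):
    # Two GEMM outputs plus their elementwise product, each assumed to have
    # shape (x.size(0), w1.size(0)); sizes may be SymInts under dynamic
    # shapes, and new_empty propagates them automatically.
    out_shape = (x.size(0), w1.size(0))
    x1 = x.new_empty(out_shape)
    x2 = x.new_empty(out_shape)
    x4 = x.new_empty(out_shape)
    return x1, x2, x4
```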

xformers/csrc/swiglu/swiglu_packedw.cpp

  • c10::Dispatcher::singleton().findSchemaOrThrow.typed<>().call [link]
  • c10::IntArrayRef [link]
  • SwiGLUPackedWeights C++ autograd.Function → rewrite in Python [link]

xformers/csrc/swiglu/cuda/dual_gemm_silu_identity_mul.cu

  • dim(), stride(i), size(i) [link]
  • at::cuda::CUDAGuard [link]
  • t.device() [link]
  • at::empty(..., TensorOptions) [link]
  • at::cuda::getCurrentCUDAStream [link]
  • at::cuda::getDeviceProperties [link]
  • Device.index() [link]
  • at::empty(..., TensorOptions.dtype(...)) [link]
  • c10::impl::ExcludeDispatchKeyGuard [link]
  • at::autocast::get_autocast_dtype [link]
  • at::autocast::cached_cast [link]
  • TORCH_SELECTIVE_NAME [link]
  • TORCH_FN [link]

xformers/csrc/swiglu/cuda/gemm_fused_operand_sum.cu

  • at::cuda::CUDAGuard [link]
  • t.device() [link]
  • at::cuda::getCurrentCUDAStream [link]
  • at::cuda::getDeviceProperties [link]
  • Device.index() [link]
  • .stride(i), .dim, .scalar_type [link]
  • c10::impl::ExcludeDispatchKeyGuard [link]
  • at::autocast::get_autocast_dtype [link]
  • at::autocast::cached_cast [link]
  • TORCH_SELECTIVE_NAME [link]
  • TORCH_FN [link]

xformers/csrc/swiglu/cuda/silu_bw_fused.cu

  • silu_bw_fused → rewrite the meta fn in Python [link]
  • Tensor.select [link]
  • AT_DISPATCH_FLOATING_TYPES_AND2 [link]
  • at::ScalarType::Type [link]
  • at::TensorIteratorConfig [link]
  • at::AccumulateType [link]
  • at::native::gpu_kernel_multiple_outputs [link]
  • c10::impl::ExcludeDispatchKeyGuard [link]
  • at::autocast::get_autocast_dtype [link]
  • at::autocast::cached_cast [link]
  • TORCH_SELECTIVE_NAME [link]
  • TORCH_FN [link]
