Replace remaining uses of linearize_access_indexes for broadcasting with BroadcastIndexesRange #8965
Labels
module: kernels
Issues related to kernel libraries and utilities, and code under kernels/
triaged
This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
BroadcastIndexesRange is more efficient than linearize_access_indexes-based loops. Several call sites still use linearize_access_indexes, and what they have in common is that they don't loop over an existing Tensor's .sizes() and .strides(). To convert them, we need a way for BroadcastIndexesRange to work with .sizes() and .strides() ArrayRefs that aren't attached to a Tensor.
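To illustrate the kind of iteration this is asking for, here is a minimal standalone sketch in plain C++. It is not ExecuTorch code and does not use the real BroadcastIndexesRange interface: `for_each_broadcast_offset` is a hypothetical helper, `std::vector` stands in for ArrayRef, and the input sizes are assumed to be already left-padded with 1s to the output rank. It only shows that broadcast offsets can be produced incrementally from bare sizes and strides, without a Tensor and without recomputing the linear index from the full coordinate for every element.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of ExecuTorch: visits every element of an
// output shape and reports the matching linear offset into one input that is
// described only by raw sizes and strides (no Tensor object involved).
// Assumption: in_sizes is already left-padded with 1s to the output rank.
template <typename Visit>
void for_each_broadcast_offset(
    const std::vector<int64_t>& out_sizes,
    const std::vector<int64_t>& in_sizes,
    const std::vector<int64_t>& in_strides,
    Visit visit) {
  const size_t rank = out_sizes.size();
  int64_t numel = 1;
  for (int64_t s : out_sizes) {
    numel *= s;
  }

  // A broadcast dimension (input size 1) contributes stride 0.
  std::vector<int64_t> eff_strides(rank);
  for (size_t d = 0; d < rank; ++d) {
    eff_strides[d] = (in_sizes[d] == 1) ? 0 : in_strides[d];
  }

  std::vector<int64_t> coord(rank, 0);
  int64_t in_offset = 0;
  for (int64_t out_index = 0; out_index < numel; ++out_index) {
    visit(out_index, in_offset);
    // Advance like an odometer, updating the offset incrementally instead of
    // re-deriving it from the full coordinate on every element.
    for (size_t d = rank; d-- > 0;) {
      in_offset += eff_strides[d];
      if (++coord[d] < out_sizes[d]) {
        break;
      }
      // This dimension wrapped: undo its accumulated contribution and carry.
      in_offset -= eff_strides[d] * out_sizes[d];
      coord[d] = 0;
    }
  }
}
```

A caller would pass the output shape plus the sizes and strides it computed itself, for example `for_each_broadcast_offset({2, 3}, {1, 3}, {3, 1}, callback)`, and read from the input at the reported offset inside the callback. A real change would presumably expose this through BroadcastIndexesRange itself and support several inputs at once; the sketch handles one input to keep the idea visible.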
Specific usage sites that can be improved:
executorch/kernels/portable/cpu/op_cdist_forward.cpp
Line 79 in 95f779a
https://github.com/pytorch/executorch/blob/95f779ae2120d94e20bb95ae6af45da76ce3ff52/kernels/portable/cpu/op_index_put.cpp#L145 (I think?)
executorch/kernels/portable/cpu/op_split_with_sizes_copy.cpp
Line 129 in 95f779a
executorch/kernels/portable/cpu/op_masked_select.cpp
Line 125 in 95f779a
cc @larryliu0820 @manuelcandales