[ET-VK][ez] Fix IndexError in Vulkan partitioner DtypeSetList/TensorRepSetList #18264

Merged
SS-JIA merged 13 commits into gh/SS-JIA/490/orig from gh/SS-JIA/465/orig on Mar 18, 2026
Conversation

@pytorchbot (Collaborator) commented Mar 18, 2026

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #18048 by @SS-JIA
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/465/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/465/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/490/orig
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/465/orig
Differential Revision: D95970163
@diff-train-skip-merge

cc @SS-JIA @manuelcandales @digantdesai @cbilgin

…epSetList

Pull Request resolved: #18048

The `__getitem__` methods of `DtypeSetList` and `TensorRepSetList` in
`utils.py` could raise an `IndexError` when the index is greater than or
equal to the length of the list. This can happen when partitioning ops
whose number of inputs or outputs exceeds the number of entries in the
dtype/tensor-rep specification list. Fix by returning an empty set in
this case, matching the intent of the existing broadcasting logic.
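A minimal Python sketch of the described fix (class and attribute names here are illustrative, not the actual `utils.py` definitions): a single-entry spec list broadcasts to every argument index, and an out-of-range index now yields an empty set instead of raising.

```python
# Illustrative sketch of the fix; names are hypothetical, not the
# actual executorch backends/vulkan utils.py classes.
class DtypeSetList:
    def __init__(self, dtype_sets):
        self.dtype_sets = dtype_sets

    def __getitem__(self, idx):
        # Existing broadcasting intent: a single entry applies to every arg.
        if len(self.dtype_sets) == 1:
            return self.dtype_sets[0]
        # Fix: an op with more inputs/outputs than spec entries now
        # gets an empty set instead of an IndexError.
        if idx >= len(self.dtype_sets):
            return set()
        return self.dtype_sets[idx]
```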
ghstack-source-id: 353546684
@exported-using-ghexport

Differential Revision: [D95970163](https://our.internmc.facebook.com/intern/diff/D95970163/)
@pytorchbot pytorchbot requested a review from SS-JIA as a code owner March 18, 2026 01:46
@pytorch-bot pytorch-bot bot added the module: vulkan Issues related to the Vulkan delegate and code under backends/vulkan/ label Mar 18, 2026

pytorch-bot bot commented Mar 18, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18264

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 18, 2026
ssjia added 12 commits March 17, 2026 21:54
Pull Request resolved: #18049

Add Vulkan build support for the Parakeet runner: llm-debug-vulkan preset
in root CMakePresets.json, parakeet-vulkan presets in the Parakeet
CMakePresets.json, vulkan_backend linkage in CMakeLists.txt, and a
`make parakeet-vulkan` Makefile target.

Add _create_vulkan_partitioners() and wire it into lower_to_executorch()
so that `--backend vulkan` is accepted by export_parakeet_tdt.py.
ghstack-source-id: 353546680
@exported-using-ghexport

Differential Revision: [D95970157](https://our.internmc.facebook.com/intern/diff/D95970157/)
…teGraph

Fix output argument indexing in VulkanBackend::execute() and extend
ComputeGraph to transparently handle symint values.

The output loop previously computed the args index as `i + num_inputs`,
which breaks when non-tensor arguments (e.g. symints) sit between the
tensor inputs and outputs in the args array. Fix by computing the offset
from the end: `args.size() - num_outputs`.
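The indexing change can be illustrated with a small Python sketch (the real code is C++ in `VulkanBackend::execute()`; the function and argument names here are made up):

```python
# Hypothetical sketch of the args-index fix; the real code is C++.
def output_arg_indices(args, num_inputs, num_outputs):
    # Buggy assumption: outputs start immediately after the inputs,
    # which fails when symints sit between inputs and outputs.
    buggy = [num_inputs + i for i in range(num_outputs)]
    # Fix: outputs always occupy the tail of the args list, so
    # compute the offset from the end instead.
    fixed = [len(args) - num_outputs + i for i in range(num_outputs)]
    return buggy, fixed
```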

ComputeGraph changes add symint support so that operators can read symint
values uniformly:
- `extract_scalar<T>()` now handles SymInt values, allowing operators to
  call extract_scalar on arguments that may be either plain ints or
  symints without special-casing.
- `read_symint()` falls back to reading plain Int values, so values
  stored as Int (rather than SymInt objects) can be read uniformly.
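The fallback behavior described above can be sketched in Python (the actual implementation is C++ inside ComputeGraph; the `SymInt` stand-in and function shape are assumptions for illustration):

```python
from dataclasses import dataclass

# Hypothetical stand-in for a graph value holding a symbolic integer;
# the real ComputeGraph value types are C++.
@dataclass
class SymInt:
    value: int

def read_symint(values, idx):
    v = values[idx]
    if isinstance(v, SymInt):
        return v.value
    if isinstance(v, int):
        # Fallback: the value was stored as a plain Int; read it uniformly.
        return v
    raise TypeError(f"expected SymInt or int, got {type(v).__name__}")
```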

Pull Request resolved: #18050
ghstack-source-id: 353546683
@exported-using-ghexport

Differential Revision: [D95970167](https://our.internmc.facebook.com/intern/diff/D95970167/)
Modernize constant_pad_nd to support ANY_STORAGE (both buffer and
texture). Migrate shaders to BufferMetadata/TextureMetadata with
indexing.glslh and unify dispatch into a single add_constant_pad_nd_node
function using DynamicDispatchNode.

Pull Request resolved: #18051
ghstack-source-id: 353546682
@exported-using-ghexport

Differential Revision: [D95970168](https://our.internmc.facebook.com/intern/diff/D95970168/)
Modernize arange and full operators to support ANY_STORAGE. Add separate
buffer and texture shader variants using BufferMetadata/TextureMetadata
with indexing.glslh. Unify dispatch with add_storage_type_suffix and
DynamicDispatchNode. Add symint support via read_symint_list for dynamic
output sizes.

Pull Request resolved: #18052
ghstack-source-id: 353546693
@exported-using-ghexport

Differential Revision: [D95970169](https://our.internmc.facebook.com/intern/diff/D95970169/)
Modernize expand_copy to support ANY_STORAGE. Add buffer shader variant
using BufferMetadata with indexing.glslh. Unify dispatch with
add_storage_type_suffix and DynamicDispatchNode. Add resize function and
symint support for dynamic target sizes.

Pull Request resolved: #18053
ghstack-source-id: 353546690
@exported-using-ghexport

Differential Revision: [D95970162](https://our.internmc.facebook.com/intern/diff/D95970162/)
Modernize softmax and log_softmax to support ANY_STORAGE. Migrate both
buffer and texture shaders from indexing_utils.h to indexing.glslh with
BufferMetadata/TextureMetadata UBOs. Merge separate texture and buffer
dispatch functions into a unified add_softmax_node using
add_storage_type_suffix and graph.meta_ubo().

Pull Request resolved: #18054
ghstack-source-id: 353546688
@exported-using-ghexport

Differential Revision: [D95970171](https://our.internmc.facebook.com/intern/diff/D95970171/)
Modernize native_layer_norm to support ANY_STORAGE. Migrate texture
shader from indexing_utils.h to indexing.glslh with TextureMetadata
UBOs. Merge separate texture and buffer dispatch functions into a
unified add_native_layer_norm_node using graph.meta_ubo(). Buffer
path retains custom workgroup sizing for cooperative shared-memory
reduction.

Pull Request resolved: #18055
ghstack-source-id: 353546686
@exported-using-ghexport

Differential Revision: [D95970158](https://our.internmc.facebook.com/intern/diff/D95970158/)
Modernize repeat to support ANY_STORAGE. Rewrite texture shader to use
TextureMetadata with indexing.glslh helpers for coordinate conversion.
Add buffer shader variant using BufferMetadata. Unify dispatch to use
graph.meta_ubo() for both paths. Add symint support for dynamic repeat
counts.

Pull Request resolved: #18056
ghstack-source-id: 353546685
@exported-using-ghexport

Differential Revision: [D95970170](https://our.internmc.facebook.com/intern/diff/D95970170/)
Modernize embedding to support ANY_STORAGE. Add buffer and texture
shader variants using BufferMetadata/TextureMetadata with indexing.glslh.
Unify new dispatch path with add_storage_type_suffix and
graph.meta_ubo(). Legacy channels-packed texture path retained for
backward compatibility.

Pull Request resolved: #18057
ghstack-source-id: 353546689
@exported-using-ghexport

Differential Revision: [D95970161](https://our.internmc.facebook.com/intern/diff/D95970161/)
Modernize argmax and argmin to support ANY_STORAGE via the
add_reduce_per_row_node dispatch path. Buffer shader uses
BufferMetadata with indexing.glslh. Custom workgroup sizing retained
for cooperative row-reduction algorithm with shared memory.

Pull Request resolved: #18058
ghstack-source-id: 353546687
@exported-using-ghexport

Differential Revision: [D95970165](https://our.internmc.facebook.com/intern/diff/D95970165/)
Pull Request resolved: #18059

Add missing operators needed for Parakeet TDT model support:

- New symint ops: sym_sub, sym_floordiv, sym_mul in SymIntOps.cpp;
  register operator.floordiv and operator.mul as ephemeral ops in
  op_registry.py
- New tensor ops: bitwise_not (via unary_op shader with uint8 DTYPE),
  logical_and (alias for bitwise_and dispatch)
- Improve _to_copy: expand dtype support to FP_INT_BOOL_T and use
  pick_io_storage_fn to restrict to CONTIGUOUS_BUFFER for non-fp
  conversions
- Fix where resize: compute output shape via broadcast across all tensor
  inputs instead of always using the second input's shape
- Add symint support to split: use extract_int_or_symint_list instead of
  get_int_list in resize_split_node and split_with_sizes_copy_default
- Mark scalar_tensor as supporting resize
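The `where` resize fix above amounts to broadcasting the output shape across all tensor inputs rather than copying one input's shape. A numpy-style Python sketch of that shape computation (illustrative only, not the actual executorch code):

```python
# Illustrative broadcast of an output shape across all input shapes,
# following numpy-style broadcasting rules; not the executorch code.
def broadcast_shape(*shapes):
    ndim = max(len(s) for s in shapes)
    out = []
    # Walk dimensions from the trailing (innermost) axis outward.
    for i in range(1, ndim + 1):
        dim = 1
        for s in shapes:
            d = s[-i] if i <= len(s) else 1  # missing leading dims act as 1
            if d != 1 and dim != 1 and d != dim:
                raise ValueError(f"incompatible shapes: {shapes}")
            dim = max(dim, d)
        out.append(dim)
    return tuple(reversed(out))
```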
ghstack-source-id: 353546692
@exported-using-ghexport

Differential Revision: [D95970159](https://our.internmc.facebook.com/intern/diff/D95970159/)
…linear ops

Pull Request resolved: #18061

Wire bias through the q4gsw and dq8ca_q4gsw quantized linear operators.
Add add_bias_to_out_tile() helper in the output tile computation header and call
it from all three shader variants (tiled, coop, dq8ca_tiled). Remove the bias
guard in the pattern matcher to allow biased linear layers.
ghstack-source-id: 353546681
@exported-using-ghexport

Differential Revision: [D95970172](https://our.internmc.facebook.com/intern/diff/D95970172/)
@SS-JIA SS-JIA merged commit e198bd4 into gh/SS-JIA/490/orig Mar 18, 2026
14 checks passed
@SS-JIA SS-JIA deleted the gh/SS-JIA/465/orig branch March 18, 2026 01:55