[ET-VK] Fix exponential blowup in tag_memory_meta_pass repset tracing (#18263)
Pull Request resolved: #18207

The `trace_node_users_to_constrain_repset` DFS previously tracked search depth as a per-branch int counter, allowing each branch of a fan-out to independently explore up to `max_trace_search_depth` nodes. In transformer-style graphs with heavy fan-out this caused exponential blowup in the number of nodes visited.

Replace the int counter with a mutable list containing a single int that is shared by reference across all recursive branches. This limits the TOTAL number of nodes explored per top-level trace call to `max_trace_search_depth` (16), regardless of fan-out structure.

Authored with Claude.

ghstack-source-id: 353546691
@exported-using-ghexport

Differential Revision: [D96790445](https://our.internmc.facebook.com/intern/diff/D96790445/)
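The difference between the two counter styles can be sketched as follows. This is an illustrative model, not the actual ExecuTorch code: the `Node` class, `build_tree` helper, and function names are hypothetical stand-ins for the real graph nodes and `trace_node_users_to_constrain_repset`.

```python
MAX_TRACE_SEARCH_DEPTH = 16

class Node:
    """Minimal stand-in for a graph node with a list of users."""
    def __init__(self):
        self.users = []

def build_tree(depth):
    """Build a complete binary fan-out tree of the given depth."""
    n = Node()
    if depth > 0:
        n.users = [build_tree(depth - 1), build_tree(depth - 1)]
    return n

def trace_per_branch(node, visited, depth=0):
    # BEFORE: `depth` is copied per recursive call, so every branch of a
    # fan-out independently gets the full budget; visits can grow
    # exponentially with fan-out.
    if depth >= MAX_TRACE_SEARCH_DEPTH:
        return
    visited.append(node)
    for user in node.users:
        trace_per_branch(user, visited, depth + 1)

def trace_shared(node, visited, budget=None):
    # AFTER: `budget` is a one-element list shared by reference across all
    # branches, so the TOTAL number of visited nodes is capped at
    # MAX_TRACE_SEARCH_DEPTH per top-level call.
    if budget is None:
        budget = [0]
    if budget[0] >= MAX_TRACE_SEARCH_DEPTH:
        return
    budget[0] += 1
    visited.append(node)
    for user in node.users:
        trace_shared(user, visited, budget)
```

On a binary tree of depth 10 (2047 nodes), the per-branch variant visits every node, while the shared-budget variant stops after 16 visits.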
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18263

⏳ No Failures, 49 Pending as of commit e198bd4 with merge base 22174fa.
…epSetList Pull Request resolved: #18048 The `__getitem__` methods of `DtypeSetList` and `TensorRepSetList` in `utils.py` could raise an `IndexError` when the index is greater than or equal to the length of the list. This can happen when partitioning ops whose number of inputs or outputs exceeds the number of entries in the dtype/tensor-rep specification list. Fix by returning an empty set in this case, matching the intent of the existing broadcasting logic. ghstack-source-id: 353546684 @exported-using-ghexport Differential Revision: [D95970163](https://our.internmc.facebook.com/intern/diff/D95970163/)
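A minimal sketch of the out-of-range behavior described above, assuming a simplified list class (the class body and field names here are illustrative; the real `DtypeSetList` and `TensorRepSetList` live in the Vulkan partitioner's `utils.py` and carry more structure):

```python
class RepSetList:
    """Hypothetical simplified model of DtypeSetList / TensorRepSetList."""

    def __init__(self, entries):
        self.entries = list(entries)

    def __getitem__(self, idx):
        if len(self.entries) == 1:
            # Existing broadcasting intent: a single entry applies to
            # every argument index.
            return self.entries[0]
        if idx >= len(self.entries):
            # The fix: an op with more inputs/outputs than spec entries
            # gets an empty set ("no constraint") instead of IndexError.
            return set()
        return self.entries[idx]
```

With the fix, indexing past the end of a multi-entry list yields `set()`, while a one-entry list still broadcasts its sole entry to any index.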
Pull Request resolved: #18049 Add Vulkan build support for the Parakeet runner: llm-debug-vulkan preset in root CMakePresets.json, parakeet-vulkan presets in the Parakeet CMakePresets.json, vulkan_backend linkage in CMakeLists.txt, and a `make parakeet-vulkan` Makefile target. Add _create_vulkan_partitioners() and wire it into lower_to_executorch() so that `--backend vulkan` is accepted by export_parakeet_tdt.py. ghstack-source-id: 353546680 @exported-using-ghexport Differential Revision: [D95970157](https://our.internmc.facebook.com/intern/diff/D95970157/)
…teGraph

Fix output argument indexing in `VulkanBackend::execute()` and extend ComputeGraph to transparently handle symint values.

The output loop previously computed the args index as `i + num_inputs`, which breaks when non-tensor arguments (e.g. symints) sit between the tensor inputs and outputs in the args array. Fix by computing the offset from the end: `args.size() - num_outputs`.

ComputeGraph changes add symint support so that operators can read symint values uniformly:

- `extract_scalar<T>()` now handles SymInt values, allowing operators to call extract_scalar on arguments that may be either plain ints or symints without special-casing.
- `read_symint()` falls back to reading plain Int values, so values stored as Int (rather than SymInt objects) can be read uniformly.

Pull Request resolved: #18050
ghstack-source-id: 353546683
@exported-using-ghexport

Differential Revision: [D95970167](https://our.internmc.facebook.com/intern/diff/D95970167/)
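The indexing bug is easy to see on a concrete args layout. The real code is C++ inside `VulkanBackend::execute()`; this is a hedged Python model with made-up placeholder values:

```python
# Hypothetical args array: two tensor inputs, one symint, two tensor outputs.
args = ["in0", "in1", "symint", "out0", "out1"]
num_inputs, num_outputs = 2, 2

# BEFORE: `i + num_inputs` assumes outputs immediately follow inputs,
# so the symint in the middle shifts the window onto the wrong entries.
old = [args[i + num_inputs] for i in range(num_outputs)]

# AFTER: counting back from the end (`args.size() - num_outputs` in the
# C++ code) is immune to whatever sits between inputs and outputs.
new = [args[len(args) - num_outputs + i] for i in range(num_outputs)]
```

Here `old` picks up `["symint", "out0"]` while `new` correctly selects `["out0", "out1"]`.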
Modernize constant_pad_nd to support ANY_STORAGE (both buffer and texture). Migrate shaders to BufferMetadata/TextureMetadata with indexing.glslh and unify dispatch into a single add_constant_pad_nd_node function using DynamicDispatchNode. Pull Request resolved: #18051 ghstack-source-id: 353546682 @exported-using-ghexport Differential Revision: [D95970168](https://our.internmc.facebook.com/intern/diff/D95970168/)
Modernize arange and full operators to support ANY_STORAGE. Add separate buffer and texture shader variants using BufferMetadata/TextureMetadata with indexing.glslh. Unify dispatch with add_storage_type_suffix and DynamicDispatchNode. Add symint support via read_symint_list for dynamic output sizes. Pull Request resolved: #18052 ghstack-source-id: 353546693 @exported-using-ghexport Differential Revision: [D95970169](https://our.internmc.facebook.com/intern/diff/D95970169/)
Modernize expand_copy to support ANY_STORAGE. Add buffer shader variant using BufferMetadata with indexing.glslh. Unify dispatch with add_storage_type_suffix and DynamicDispatchNode. Add resize function and symint support for dynamic target sizes. Pull Request resolved: #18053 ghstack-source-id: 353546690 @exported-using-ghexport Differential Revision: [D95970162](https://our.internmc.facebook.com/intern/diff/D95970162/)
Modernize softmax and log_softmax to support ANY_STORAGE. Migrate both buffer and texture shaders from indexing_utils.h to indexing.glslh with BufferMetadata/TextureMetadata UBOs. Merge separate texture and buffer dispatch functions into a unified add_softmax_node using add_storage_type_suffix and graph.meta_ubo(). Pull Request resolved: #18054 ghstack-source-id: 353546688 @exported-using-ghexport Differential Revision: [D95970171](https://our.internmc.facebook.com/intern/diff/D95970171/)
Modernize native_layer_norm to support ANY_STORAGE. Migrate texture shader from indexing_utils.h to indexing.glslh with TextureMetadata UBOs. Merge separate texture and buffer dispatch functions into a unified add_native_layer_norm_node using graph.meta_ubo(). Buffer path retains custom workgroup sizing for cooperative shared-memory reduction. Pull Request resolved: #18055 ghstack-source-id: 353546686 @exported-using-ghexport Differential Revision: [D95970158](https://our.internmc.facebook.com/intern/diff/D95970158/)
Modernize repeat to support ANY_STORAGE. Rewrite texture shader to use TextureMetadata with indexing.glslh helpers for coordinate conversion. Add buffer shader variant using BufferMetadata. Unify dispatch to use graph.meta_ubo() for both paths. Add symint support for dynamic repeat counts. Pull Request resolved: #18056 ghstack-source-id: 353546685 @exported-using-ghexport Differential Revision: [D95970170](https://our.internmc.facebook.com/intern/diff/D95970170/)
Modernize embedding to support ANY_STORAGE. Add buffer and texture shader variants using BufferMetadata/TextureMetadata with indexing.glslh. Unify new dispatch path with add_storage_type_suffix and graph.meta_ubo(). Legacy channels-packed texture path retained for backward compatibility. Pull Request resolved: #18057 ghstack-source-id: 353546689 @exported-using-ghexport Differential Revision: [D95970161](https://our.internmc.facebook.com/intern/diff/D95970161/)
Modernize argmax and argmin to support ANY_STORAGE via the add_reduce_per_row_node dispatch path. Buffer shader uses BufferMetadata with indexing.glslh. Custom workgroup sizing retained for cooperative row-reduction algorithm with shared memory. Pull Request resolved: #18058 ghstack-source-id: 353546687 @exported-using-ghexport Differential Revision: [D95970165](https://our.internmc.facebook.com/intern/diff/D95970165/)
Pull Request resolved: #18059

Add missing operators needed for Parakeet TDT model support:

- New symint ops: `sym_sub`, `sym_floordiv`, `sym_mul` in SymIntOps.cpp; register `operator.floordiv` and `operator.mul` as ephemeral ops in op_registry.py
- New tensor ops: `bitwise_not` (via unary_op shader with uint8 DTYPE), `logical_and` (alias for bitwise_and dispatch)
- Improve `_to_copy`: expand dtype support to FP_INT_BOOL_T and use pick_io_storage_fn to restrict to CONTIGUOUS_BUFFER for non-fp conversions
- Fix `where` resize: compute output shape via broadcast across all tensor inputs instead of always using the second input's shape
- Add symint support to `split`: use extract_int_or_symint_list instead of get_int_list in resize_split_node and split_with_sizes_copy_default
- Mark `scalar_tensor` as supporting resize

ghstack-source-id: 353546692
@exported-using-ghexport

Differential Revision: [D95970159](https://our.internmc.facebook.com/intern/diff/D95970159/)
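The `where` resize fix amounts to applying the standard right-aligned broadcasting rule over all tensor inputs. A sketch under that assumption (the helper name is illustrative, not the actual ExecuTorch function):

```python
def broadcast_shapes(*shapes):
    """Right-aligned broadcast of any number of shapes, as in the
    `where` resize fix: every tensor input participates, rather than
    always taking the second input's shape."""
    ndim = max(len(s) for s in shapes)
    out = []
    for axis in range(-ndim, 0):
        dim = 1
        for s in shapes:
            if -axis <= len(s):  # shape is long enough to have this axis
                d = s[axis]
                if d != 1:
                    if dim != 1 and dim != d:
                        raise ValueError("incompatible shapes for broadcast")
                    dim = d
        out.append(dim)
    return out
```

For example, broadcasting `[3, 1, 5]`, `[4, 5]`, and `[1]` (as for a `where` condition, self, and other) yields `[3, 4, 5]`, whereas always using the second input's shape would give `[4, 5]`.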
…linear ops Pull Request resolved: #18061 Wire bias through the q4gsw and dq8ca_q4gsw quantized linear operators. Add add_bias_to_out_tile() helper in the output tile computation header and call it from all three shader variants (tiled, coop, dq8ca_tiled). Remove the bias guard in the pattern matcher to allow biased linear layers. ghstack-source-id: 353546681 @exported-using-ghexport Differential Revision: [D95970172](https://our.internmc.facebook.com/intern/diff/D95970172/)
This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #18207 by @SS-JIA
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/490/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/490/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/490/orig
Differential Revision: D96790445
@diff-train-skip-merge