[ET-VK] Fix exponential blowup in tag_memory_meta_pass repset tracing (#18263)
Pull Request resolved: #18207

The `trace_node_users_to_constrain_repset` DFS previously tracked search depth as a per-branch int counter, allowing each branch of a fan-out to independently explore up to `max_trace_search_depth` nodes. In transformer-style graphs with heavy fan-out this caused exponential blowup in the number of nodes visited.

Replace the int counter with a mutable list containing a single int that is shared by reference across all recursive branches. This limits the TOTAL number of nodes explored per top-level trace call to `max_trace_search_depth` (16), regardless of fan-out structure.

Authored with Claude.

ghstack-source-id: 353546691
@exported-using-ghexport

Differential Revision: [D96790445](https://our.internmc.facebook.com/intern/diff/D96790445/)
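The difference between the two counter styles can be sketched as follows. This is an illustrative model, not the actual ExecuTorch code: the `Node` class, `build_tree` helper, and function names are hypothetical stand-ins for the real graph nodes and `trace_node_users_to_constrain_repset`.

```python
MAX_TRACE_SEARCH_DEPTH = 16

class Node:
    """Minimal stand-in for a graph node with a list of users."""
    def __init__(self):
        self.users = []

def build_tree(depth):
    """Build a complete binary fan-out tree of the given depth."""
    n = Node()
    if depth > 0:
        n.users = [build_tree(depth - 1), build_tree(depth - 1)]
    return n

def trace_per_branch(node, visited, depth=0):
    # BEFORE: `depth` is copied per recursive call, so every branch of a
    # fan-out independently gets the full budget; visits can grow
    # exponentially with fan-out.
    if depth >= MAX_TRACE_SEARCH_DEPTH:
        return
    visited.append(node)
    for user in node.users:
        trace_per_branch(user, visited, depth + 1)

def trace_shared(node, visited, budget=None):
    # AFTER: `budget` is a one-element list shared by reference across all
    # branches, so the TOTAL number of visited nodes is capped at
    # MAX_TRACE_SEARCH_DEPTH per top-level call.
    if budget is None:
        budget = [0]
    if budget[0] >= MAX_TRACE_SEARCH_DEPTH:
        return
    budget[0] += 1
    visited.append(node)
    for user in node.users:
        trace_shared(user, visited, budget)
```

On a binary tree of depth 10 (2047 nodes), the per-branch variant visits every node, while the shared-budget variant stops after 16 visits.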
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18263

⏳ No Failures, 49 Pending as of commit e198bd4 with merge base 22174fa.
…epSetList Pull Request resolved: #18048 The `__getitem__` methods of `DtypeSetList` and `TensorRepSetList` in `utils.py` could raise an `IndexError` when the index is greater than or equal to the length of the list. This can happen when partitioning ops whose number of inputs or outputs exceeds the number of entries in the dtype/tensor-rep specification list. Fix by returning an empty set in this case, matching the intent of the existing broadcasting logic. ghstack-source-id: 353546684 @exported-using-ghexport Differential Revision: [D95970163](https://our.internmc.facebook.com/intern/diff/D95970163/)
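A minimal sketch of the out-of-range behavior described above, assuming a simplified list class (the class body and field names here are illustrative; the real `DtypeSetList` and `TensorRepSetList` live in the Vulkan partitioner's `utils.py` and carry more structure):

```python
class RepSetList:
    """Hypothetical simplified model of DtypeSetList / TensorRepSetList."""

    def __init__(self, entries):
        self.entries = list(entries)

    def __getitem__(self, idx):
        if len(self.entries) == 1:
            # Existing broadcasting intent: a single entry applies to
            # every argument index.
            return self.entries[0]
        if idx >= len(self.entries):
            # The fix: an op with more inputs/outputs than spec entries
            # gets an empty set ("no constraint") instead of IndexError.
            return set()
        return self.entries[idx]
```

With the fix, indexing past the end of a multi-entry list yields `set()`, while a one-entry list still broadcasts its sole entry to any index.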
Pull Request resolved: #18049 Add Vulkan build support for the Parakeet runner: llm-debug-vulkan preset in root CMakePresets.json, parakeet-vulkan presets in the Parakeet CMakePresets.json, vulkan_backend linkage in CMakeLists.txt, and a `make parakeet-vulkan` Makefile target. Add _create_vulkan_partitioners() and wire it into lower_to_executorch() so that `--backend vulkan` is accepted by export_parakeet_tdt.py. ghstack-source-id: 353546680 @exported-using-ghexport Differential Revision: [D95970157](https://our.internmc.facebook.com/intern/diff/D95970157/)
…teGraph

Fix output argument indexing in `VulkanBackend::execute()` and extend ComputeGraph to transparently handle symint values.

The output loop previously computed the args index as `i + num_inputs`, which breaks when non-tensor arguments (e.g. symints) sit between the tensor inputs and outputs in the args array. Fix by computing the offset from the end: `args.size() - num_outputs`.

ComputeGraph changes add symint support so that operators can read symint values uniformly:

- `extract_scalar<T>()` now handles SymInt values, allowing operators to call extract_scalar on arguments that may be either plain ints or symints without special-casing.
- `read_symint()` falls back to reading plain Int values, so values stored as Int (rather than SymInt objects) can be read uniformly.

Pull Request resolved: #18050
ghstack-source-id: 353546683
@exported-using-ghexport

Differential Revision: [D95970167](https://our.internmc.facebook.com/intern/diff/D95970167/)
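The indexing bug is easy to see on a concrete args layout. The real code is C++ inside `VulkanBackend::execute()`; this is a hedged Python model with made-up placeholder values:

```python
# Hypothetical args array: two tensor inputs, one symint, two tensor outputs.
args = ["in0", "in1", "symint", "out0", "out1"]
num_inputs, num_outputs = 2, 2

# BEFORE: `i + num_inputs` assumes outputs immediately follow inputs,
# so the symint in the middle shifts the window onto the wrong entries.
old = [args[i + num_inputs] for i in range(num_outputs)]

# AFTER: counting back from the end (`args.size() - num_outputs` in the
# C++ code) is immune to whatever sits between inputs and outputs.
new = [args[len(args) - num_outputs + i] for i in range(num_outputs)]
```

Here `old` picks up `["symint", "out0"]` while `new` correctly selects `["out0", "out1"]`.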
Modernize constant_pad_nd to support ANY_STORAGE (both buffer and texture). Migrate shaders to BufferMetadata/TextureMetadata with indexing.glslh and unify dispatch into a single add_constant_pad_nd_node function using DynamicDispatchNode. Pull Request resolved: #18051 ghstack-source-id: 353546682 @exported-using-ghexport Differential Revision: [D95970168](https://our.internmc.facebook.com/intern/diff/D95970168/)
Modernize arange and full operators to support ANY_STORAGE. Add separate buffer and texture shader variants using BufferMetadata/TextureMetadata with indexing.glslh. Unify dispatch with add_storage_type_suffix and DynamicDispatchNode. Add symint support via read_symint_list for dynamic output sizes. Pull Request resolved: #18052 ghstack-source-id: 353546693 @exported-using-ghexport Differential Revision: [D95970169](https://our.internmc.facebook.com/intern/diff/D95970169/)
Modernize expand_copy to support ANY_STORAGE. Add buffer shader variant using BufferMetadata with indexing.glslh. Unify dispatch with add_storage_type_suffix and DynamicDispatchNode. Add resize function and symint support for dynamic target sizes. Pull Request resolved: #18053 ghstack-source-id: 353546690 @exported-using-ghexport Differential Revision: [D95970162](https://our.internmc.facebook.com/intern/diff/D95970162/)
Modernize softmax and log_softmax to support ANY_STORAGE. Migrate both buffer and texture shaders from indexing_utils.h to indexing.glslh with BufferMetadata/TextureMetadata UBOs. Merge separate texture and buffer dispatch functions into a unified add_softmax_node using add_storage_type_suffix and graph.meta_ubo(). Pull Request resolved: #18054 ghstack-source-id: 353546688 @exported-using-ghexport Differential Revision: [D95970171](https://our.internmc.facebook.com/intern/diff/D95970171/)
Modernize native_layer_norm to support ANY_STORAGE. Migrate texture shader from indexing_utils.h to indexing.glslh with TextureMetadata UBOs. Merge separate texture and buffer dispatch functions into a unified add_native_layer_norm_node using graph.meta_ubo(). Buffer path retains custom workgroup sizing for cooperative shared-memory reduction. Pull Request resolved: #18055 ghstack-source-id: 353546686 @exported-using-ghexport Differential Revision: [D95970158](https://our.internmc.facebook.com/intern/diff/D95970158/)
Modernize repeat to support ANY_STORAGE. Rewrite texture shader to use TextureMetadata with indexing.glslh helpers for coordinate conversion. Add buffer shader variant using BufferMetadata. Unify dispatch to use graph.meta_ubo() for both paths. Add symint support for dynamic repeat counts. Pull Request resolved: #18056 ghstack-source-id: 353546685 @exported-using-ghexport Differential Revision: [D95970170](https://our.internmc.facebook.com/intern/diff/D95970170/)
Modernize embedding to support ANY_STORAGE. Add buffer and texture shader variants using BufferMetadata/TextureMetadata with indexing.glslh. Unify new dispatch path with add_storage_type_suffix and graph.meta_ubo(). Legacy channels-packed texture path retained for backward compatibility. Pull Request resolved: #18057 ghstack-source-id: 353546689 @exported-using-ghexport Differential Revision: [D95970161](https://our.internmc.facebook.com/intern/diff/D95970161/)
Modernize argmax and argmin to support ANY_STORAGE via the add_reduce_per_row_node dispatch path. Buffer shader uses BufferMetadata with indexing.glslh. Custom workgroup sizing retained for cooperative row-reduction algorithm with shared memory. Pull Request resolved: #18058 ghstack-source-id: 353546687 @exported-using-ghexport Differential Revision: [D95970165](https://our.internmc.facebook.com/intern/diff/D95970165/)
Pull Request resolved: #18059

Add missing operators needed for Parakeet TDT model support:

- New symint ops: `sym_sub`, `sym_floordiv`, `sym_mul` in SymIntOps.cpp; register `operator.floordiv` and `operator.mul` as ephemeral ops in op_registry.py
- New tensor ops: `bitwise_not` (via unary_op shader with uint8 DTYPE), `logical_and` (alias for bitwise_and dispatch)
- Improve `_to_copy`: expand dtype support to FP_INT_BOOL_T and use pick_io_storage_fn to restrict to CONTIGUOUS_BUFFER for non-fp conversions
- Fix `where` resize: compute output shape via broadcast across all tensor inputs instead of always using the second input's shape
- Add symint support to `split`: use extract_int_or_symint_list instead of get_int_list in resize_split_node and split_with_sizes_copy_default
- Mark `scalar_tensor` as supporting resize

ghstack-source-id: 353546692
@exported-using-ghexport

Differential Revision: [D95970159](https://our.internmc.facebook.com/intern/diff/D95970159/)
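The `where` resize fix amounts to applying the standard right-aligned broadcasting rule over all tensor inputs. A sketch under that assumption (the helper name is illustrative, not the actual ExecuTorch function):

```python
def broadcast_shapes(*shapes):
    """Right-aligned broadcast of any number of shapes, as in the
    `where` resize fix: every tensor input participates, rather than
    always taking the second input's shape."""
    ndim = max(len(s) for s in shapes)
    out = []
    for axis in range(-ndim, 0):
        dim = 1
        for s in shapes:
            if -axis <= len(s):  # shape is long enough to have this axis
                d = s[axis]
                if d != 1:
                    if dim != 1 and dim != d:
                        raise ValueError("incompatible shapes for broadcast")
                    dim = d
        out.append(dim)
    return out
```

For example, broadcasting `[3, 1, 5]`, `[4, 5]`, and `[1]` (as for a `where` condition, self, and other) yields `[3, 4, 5]`, whereas always using the second input's shape would give `[4, 5]`.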
…linear ops Pull Request resolved: #18061 Wire bias through the q4gsw and dq8ca_q4gsw quantized linear operators. Add add_bias_to_out_tile() helper in the output tile computation header and call it from all three shader variants (tiled, coop, dq8ca_tiled). Remove the bias guard in the pattern matcher to allow biased linear layers. ghstack-source-id: 353546681 @exported-using-ghexport Differential Revision: [D95970172](https://our.internmc.facebook.com/intern/diff/D95970172/)
This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #18207 by @SS-JIA
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/490/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/490/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/main
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/490/orig
Differential Revision: D96790445
@diff-train-skip-merge