[ExecuTorch] Add broadcast support for optimized add op #8205


Merged

26 commits merged into main from gh/kimishpatel/154/head on Feb 15, 2025

Conversation

kimishpatel
Contributor

@kimishpatel commented Feb 5, 2025

Stack from ghstack (oldest at bottom):

Summary:
This brings the add op to feature parity with the mul op in the optimized
kernels lib with respect to broadcasting.

Test Plan:
tests added

cc @larryliu0820 @manuelcandales

Differential Revision: D69491814

Summary:
Refactors the broadcast handling utils that were added for op_mul, in
preparation for using these utils to handle broadcasting for other ops such
as add, sub, and div.
Also removes a redundant test.

Test Plan:
optimized_kernels_test in CI

pytorch-bot bot commented Feb 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8205

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 6f2f01a with merge base 8148603:

BROKEN TRUNK - The following job failed but was also present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

kimishpatel added a commit that referenced this pull request Feb 5, 2025
ghstack-source-id: e4dea30
Pull Request resolved: #8205
@facebook-github-bot added the CLA Signed label (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.) on Feb 5, 2025
@kimishpatel requested a review from swolchok February 6, 2025 06:40
 Tensor& handle_last_dim_broadcast_elementwise(
     KernelRuntimeContext& ctx,
     const Op& vec_fun,
     const Tensor& a,
     const Tensor& b,
     Tensor& out,
-    const ElementwiseOptimizedPath selected_optimized_path) {
+    const ElementwiseOptimizedPath selected_optimized_path,
+    executorch::aten::optional<Scalar>& alpha = {}) {

Contributor

The error messages are telling you this needs to be a const ref. Also, why is this not std::optional?

Contributor Author

Yeah, just realized that. Not sure why it did not throw an error in my local build; different compile options, I guess.

I just followed what I saw elsewhere. Happy to switch to std::optional too, which I guess is what is backing it, but maybe for the ATen build it aliases c10::optional? Let me check that first.

Contributor

c10::optional is gone
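
To make the fix concrete, here is a minimal standalone sketch (with a hypothetical Scalar stand-in, not the ExecuTorch type) of why the parameter wants to be a const reference: the braced default creates a temporary, and only a const reference can bind to it.

#include <optional>

// Hypothetical stand-in for the real Scalar type; illustration only.
struct Scalar {
  double v = 1.0;
};

// A const reference parameter can take a braced default: the empty optional
// temporary lives for the duration of the call. A non-const reference cannot
// bind to that temporary, which is what the compile errors were pointing out.
double scale_or_one(const std::optional<Scalar>& alpha = {}) {
  return alpha.has_value() ? alpha->v : 1.0;
}

int main() {
  double a = scale_or_one();             // default: no alpha, scale by 1
  double b = scale_or_one(Scalar{2.0});  // alpha provided
  return (a == 1.0 && b == 2.0) ? 0 : 1;
}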

Comment on lines 256 to 257
CTYPE alpha_val;
Vec alpha_val_vec(alpha_val);

Contributor

normally I would say "alpha_val needs to be initialized; C++ doesn't have default zero-initialization for primitives", but actually the problem here is that alpha_val needs to move under the if:

Vec alpha_val_vec;
if (alpha.has_value()) {
  CTYPE alpha_val;
  ET_KERNEL_CHECK(...)

 Tensor& handle_broadcast_elementwise(
     KernelRuntimeContext& ctx,
     const Op& vec_fun,
     const Tensor& a,
     const Tensor& b,
     Tensor& out,
-    const ElementwiseOptimizedPath selected_optimized_path) {
+    const ElementwiseOptimizedPath selected_optimized_path,
+    executorch::aten::optional<Scalar> alpha = {}) {

Contributor

Why is this by-value but the other one is a reference? Make them consistent.

Contributor Author

Oh, good callout. My bad.

inner_size);
ET_SWITCH_REALB_TYPES(out_type, ctx, internal::BinaryOpTypeName<op_type>::kName, CTYPE, [&]() {
using Vec = executorch::vec::Vectorized<CTYPE>;
CTYPE alpha_val;

Contributor

same problem as above

// This behavior is a bit confusing.
// Reason we swap out args here is because handle_broadcast_elementwise
// handles this selected_optimized_path option a bit differently.
// This should really be resoled in handle_broadcast_elementwise.

Contributor

s/resoled/resolved/

ElementwiseOptimizedPath::kBroadcastLastDimReverseArguments ||
selected_optimized_path ==
ElementwiseOptimizedPath::kBroadcastNdByNdReverseArguments) {
// This behavior is a bit confusing.

Contributor

I don't understand what's confusing here; there is an argument that should be scaled by alpha_val, and we have to scale the right one. I definitely don't think handle_broadcast_elementwise should be coupled to the specific op.

Contributor Author

The problem is this: all the reverse-argument cases get specifically different handling inside handle_broadcast_elementwise, and for that handling to work, this change is necessary, which makes them coupled and fragile.

Contributor Author

I guess confusing is not the right word here though.
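
To make the coupling concrete, here is a small standalone sketch (plain floats rather than Vectorized, hypothetical names): when the kernel swaps a and b to reuse the non-reversed broadcast path, the add lambda has to scale its first argument by alpha instead of its second to keep computing a + alpha * b.

#include <cassert>

int main() {
  // add.out computes out = a + alpha * b.
  const float a = 3.0f, b = 5.0f, alpha = 2.0f;

  // Lambda used when the arguments arrive in (a, b) order.
  auto add_lambda = [alpha](float x, float y) { return x + alpha * y; };

  // Lambda needed when the kernel swaps the operands (the *ReverseArguments
  // paths): it receives (b, a), so alpha must scale the first argument to
  // still produce a + alpha * b.
  auto add_lambda_reversed = [alpha](float x, float y) { return y + alpha * x; };

  assert(add_lambda(a, b) == add_lambda_reversed(b, a));  // both equal 13
  return 0;
}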

@@ -130,8 +130,12 @@ Tensor& opt_mul_out(
out.numel());
});
} else if (selected_optimized_path != ElementwiseOptimizedPath::kNone) {
auto mul_lambda = [](auto x, auto y) { return x * y; };
return torch::executor::handle_broadcast_elementwise(
// Reason for using alpha:

Contributor

missing rest of comment after the colon

return torch::executor::handle_broadcast_elementwise(
// Reason for using alpha:
auto mul_lambda = [](auto x, auto y, auto alpha) {
(void)alpha;

Contributor Author

thank you :)

Comment on lines 61 to 83
template <BinaryOpType op_type>
struct BinaryOpTypeName;

template <>
struct BinaryOpTypeName<BinaryOpType::kAdd> {
  static constexpr char kName[] = "add.out";
};

template <>
struct BinaryOpTypeName<BinaryOpType::kSub> {
  static constexpr char kName[] = "sub.out";
};

template <>
struct BinaryOpTypeName<BinaryOpType::kMul> {
  static constexpr char kName[] = "mul.out";
};

template <>
struct BinaryOpTypeName<BinaryOpType::kDiv> {
  static constexpr char kName[] = "div.out";
};


Contributor

You don't need to do this. See the existing example:

static constexpr const char op_name[] = "rsub.Scalar_out";
ET_SWITCH_REAL_TYPES(compute_type, ctx, op_name, CTYPE_COMPUTE, [&]() {
  const CTYPE_COMPUTE val_b = utils::scalar_to<CTYPE_COMPUTE>(b);
  const CTYPE_COMPUTE val_alpha = utils::scalar_to<CTYPE_COMPUTE>(alpha);
  utils::apply_unitensor_elementwise_fn<CTYPE_COMPUTE, op_name>(

The secret sauce is that the string literal has to be a static constexpr const char[], and then you can pass it to a const char* template argument directly.
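
A minimal standalone illustration of that mechanism, with made-up names rather than the real ExecuTorch macros: an array with static storage duration can be passed directly as a const char* non-type template argument.

#include <cstdio>

// The template parameter is a plain const char*.
template <const char* kName>
void print_op_name() {
  std::printf("%s\n", kName);
}

// Because these arrays have static storage duration, their addresses are
// valid non-type template arguments.
static constexpr const char kAddName[] = "add.out";
static constexpr const char kMulName[] = "mul.out";

int main() {
  print_op_name<kAddName>();  // prints "add.out"
  print_op_name<kMulName>();  // prints "mul.out"
  return 0;
}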

Contributor Author

Thanks. I was hoping you would point me to something better for this

@kimishpatel added the module: kernels (Issues related to kernel libraries and utilities, and code under kernels/) and release notes: ops & kernels (Changes to the opset and any new / changed kernel implementations) labels on Feb 7, 2025

Contributor

@swolchok left a comment

just re-request review if my suggestion is bad :)

Comment on lines 155 to 156
// creation to handle_broadcast_elementwise and it be aware of which op is
// being executed.

Contributor

not sure I agree, but we can settle that on a review of a proposed change

Comment on lines 225 to 237
using Vec = executorch::vec::Vectorized<CTYPE>;
Vec alpha_val_vec;
if (alpha.has_value()) {
  CTYPE alpha_val;
  ET_KERNEL_CHECK(
      ctx,
      native::utils::extract_scalar(alpha.value(), &alpha_val),
      InvalidArgument, );
  alpha_val_vec = Vec(alpha_val);
}
auto vec_fun_alpha = [vec_fun, alpha_val_vec](const Vec& a, const Vec& b) {
  return vec_fun(a, b, alpha_val_vec);
};

Contributor

Oh, I see, you're having problems with the lambda because of this part. You can solve this by factoring the code differently.

The end result at the call site could look something like:

auto broadcast_op_plan_opt = plan_broadcast_elementwise(...); // broadcast_op_plan is a struct containing all the stuff you work out that isn't dependent on the dtype, like lhs, rhs. It does ET_KERNEL_CHECKs internally and returns nullopt if they fail.
if (!broadcast_op_plan_opt) {
  // a check already failed
  return;
}
ET_SWITCH_REALB_TYPES(out_type, ctx, op_name, CTYPE, [&]() {
  auto alpha_val_vec_opt = extract_scalar_to_vector<CTYPE>(); // wrap up the bit that 
  if (!alpha_val_vec_opt) {
    // awkward that this only returns from the lambda, but this is a generic ET_KERNEL_CHECK problem
    return;
  }
  auto add_lambda = [alpha_val_vec = *alpha_val_vec_opt](auto x, auto y) {
    return y + alpha_val_vec * x;
  };
  execute_broadcast_elementwise_plan<CTYPE>(*broadcast_op_plan_opt, add_lambda, ...);
});

Disclaimer: this is off the top of my head, and it may be possible to unify some of this stuff with dtype_util.h for further simplification, though dtype_util is mostly intended to cut the size/build time of portable ops.

Contributor Author

Good point. Let me see whether I run into other issues enabling such a refactor.

Contributor Author

OK, so I looked at the refactor required. I think it is doable, at the cost of moving the ET_SWITCH_REALB_TYPES macro to the call site in each respective op. The downside is that if you enable a new dtype for the optimized path, you then have to change all the call sites.

So I am not fully convinced that it is better to go down that route, but I want to hear your reasoning.

Contributor

> you have to change all the call sites.

That's just a matter of typing, right? If you plan to do it (I suppose optimizing Half/BFloat16 should be on our TODO list if the hardware supports the relevant instructions) and you really don't want to change 4-5 files later (you'll have to change them anyway for Half/BFloat16 specifically, because there are opt-outs), you could always #define ET_SWITCH_OPTIMIZED_ELEMENTWISE_BROADCAST_OP_TYPES ET_SWITCH_REALB_TYPES pre-emptively.
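
A rough sketch of the pre-emptive alias idea, using simplified stand-in macros rather than the real ET_SWITCH_* family (which take more arguments): call sites dispatch through one alias, so widening the supported dtypes later touches only a single #define.

#include <cstdio>

// Hypothetical stand-in for a dtype dispatch macro: runs BODY once per
// supported type with CTYPE bound to that type.
#define SWITCH_REALB_TYPES(BODY)   \
  { using CTYPE = float;  BODY; }  \
  { using CTYPE = double; BODY; }

// Pre-emptive alias used by the broadcast op call sites. Enabling more
// dtypes (say Half/BFloat16) later only touches this line.
#define SWITCH_OPTIMIZED_ELEMENTWISE_BROADCAST_OP_TYPES(BODY) \
  SWITCH_REALB_TYPES(BODY)

int main() {
  SWITCH_OPTIMIZED_ELEMENTWISE_BROADCAST_OP_TYPES(
      std::printf("element size: %zu\n", sizeof(CTYPE)));
  return 0;
}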

Contributor Author

OK, that's fair. But is your reasoning for this change simpler code, or do you see a perf impact?

I am not too attached to it, so I will just go ahead and do it, but I wanted to understand your reasoning.

Contributor

Simpler, less repetitive code.

Contributor Author

OK, I will make the change, but this will likely marginally increase size, since the whole handle_broadcast_elementwise function is now dtype-specialized.

@kimishpatel requested a review from swolchok February 11, 2025 00:07
Comment on lines +142 to +151
Tensor a = tf_a.make(
{2, 2, 3, 5},
/*data=*/{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45,
46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60});
Tensor b = tf_a.make(
{2, 1, 3, 5},
/*data=*/{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30});

Contributor

nit: it would probably be more reviewable to fill these programmatically, such as with std::iota, but certainly not blocking
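
For illustration, a small sketch of the programmatic fill the nit suggests, assuming the TensorFactory make overload that accepts a std::vector of data as used in these tests:

#include <numeric>
#include <vector>

int main() {
  // Generate 1..60 for the {2, 2, 3, 5} input and 1..30 for the
  // {2, 1, 3, 5} input instead of listing the values by hand.
  std::vector<float> a_data(2 * 2 * 3 * 5);
  std::vector<float> b_data(2 * 1 * 3 * 5);
  std::iota(a_data.begin(), a_data.end(), 1.0f);
  std::iota(b_data.begin(), b_data.end(), 1.0f);

  // In the test these would then be handed to the factory, e.g.
  // Tensor a = tf_a.make({2, 2, 3, 5}, a_data);
  // Tensor b = tf_a.make({2, 1, 3, 5}, b_data);
  return 0;
}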

Comment on lines +157 to +160
/*data=*/{2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30,
17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45,
47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75,
62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90});

Contributor

ditto programmatic fill

@kimishpatel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kimishpatel changed the base branch from gh/kimishpatel/154/base to main February 13, 2025 14:37
@kimishpatel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kimishpatel merged commit b71d873 into main Feb 15, 2025
45 of 48 checks passed
@kimishpatel deleted the gh/kimishpatel/154/head branch February 15, 2025 04:24