How to link custom ops? #4510

Closed
BlackSamorez opened this issue Aug 1, 2024 · 3 comments

Hi!

I'm trying to integrate some quantized MatMul C++ kernels into ExecuTorch and I'm having a hard time: the documentation is very vague about what exactly I need to include/link for ATen to pick up my ops.

I would greatly appreciate any help in trying to make it work.

Overview:

The source code for the dynamic library containing the ops consists of three files: lut_kernel.h, lut_kernel.cpp, and lut_kernel_pytorch.cpp. They contain roughly this code:

// lut_kernel.h
#pragma once

#include <executorch/runtime/kernel/kernel_includes.h>

namespace torch {
namespace executor {

namespace native {

Tensor& code2x8_lut_matmat_out(
  RuntimeContext& ctx,
  const Tensor& input,
  const Tensor& codes,
  const Tensor& codebooks,
  const Tensor& scales,
  const optional<Tensor>& bias,
  Tensor& out
);
} // namespace native
} // namespace executor
} // namespace torch

// lut_kernel.cpp
#include "lut_kernel.h"

#include <executorch/extension/kernel_util/make_boxed_from_unboxed_functor.h>

namespace torch {
  namespace executor {
    namespace native {
      Tensor& code2x8_lut_matmat_out(
        RuntimeContext& ctx,
        const Tensor& input,
        const Tensor& codes,
        const Tensor& codebooks,
        const Tensor& scales,
        const optional<Tensor>& bias,
        Tensor& out
      ) {
        // CALCULATIONS
        return out;
      }
    } // namespace native
  } // namespace executor
} // namespace torch

EXECUTORCH_LIBRARY(aqlm, "code2x8_lut_matmat.out", torch::executor::native::code2x8_lut_matmat_out);

// lut_kernel_pytorch.cpp
#include "lut_kernel.h"

#include <executorch/extension/aten_util/make_aten_functor_from_et_functor.h>
#include <executorch/extension/kernel_util/make_boxed_from_unboxed_functor.h>

#include <torch/library.h>

namespace torch {
    namespace executor {
        namespace native {
            Tensor& code2x8_lut_matmat_out_no_context(
                const Tensor& input,
                const Tensor& codes,
                const Tensor& codebooks,
                const Tensor& scales,
                const optional<Tensor>& bias,
                Tensor& output
            ) {
                // No runtime-provided context exists in this path, so hand
                // the kernel a scratch allocator backed by a heap buffer
                // (never freed here).
                void* memory_pool = malloc(10000000 * sizeof(uint8_t));
                MemoryAllocator allocator(10000000, (uint8_t*)memory_pool);

                exec_aten::RuntimeContext context{nullptr, &allocator};
                return torch::executor::native::code2x8_lut_matmat_out(
                    context, input, codes, codebooks, scales, bias, output);
            }

            at::Tensor code2x8_lut_matmat(
                const at::Tensor& input,
                const at::Tensor& codes,
                const at::Tensor& codebooks,
                const at::Tensor& scales,
                const c10::optional<at::Tensor>& bias
            ) {
                // Output shape: same as input, with the last dim replaced.
                auto sizes = input.sizes().vec();
                sizes[sizes.size() - 1] = codes.size(1) * codebooks.size(2);
                auto out = at::empty(sizes,
                    at::TensorOptions()
                    .dtype(input.dtype())
                    .device(input.device())
                );

                WRAP_TO_ATEN(code2x8_lut_matmat_out_no_context, 5)(
                    input, codes, codebooks, scales, bias, out);
                return out;
            }
        } // namespace native
    } // namespace executor
} // namespace torch

TORCH_LIBRARY(aqlm, m) {
  m.def(
      "code2x8_lut_matmat(Tensor input, Tensor codes, "
      "Tensor codebooks, Tensor scales, Tensor? bias=None) -> Tensor"
  );
  m.def(
      "code2x8_lut_matmat.out(Tensor input, Tensor codes, "
      "Tensor codebooks, Tensor scales, Tensor? bias=None, *, Tensor(c!) out) -> Tensor(c!)"
  );
}

TORCH_LIBRARY_IMPL(aqlm, CompositeExplicitAutograd, m) {
  m.impl(
      "code2x8_lut_matmat", torch::executor::native::code2x8_lut_matmat
  );
  m.impl(
      "code2x8_lut_matmat.out",
      WRAP_TO_ATEN(torch::executor::native::code2x8_lut_matmat_out_no_context, 5)
    );
}

This closely follows the ExecuTorch custom SDPA op code.

I build these as two standalone dynamic libraries: one from lut_kernel.cpp, depending only on executorch, and one from lut_kernel_pytorch.cpp, with an additional torch dependency. I load the latter into PyTorch with torch.ops.load_library(f"../libaqlm_bindings.dylib").
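
For reference, loading the library and calling the op from eager PyTorch looks roughly like this (a sketch; the tensor shapes are taken from the export below, and the values are random placeholders):

import torch

# Load the bindings library built from lut_kernel_pytorch.cpp.
torch.ops.load_library("../libaqlm_bindings.dylib")

# Placeholder tensors with shapes matching the exported module below.
input = torch.randn(1, 4, 1024)
codes = torch.zeros(3072, 128, 2, dtype=torch.int8)
codebooks = torch.randn(2, 256, 1, 8)
scales = torch.randn(3072, 1, 1, 1)
bias = torch.randn(3072)

out = torch.ops.aqlm.code2x8_lut_matmat(input, codes, codebooks, scales, bias)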

The problem:

I wrote a small nn.Module that basically just calls the op (a minimal sketch of it appears after the export output below). In PyTorch it works well. The aten_dialect for it looks like this:

ExportedProgram:
    class GraphModule(torch.nn.Module):
        def forward(self, p_codes: "i8[3072, 128, 2]", p_codebooks: "f32[2, 256, 1, 8]", p_scales: "f32[3072, 1, 1, 1]", p_bias: "f32[3072]", input: "f32[s0, s1, 1024]"):
            input_1 = input
            
            # File: /Users/blacksamorez/reps/AQLM/inference_lib/src/aqlm/inference.py:74 in forward, code: return torch.ops.aqlm.code2x8_lut_matmat(
            code2x8_lut_matmat: "f32[s0, s1, 1024]" = torch.ops.aqlm.code2x8_lut_matmat.default(input_1, p_codes, p_codebooks, p_scales, p_bias);  input_1 = p_codes = p_codebooks = p_scales = p_bias = None
            return (code2x8_lut_matmat,)
            
Graph signature: ExportGraphSignature(input_specs=[InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_codes'), target='codes', persistent=None), InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_codebooks'), target='codebooks', persistent=None), InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_scales'), target='scales', persistent=None), InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_bias'), target='bias', persistent=None), InputSpec(kind=<InputKind.USER_INPUT: 1>, arg=TensorArgument(name='input'), target=None, persistent=None)], output_specs=[OutputSpec(kind=<OutputKind.USER_OUTPUT: 1>, arg=TensorArgument(name='code2x8_lut_matmat'), target=None)])
Range constraints: {s0: VR[1, 9223372036854775806], s1: VR[1, 9223372036854775806]}
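
The module that produced this export is essentially just a thin wrapper around the op. A minimal sketch (the class name is mine, and the weights are random placeholders shaped as in the export above):

import torch
from torch.export import export, Dim

class Code2x8LutLinear(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.codes = torch.nn.Parameter(
            torch.zeros(3072, 128, 2, dtype=torch.int8), requires_grad=False)
        self.codebooks = torch.nn.Parameter(torch.randn(2, 256, 1, 8))
        self.scales = torch.nn.Parameter(torch.randn(3072, 1, 1, 1))
        self.bias = torch.nn.Parameter(torch.randn(3072))

    def forward(self, input):
        return torch.ops.aqlm.code2x8_lut_matmat(
            input, self.codes, self.codebooks, self.scales, self.bias)

# Requires torch.ops.load_library(...) to have been called first.
aten_dialect = export(
    Code2x8LutLinear(),
    (torch.randn(2, 2, 1024),),
    dynamic_shapes={"input": {0: Dim("s0", min=1), 1: Dim("s1", min=1)}},
)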

But when calling to_edge I get an error saying that Operator torch._ops.aqlm.code2x8_lut_matmat.default is not Aten Canonical.

I don't conceptually understand how the EXECUTORCH_LIBRARY macro from lut_kernel.cpp is supposed to make it ATen canonical. Should I somehow recompile ExecuTorch to include my kernel?

Thank you!

BlackSamorez (Author) commented Aug 1, 2024

I added compile_config=EdgeCompileConfig(_check_ir_validity=False) to the to_edge call and it appears to export now.
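
Concretely, the workaround looks roughly like this (the .pte file name is a placeholder):

from executorch.exir import to_edge, EdgeCompileConfig

# Skip the ATen-canonical IR check so the custom op passes through to_edge.
edge = to_edge(
    aten_dialect,
    compile_config=EdgeCompileConfig(_check_ir_validity=False),
)
et_program = edge.to_executorch()
with open("aqlm_model.pte", "wb") as f:
    f.write(et_program.buffer)
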
After linking libaqlm.dylib into executor_runner (and replacing executorch with executorch_no_prim_ops in its linked libraries), I'm able to compile it.
However, when running it I hit an error like this:

E 00:00:00.001621 executorch:method.cpp:536] Missing operator: [0] aqlm::code2x8_lut_matmat.out
E 00:00:00.001623 executorch:method.cpp:724] There are 1 instructions don't have corresponding operator registered. See logs for details

I'm on executorch v0.3.0.

digantdesai (Contributor) commented

@larryliu0820 any suggestions?

BlackSamorez (Author) commented

@digantdesai Hi! Thanks for the reply.
I think we've shifted the discussion to #4719.
In light of that, I'm closing this issue.
