How to link custom ops? #4510

Closed
BlackSamorez opened this issue Aug 1, 2024 · 3 comments

Hi!

I'm trying to integrate some quantized MatMul C++ kernels into ExecuTorch and I'm having a hard time: the documentation is very vague about what exactly I need to include/link for ATen to pick up my ops.

I would greatly appreciate any help in trying to make it work.

Overview:

The source code for the dynamic library containing the ops consists of three files: lut_kernel.h, lut_kernel.cpp, and lut_kernel_pytorch.cpp. They contain roughly this code:

// lut_kernel.h
#pragma once

#include <executorch/runtime/kernel/kernel_includes.h>

namespace torch {
namespace executor {

namespace native {

Tensor& code2x8_lut_matmat_out(
  RuntimeContext& ctx,
  const Tensor& input,
  const Tensor& codes,
  const Tensor& codebooks,
  const Tensor& scales,
  const optional<Tensor>& bias,
  Tensor& out
);
} // namespace native
} // namespace executor
} // namespace torch

// lut_kernel.cpp
#include "lut_kernel.h"

#include <executorch/extension/kernel_util/make_boxed_from_unboxed_functor.h>

namespace torch {
  namespace executor {
    namespace native {
      Tensor& code2x8_lut_matmat_out(
        RuntimeContext& ctx,
        const Tensor& input,
        const Tensor& codes,
        const Tensor& codebooks,
        const Tensor& scales,
        const optional<Tensor>& bias,
        Tensor& out
      ) {
        // CALCULATIONS
        return out;
      }
    } // namespace native
  } // namespace executor
} // namespace torch

EXECUTORCH_LIBRARY(aqlm, "code2x8_lut_matmat.out", torch::executor::native::code2x8_lut_matmat_out);

// lut_kernel_pytorch.cpp
#include "lut_kernel.h"

#include <executorch/extension/aten_util/make_aten_functor_from_et_functor.h>
#include <executorch/extension/kernel_util/make_boxed_from_unboxed_functor.h>

#include <torch/library.h>

namespace torch {
    namespace executor {
        namespace native {
            Tensor& code2x8_lut_matmat_out_no_context(
                const Tensor& input,
                const Tensor& codes,
                const Tensor& codebooks,
                const Tensor& scales,
                const optional<Tensor>& bias,
                Tensor& output
            ) {
                // No runtime-provided context exists in this path, so hand
                // the kernel a scratch allocator backed by a heap buffer
                // (never freed here).
                void* memory_pool = malloc(10000000 * sizeof(uint8_t));
                MemoryAllocator allocator(10000000, (uint8_t*)memory_pool);

                exec_aten::RuntimeContext context{nullptr, &allocator};
                return torch::executor::native::code2x8_lut_matmat_out(
                    context, input, codes, codebooks, scales, bias, output);
            }

            at::Tensor code2x8_lut_matmat(
                const at::Tensor& input,
                const at::Tensor& codes,
                const at::Tensor& codebooks,
                const at::Tensor& scales,
                const c10::optional<at::Tensor>& bias
            ) {
                // Output shape: same as input, with the last dim replaced.
                auto sizes = input.sizes().vec();
                sizes[sizes.size() - 1] = codes.size(1) * codebooks.size(2);
                auto out = at::empty(sizes,
                    at::TensorOptions()
                    .dtype(input.dtype())
                    .device(input.device())
                );

                WRAP_TO_ATEN(code2x8_lut_matmat_out_no_context, 5)(
                    input, codes, codebooks, scales, bias, out);
                return out;
            }
        } // namespace native
    } // namespace executor
} // namespace torch

TORCH_LIBRARY(aqlm, m) {
  m.def(
      "code2x8_lut_matmat(Tensor input, Tensor codes, "
      "Tensor codebooks, Tensor scales, Tensor? bias=None) -> Tensor"
  );
  m.def(
      "code2x8_lut_matmat.out(Tensor input, Tensor codes, "
      "Tensor codebooks, Tensor scales, Tensor? bias=None, *, Tensor(c!) out) -> Tensor(c!)"
  );
}

TORCH_LIBRARY_IMPL(aqlm, CompositeExplicitAutograd, m) {
  m.impl(
      "code2x8_lut_matmat", torch::executor::native::code2x8_lut_matmat
  );
  m.impl(
      "code2x8_lut_matmat.out",
      WRAP_TO_ATEN(torch::executor::native::code2x8_lut_matmat_out_no_context, 5)
    );
}

This closely follows the ExecuTorch custom SDPA op code.

I build these as two standalone dynamic libraries: one from lut_kernel.cpp, depending only on executorch, and one from lut_kernel_pytorch.cpp, with an additional torch dependency. I load the latter into PyTorch with torch.ops.load_library(f"../libaqlm_bindings.dylib").
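
For reference, loading the library and calling the op from eager PyTorch looks roughly like this (a sketch; the tensor shapes are taken from the export below, and the values are random placeholders):

import torch

# Load the bindings library built from lut_kernel_pytorch.cpp.
torch.ops.load_library("../libaqlm_bindings.dylib")

# Placeholder tensors with shapes matching the exported module below.
input = torch.randn(1, 4, 1024)
codes = torch.zeros(3072, 128, 2, dtype=torch.int8)
codebooks = torch.randn(2, 256, 1, 8)
scales = torch.randn(3072, 1, 1, 1)
bias = torch.randn(3072)

out = torch.ops.aqlm.code2x8_lut_matmat(input, codes, codebooks, scales, bias)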

The problem:

I wrote a small nn.Module that basically just calls the op (a minimal sketch of it appears after the export output below). In PyTorch it works well. The aten_dialect for it looks like this:

ExportedProgram:
    class GraphModule(torch.nn.Module):
        def forward(self, p_codes: "i8[3072, 128, 2]", p_codebooks: "f32[2, 256, 1, 8]", p_scales: "f32[3072, 1, 1, 1]", p_bias: "f32[3072]", input: "f32[s0, s1, 1024]"):
            input_1 = input
            
            # File: /Users/blacksamorez/reps/AQLM/inference_lib/src/aqlm/inference.py:74 in forward, code: return torch.ops.aqlm.code2x8_lut_matmat(
            code2x8_lut_matmat: "f32[s0, s1, 1024]" = torch.ops.aqlm.code2x8_lut_matmat.default(input_1, p_codes, p_codebooks, p_scales, p_bias);  input_1 = p_codes = p_codebooks = p_scales = p_bias = None
            return (code2x8_lut_matmat,)
            
Graph signature: ExportGraphSignature(input_specs=[InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_codes'), target='codes', persistent=None), InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_codebooks'), target='codebooks', persistent=None), InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_scales'), target='scales', persistent=None), InputSpec(kind=<InputKind.PARAMETER: 2>, arg=TensorArgument(name='p_bias'), target='bias', persistent=None), InputSpec(kind=<InputKind.USER_INPUT: 1>, arg=TensorArgument(name='input'), target=None, persistent=None)], output_specs=[OutputSpec(kind=<OutputKind.USER_OUTPUT: 1>, arg=TensorArgument(name='code2x8_lut_matmat'), target=None)])
Range constraints: {s0: VR[1, 9223372036854775806], s1: VR[1, 9223372036854775806]}
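
The module that produced this export is essentially just a thin wrapper around the op. A minimal sketch (the class name is mine, and the weights are random placeholders shaped as in the export above):

import torch
from torch.export import export, Dim

class Code2x8LutLinear(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.codes = torch.nn.Parameter(
            torch.zeros(3072, 128, 2, dtype=torch.int8), requires_grad=False)
        self.codebooks = torch.nn.Parameter(torch.randn(2, 256, 1, 8))
        self.scales = torch.nn.Parameter(torch.randn(3072, 1, 1, 1))
        self.bias = torch.nn.Parameter(torch.randn(3072))

    def forward(self, input):
        return torch.ops.aqlm.code2x8_lut_matmat(
            input, self.codes, self.codebooks, self.scales, self.bias)

# Requires torch.ops.load_library(...) to have been called first.
aten_dialect = export(
    Code2x8LutLinear(),
    (torch.randn(2, 2, 1024),),
    dynamic_shapes={"input": {0: Dim("s0", min=1), 1: Dim("s1", min=1)}},
)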

But when calling to_edge I get an error saying that Operator torch._ops.aqlm.code2x8_lut_matmat.default is not Aten Canonical.

I don't conceptually understand how the EXECUTORCH_LIBRARY macro from lut_kernel.cpp is supposed to make it ATen canonical. Should I somehow recompile ExecuTorch to include my kernel?

Thank you!

BlackSamorez (Author) commented Aug 1, 2024

I added compile_config=EdgeCompileConfig(_check_ir_validity=False) to the to_edge call and it appears to export now.
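
Concretely, the workaround looks roughly like this (the .pte file name is a placeholder):

from executorch.exir import to_edge, EdgeCompileConfig

# Skip the ATen-canonical IR check so the custom op passes through to_edge.
edge = to_edge(
    aten_dialect,
    compile_config=EdgeCompileConfig(_check_ir_validity=False),
)
et_program = edge.to_executorch()
with open("aqlm_model.pte", "wb") as f:
    f.write(et_program.buffer)
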
After linking libaqlm.dylib into executor_runner (and replacing executorch with executorch_no_prim_ops in its linked libraries), I'm able to compile it.
However, when running it I hit an error like this:

E 00:00:00.001621 executorch:method.cpp:536] Missing operator: [0] aqlm::code2x8_lut_matmat.out
E 00:00:00.001623 executorch:method.cpp:724] There are 1 instructions don't have corresponding operator registered. See logs for details

I'm on executorch v0.3.0.

digantdesai (Contributor) commented

@larryliu0820 any suggestions?

BlackSamorez (Author) commented

@digantdesai Hi! Thanks for the reply.
I think we've shifted the discussion to #4719.
In light of that, I'm closing this issue.
