🚀 The feature, motivation and pitch
Problem statement
I am working on integrating CMSIS-NN, an external library of optimized kernels, into executorch in the Cortex-M backend. These kernels take their integer parameters, e.g. quantization parameters and dimensions, as int32_t.
However, if I define the operator registration with ints using the .yaml API:

```yaml
- func: cortex_m::quantized_conv2d.out(Tensor input,
    Tensor weight,
    Tensor? bias,
    int[] stride,
    int[] padding,
    int[] dilation,
    int input_offset,
    int output_offset,
    int[] requantize_multiplier,
    int[] requantize_shifts,
    Scalar activation_min,
    Scalar activation_max, *, Tensor(a!) out) -> Tensor(a!)
```
the resulting call signature ends up with int64_t types. Looking into the program.fbs schema, integers in the exir graph generally seem to be serialized as int64:

```
table Int {
  int_val: long;
}
```
For single ints this is not a big issue, since it is reasonably safe to cast to int32_t at runtime. For lists of integers with unknown length, on the other hand, this would require either dynamic memory allocation or pre-allocating a maximum-length buffer, both of which are suboptimal.
Proposed solution
My suggestion is to add support for explicitly specifying int32 in the exir graph and in the kernel registration.
This problem seems general enough to motivate a common solution rather than having multiple backends implement their own workarounds.
@lucylq @JacobSzwejbka @psiddh @SS-JIA
Alternatives
Using a Tensor for IntLists
Using a tensor seems to be a viable workaround for the IntLists, but it comes with some overhead compared to a plain list. If this route is taken, we must first confirm that the performance impact is negligible.
Using Scalar
According to https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/native/README.md, Scalar should only be used when all dtypes are supported, not to support one single type. There is also no ScalarList, so it does not solve the main issue.
Use a modified schema rather than extending it
I believe that introducing parallel ways of serializing the graph risks creating more complex issues, since the runtime would have to know which serialization variant was used.
Additional context
No response
RFC (Optional)
No response