[tflite] use newer xnnpack related source for M1 bazel build #47639
Conversation
With the newer related code, we can build `benchmark_model` with the XNNPACK, GPU, and CoreML delegates. On M1 machines, either

```
bazel-3.7.2-arm64 build tensorflow/lite/tools/benchmark:benchmark_model --config macos_arm64 --macos_cpus arm64
```

or

```
bazel-4.0-arm64 build tensorflow/lite/tools/benchmark:benchmark_model
```

works.
MobileNet V1 1.0 224 inference latency (unit: ms)
cf. #47605: are the XNNPACK numbers not better than the non-XNNPACK ones? @Maratyszcza
@terryheo With this patch,
@freedomtan I suspect TFLite might be calling into Accelerate on Mac. Accelerate uses the AMX accelerator, which is not documented and thus not used in XNNPACK.
@Maratyszcza No, those numbers are not AMX/Accelerate numbers; Accelerate is not enabled (yet) when building with bazel. The Inception V3 numbers look more reasonable. I also updated the MobileNet V1 table with an Accelerate number. Inception V3 float from the TFLite hosted models:
CPU + Accelerate:
and
Using cblas from Accelerate for convolution could be enabled on M1 machines with something like the following:

```diff
diff --git a/tensorflow/lite/kernels/internal/BUILD b/tensorflow/lite/kernels/internal/BUILD
index d1b0505de90..7bac11d8fb6 100644
--- a/tensorflow/lite/kernels/internal/BUILD
+++ b/tensorflow/lite/kernels/internal/BUILD
@@ -286,7 +286,10 @@ cc_library(
         "optimized/sparse_ops/fully_connected.h",
     ],
     compatible_with = get_compatible_with_portable(),
-    copts = tflite_copts(),
+    copts = tflite_copts() + select({
+        "//tensorflow:macos_arm64": ["-DTF_LITE_USE_CBLAS"],
+        "//conditions:default": [],
+    }),
     deps = [
         ":common",
         ":compatibility",
@@ -307,6 +310,13 @@ cc_library(
         "@gemmlowp//:fixedpoint",
         "@ruy//ruy/profiler:instrumentation",
     ],
+    linkopts = select({
+        "//tensorflow:macos_arm64": [
+            "-framework Accelerate",
+        ],
+        "//conditions:default": [],
+    }),
 )
```
LGTM