[tflite] use newer xnnpack related source for M1 bazel build #47639


Conversation

freedomtan
Contributor

@freedomtan freedomtan commented Mar 8, 2021

With the newer related code, we can build benchmark_model with the XNNPACK, GPU, and CoreML delegates.

On M1 machines, either

```
bazel-3.7.2-arm64 build tensorflow/lite/tools/benchmark:benchmark_model --config macos_arm64 --macos_cpus arm64
```

or

```
bazel-4.0-arm64 build tensorflow/lite/tools/benchmark:benchmark_model --macos_cpus arm64
```

works.

@google-ml-butler google-ml-butler bot added the size:M CL Change Size: Medium label Mar 8, 2021
@google-cla google-cla bot added the cla: yes label Mar 8, 2021
@freedomtan
Contributor Author

freedomtan commented Mar 8, 2021

MobileNet V1 1.0 224 inference latency

| how | average latency (ms) |
| --- | --- |
| CPU 1 thread | 18.847 |
| CPU XNNPACK 1 thread | 29.219 |
| CPU 4 threads | 9.087 |
| CPU XNNPACK 4 threads | 9.079 |
| CPU + Accelerate | 5.397 |
| GPU delegate | 2.699 |
| CoreML delegate | 1.051 |

cf. #47605

The XNNPACK numbers are not better than the non-XNNPACK ones? @Maratyszcza
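A quick sanity check of the implied speedups, using plain Python with the numbers copied from the table above (relative to the single-threaded CPU baseline):

```python
# Average MobileNet V1 1.0 224 latencies (ms) from the table above.
latency_ms = {
    "CPU 1 thread": 18.847,
    "CPU XNNPACK 1 thread": 29.219,
    "CPU 4 threads": 9.087,
    "CPU XNNPACK 4 threads": 9.079,
    "CPU + Accelerate": 5.397,
    "GPU delegate": 2.699,
    "CoreML delegate": 1.051,
}

baseline = latency_ms["CPU 1 thread"]
for name, ms in latency_ms.items():
    # speedup > 1 means faster than the single-threaded CPU baseline;
    # XNNPACK at 1 thread comes out below 1, i.e. slower.
    print(f"{name}: {baseline / ms:.2f}x")
```

This makes the anomaly concrete: single-threaded XNNPACK is roughly 0.65x the baseline (i.e. slower), while the CoreML delegate is roughly 18x faster.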

@freedomtan
Contributor Author

@terryheo With this patch, I can build an arm64 binary on a Mac x86_64 machine:

```
bazel-3.7.2-x86_64 build tensorflow/lite/tools/benchmark:benchmark_model --config macos_arm64 --macos_cpus arm64
```

@abattery abattery requested review from yyoon, teijeong and terryheo March 8, 2021 10:59
@abattery
Contributor

abattery commented Mar 8, 2021

@terryheo @yyoon @teijeong could you review this PR?

@gbaned gbaned self-assigned this Mar 8, 2021
@Maratyszcza
Contributor

@freedomtan I suspect TFLite might be calling into Accelerate on the Mac. Accelerate uses the AMX accelerator, which is not documented and thus not used in XNNPACK.

@freedomtan
Contributor Author

freedomtan commented Mar 9, 2021

@Maratyszcza NO, those numbers are not AMX/Accelerate numbers. Accelerate is not enabled (yet) when building with bazel.

The Inception V3 numbers look more reasonable. I also updated the MobileNet V1 table with the Accelerate number.

Inception V3 float from the TFLite hosted models

| how | average latency (ms) |
| --- | --- |
| CPU 1 thread | 183.136 |
| CPU XNNPACK 1 thread | 159.613 |
| CPU 2 threads | 103.406 |
| CPU XNNPACK 2 threads | 85.646 |
| CPU 4 threads | 64.729 |
| CPU XNNPACK 4 threads | 46.066 |
| CPU + Accelerate | 31.743 |
| GPU delegate | 13.449 |
| CoreML delegate | 2.834 |

CPU + Accelerate:

```
bazel build //tensorflow/lite/tools/benchmark:benchmark_model --macos_cpus=arm64 --copt=-DTF_LITE_USE_CBLAS
```

and `-framework Accelerate` is added to benchmark_model's BUILD file.
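Unlike MobileNet V1, XNNPACK consistently helps here. The ratios can be checked with a few lines of Python (numbers copied from the Inception V3 table above):

```python
# Inception V3 float latencies (ms): (plain CPU, CPU with XNNPACK) per thread count.
pairs = {
    "1 thread":  (183.136, 159.613),
    "2 threads": (103.406, 85.646),
    "4 threads": (64.729, 46.066),
}

for threads, (cpu, xnn) in pairs.items():
    # ratio > 1 means XNNPACK is faster than the plain CPU path
    print(f"{threads}: XNNPACK is {cpu / xnn:.2f}x faster")
```

At 4 threads XNNPACK gives about a 1.4x win, so the MobileNet V1 single-thread regression looks model-specific rather than a general XNNPACK problem.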

@freedomtan
Contributor Author

freedomtan commented Mar 9, 2021

Using cblas from Accelerate for convolution could be enabled on M1 machines with something like the following:

```diff
diff --git a/tensorflow/lite/kernels/internal/BUILD b/tensorflow/lite/kernels/internal/BUILD
index d1b0505de90..7bac11d8fb6 100644
--- a/tensorflow/lite/kernels/internal/BUILD
+++ b/tensorflow/lite/kernels/internal/BUILD
@@ -286,7 +286,10 @@ cc_library(
         "optimized/sparse_ops/fully_connected.h",
     ],
     compatible_with = get_compatible_with_portable(),
-    copts = tflite_copts(),
+    copts = tflite_copts() + select({
+        "//tensorflow:macos_arm64": ["-DTF_LITE_USE_CBLAS"],
+        "//conditions:default": [],
+    }),
     deps = [
         ":common",
         ":compatibility",
@@ -307,6 +310,13 @@ cc_library(
         "@gemmlowp//:fixedpoint",
         "@ruy//ruy/profiler:instrumentation",
     ],
+    linkopts = select({
+        "//tensorflow:macos_arm64": [
+            "-framework Accelerate",
+        ],
+        "//conditions:default": [],
+    }),
+
 )
```

@abattery abattery added comp:lite TF Lite related issues subtype:macOS macOS Build/Installation issues type:build/install Build and install issues labels Mar 10, 2021
Member

@terryheo terryheo left a comment


LGTM

@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Mar 19, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 19, 2021
@copybara-service copybara-service bot merged commit e8d2c85 into tensorflow:master Mar 23, 2021
@freedomtan freedomtan deleted the bazel_build_benchmark_model_xnnpack branch March 23, 2021 05:38
Labels
cla: yes comp:lite TF Lite related issues ready to pull PR ready for merge process size:M CL Change Size: Medium subtype:macOS macOS Build/Installation issues type:build/install Build and install issues
6 participants