diff --git a/README.md b/README.md
index eb121a0c..10e7db51 100644
--- a/README.md
+++ b/README.md
@@ -45,32 +45,32 @@ advantage of multi-core modern desktop and mobile CPUs.
 
 The table below presents **single-threaded** performance of Larq Compute Engine on
 different versions of a novel BNN model called QuickNet (trained on ImageNet dataset, released on [Larq Zoo](https://docs.larq.dev/zoo/))
-on a [Pixel 1 phone (2016)](https://support.google.com/pixelphone/answer/7158570?hl=en-GB)
-and a Raspberry Pi 4 Model B ([BCM2711](https://www.raspberrypi.org/documentation/hardware/raspberrypi/bcm2711/README.md)) board:
+on a Raspberry Pi 4 Model B at 1.5GHz ([BCM2711](https://www.raspberrypi.com/documentation/computers/processors.html#bcm2711)) board, a [Pixel 1 Android phone (2016)](https://support.google.com/pixelphone/answer/7158570?hl=en-GB), and a [Mac Mini with M1 ARM CPU](https://www.apple.com/uk/mac-mini/):
 
-| Model | Top-1 Accuracy | RPi 4 B, ms (1 thread) | Pixel 1, ms (1 thread) |
-| -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------: | :--------------------: | :--------------------: |
-| [QuickNet](https://docs.larq.dev/zoo/api/sota/#quicknet) ([.h5](https://github.com/larq/zoo/releases/download/quicknet-v0.2.0/quicknet_weights.h5)) | 58.6 % | 31.4 | 16.8 |
-| [QuickNet-Large](https://docs.larq.dev/zoo/api/sota/#quicknetlarge) ([.h5](https://github.com/larq/zoo/releases/download/quicknet_large-v0.2.0/quicknet_large_weights.h5)) | 62.7 % | 48.7 | 25.5 |
-| [QuickNet-XL](https://docs.larq.dev/zoo/api/sota/#quicknetxl) ([.h5](https://github.com/larq/zoo/releases/download/quicknet_xl-v0.1.0/quicknet_xl_weights.h5)) | 67.0 % | 82.9 | 44.2 |
+| Model | Top-1 Accuracy | RPi 4B 1.5GHz, 1 thread (ms) | Pixel 1, 1 thread (ms) | Mac Mini M1, 1 thread (ms) |
+|--------------------------------------------------------------------|----------------|------------------------------|------------------------|----------------------------|
+| [QuickNetSmall](https://docs.larq.dev/zoo/api/sota/#quicknetsmall) | 59.4% | 27.7 | 16.8 | 4.0 |
+| [QuickNet](https://docs.larq.dev/zoo/api/sota/#quicknet) | 63.3% | 45.0 | 25.5 | 5.8 |
+| [QuickNetLarge](https://docs.larq.dev/zoo/api/sota/#quicknetlarge) | 66.9% | 77.0 | 44.2 | 9.9 |
 
 For reference, [dabnn](https://github.com/JDAI-CV/dabnn) (the other main BNN library) reports an inference time of 61.3 ms for [Bi-RealNet](https://docs.larq.dev/zoo/api/literature/#birealnet) (56.4% accuracy) on the Pixel 1 phone,
 while LCE achieves an inference time of 41.6 ms for Bi-RealNet on the same device.
 They furthermore present a modified version, BiRealNet-Stem, which achieves the same accuracy of 56.4% in 43.2 ms.
 
 The following table presents **multi-threaded** performance of Larq Compute Engine on
-a Pixel 1 phone and a Raspberry Pi 4 Model B ([BCM2711](https://www.raspberrypi.org/documentation/hardware/raspberrypi/bcm2711/README.md))
+a Pixel 1 phone and a Raspberry Pi 4 Model B at 1.5GHz ([BCM2711](https://www.raspberrypi.com/documentation/computers/processors.html#bcm2711))
 board:
 
-| Model | Top-1 Accuracy | RPi 4 B, ms (4 threads) | Pixel 1, ms (4 threads) |
-| -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------: | :---------------------: | :---------------------: |
-| [QuickNet](https://docs.larq.dev/zoo/api/sota/#quicknet) ([.h5](https://github.com/larq/zoo/releases/download/quicknet-v0.2.0/quicknet_weights.h5)) | 58.6 % | 16.1 | 8.9 |
-| [QuickNet-Large](https://docs.larq.dev/zoo/api/sota/#quicknetlarge) ([.h5](https://github.com/larq/zoo/releases/download/quicknet_large-v0.2.0/quicknet_large_weights.h5)) | 62.7 % | 24.7 | 12.6 |
-| [QuickNet-XL](https://docs.larq.dev/zoo/api/sota/#quicknetxl) ([.h5](https://github.com/larq/zoo/releases/download/quicknet_xl-v0.1.0/quicknet_xl_weights.h5)) | 67.0 % | 37.9 | 22.8 |
+| Model | Top-1 Accuracy | RPi 4B 1.5GHz, 4 threads (ms) | Pixel 1, 4 threads (ms) | Mac Mini M1, 4 threads (ms) |
+|--------------------------------------------------------------------|----------------|-------------------------------|-------------------------|-----------------------------|
+| [QuickNetSmall](https://docs.larq.dev/zoo/api/sota/#quicknetsmall) | 59.4% | 12.1 | 8.9 | 1.8 |
+| [QuickNet](https://docs.larq.dev/zoo/api/sota/#quicknet) | 63.3% | 20.8 | 12.6 | 2.5 |
+| [QuickNetLarge](https://docs.larq.dev/zoo/api/sota/#quicknetlarge) | 66.9% | 31.7 | 22.8 | 3.9 |
 
-Benchmarked on August 21st, 2020 with LCE custom
+Benchmarked on 2021-06-11 (Pixel 1), 2021-06-13 (Mac Mini M1), and 2022-04-20 (RPi 4B) with LCE custom
 [TFLite Model Benchmark Tool](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/benchmark)
 (see [here](https://github.com/larq/compute-engine/tree/main/larq_compute_engine/tflite/benchmark))
+with XNNPack enabled and BNN models with randomized inputs.
 
 ## Getting started