On-device face recognition pipeline benchmark across CoreML, ONNX Runtime, and TFLite — with 5 levels of optimization from baseline to ANE-accelerated inference.
53 pipelines. 3 runtimes. 4 detectors. 3 recognizers. One app.
- Multi-Runtime Comparison — Same models deployed to CoreML, ONNX Runtime, and TFLite for controlled benchmarking
- Swappable Pipeline Architecture — Swift Protocol-based design lets you swap any detector/recognizer combination at runtime
- 5 Optimization Levels — Baseline (v1) → Pipeline optimization (v2) → INT8 quantization (v3) → Hardware acceleration (v4) → Model variants (v5)
- Gallery Scan — Scan your entire photo library to find matching faces, with per-photo throughput measurement
- Op-Level Profiling — Per-operator timing for TFLite (C Telemetry API) and ONNX Runtime (C++ bridge), cross-pipeline comparison
- CoreML Compute Plan — Visualize per-layer hardware assignment (CPU/GPU/ANE) using MLComputePlan API
- Full Python Toolchain — Model conversion, INT8/FP16 quantization, WIDER FACE / LFW accuracy evaluation, ONNX graph optimization
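The swappable detector/recognizer idea from the list above can be illustrated outside Swift as well. The following is a minimal Python sketch using `typing.Protocol` as a rough analogue of a Swift protocol. The names `Face`, `Detector`, `Recognizer`, `Pipeline`, and the stub implementations are hypothetical and are not taken from the app's packages.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol, Sequence

Image = Sequence[Sequence[float]]  # toy grayscale image: rows of pixel values


@dataclass(frozen=True)
class Face:
    """Hypothetical detection result: a bounding box in pixel coordinates."""
    x: int
    y: int
    width: int
    height: int


class Detector(Protocol):
    def detect(self, image: Image) -> list[Face]: ...


class Recognizer(Protocol):
    def embed(self, image: Image, face: Face) -> list[float]: ...


def crop(image: Image, face: Face) -> list[float]:
    """Flatten the pixels inside the face box."""
    rows = image[face.y:face.y + face.height]
    return [p for row in rows for p in row[face.x:face.x + face.width]]


class WholeImageDetector:
    """Stand-in detector that reports the whole image as a single face."""
    def detect(self, image: Image) -> list[Face]:
        return [Face(0, 0, len(image[0]), len(image))]


class MeanRecognizer:
    """Stand-in recognizer: a one-element 'embedding' with the mean pixel value."""
    def embed(self, image: Image, face: Face) -> list[float]:
        pixels = crop(image, face)
        return [sum(pixels) / len(pixels)]


class MaxRecognizer:
    """Second stand-in recognizer, used to demonstrate swapping."""
    def embed(self, image: Image, face: Face) -> list[float]:
        return [max(crop(image, face))]


@dataclass
class Pipeline:
    detector: Detector
    recognizer: Recognizer

    def run(self, image: Image) -> list[list[float]]:
        # Detection followed by recognition; alignment is omitted in this sketch.
        return [self.recognizer.embed(image, f) for f in self.detector.detect(image)]


image = [[0.0, 1.0], [1.0, 0.0]]
pipeline = Pipeline(WholeImageDetector(), MeanRecognizer())
mean_embeddings = pipeline.run(image)

# Swap the recognizer at runtime without touching the detector.
pipeline.recognizer = MaxRecognizer()
max_embeddings = pipeline.run(image)
```

Because `Pipeline` depends only on the two protocols, any object with a matching `detect` or `embed` method can be plugged in, which is the property that makes per-combination benchmarking straightforward.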
Full pipeline: Detection + Alignment + Recognition. 100 iterations, 5 warmup.
| Pipeline | Runtime | Det (ms) | Rec (ms) | Total (ms) |
|---|---|---|---|---|
| v3-coreml-scrfd500m | CoreML ANE | 4.8 | 0.8 | 6.1 |
| v1-coreml-scrfd500m | CoreML ANE | 4.7 | 0.9 | 6.1 |
| v4-coreml-scrfd500m-ane | CoreML ANE | 4.9 | 0.9 | 6.2 |
| v3-coreml-yunet | CoreML ANE | 4.1 | 1.6 | 6.4 |
| v1-coreml-yunet | CoreML ANE | 4.1 | 1.9 | 7.1 |
| v4-coreml-yunet-ane | CoreML ANE | 4.2 | 1.9 | 7.2 |
| v3-coreml-scrfd10g | CoreML ANE | 8.7 | 0.9 | 10.1 |
| v1-coreml-scrfd10g | CoreML ANE | 9.4 | 1.0 | 11.0 |
| v3.3-tflite-yunet | TFLite INT8 | 8.4 | 4.3 | 13.6 |
| v4-coreml-yunet-cpu | CoreML CPU | 7.4 | 5.1 | 13.7 |
| v1-tflite-yunet (t=2) | TFLite CPU | 10.9 | 14.1 | 26.1 |
| v1-ort-yunet (t=2) | ORT CPU | 14.4 | 22.3 | 37.8 |
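By total latency, the fastest CoreML pipeline (FP16 on the ANE, 6.1 ms) is about 4.3x faster than TFLite CPU (26.1 ms) and about 6.2x faster than ORT CPU (37.8 ms). With YuNet as the detector on all three runtimes (v1-coreml-yunet at 7.1 ms), the gaps are about 3.7x and 5.3x.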
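The measurement protocol stated for the latency table (100 timed iterations after 5 warmup runs) can be sketched as a small timing harness. This is an illustrative Python version, not the app's Swift benchmark code; the function name `benchmark` and the choice of reported statistics are assumptions, since the README does not say whether the table shows a mean or a median.

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], iterations: int = 100, warmup: int = 5) -> dict:
    """Time fn() in milliseconds, discarding the warmup runs."""
    # Warmup runs absorb one-time costs (model load, caches, lazy allocation).
    for _ in range(warmup):
        fn()

    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "iterations": len(samples_ms),
        "mean_ms": statistics.fmean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "min_ms": min(samples_ms),
    }


calls = []


def fake_pipeline() -> None:
    """Hypothetical stand-in for detection + alignment + recognition."""
    calls.append(1)
    sum(i * i for i in range(1000))


result = benchmark(fake_pipeline)
print(result)
```

Reporting the median alongside the mean makes it easier to spot runs that were skewed by thermal throttling or background activity.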
| Model | Dataset | Metric |
|---|---|---|
| SCRFD 500M | WIDER FACE val | AP 88.9 / 83.8 / 51.4 (Easy/Med/Hard) |
| SCRFD 10G | WIDER FACE val | AP 92.5 / 93.0 / 67.8 |
| YuNet | WIDER FACE val | AP 88.6 / 81.6 / 48.1 |
| MobileFaceNet | LFW 10-fold CV | 99.57% verification accuracy |
| EdgeFace-XS | LFW 10-fold CV | 99.72% |
| EdgeFace-S | LFW 10-fold CV | 99.77% |
INT8 quantization accuracy drop: < 0.5% across all models.
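For readers unfamiliar with the "LFW 10-fold CV" metric in the table above, the sketch below shows the general idea: for each fold, a similarity threshold is chosen on the other nine folds and accuracy is measured on the held-out fold. This is a simplified illustration, not the Evaluator's code. It assumes similarity scores are already computed, and it splits pairs by index rather than using the official LFW fold assignment.

```python
def accuracy_at(threshold, scores, labels):
    """Fraction of pairs classified correctly when score >= threshold means 'same person'."""
    hits = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return hits / len(scores)


def candidate_thresholds(scores):
    """Midpoints between adjacent distinct scores."""
    ordered = sorted(set(scores))
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])] or ordered


def kfold_verification_accuracy(scores, labels, folds=10):
    """Mean held-out accuracy, with the threshold picked on the training folds."""
    indices = range(len(scores))
    fold_accuracies = []
    for k in range(folds):
        test = [i for i in indices if i % folds == k]
        train = [i for i in indices if i % folds != k]
        train_scores = [scores[i] for i in train]
        train_labels = [labels[i] for i in train]
        # Pick the threshold that maximizes accuracy on the training folds only.
        best = max(candidate_thresholds(train_scores),
                   key=lambda t: accuracy_at(t, train_scores, train_labels))
        fold_accuracies.append(
            accuracy_at(best, [scores[i] for i in test], [labels[i] for i in test]))
    return sum(fold_accuracies) / folds


# Toy data: 10 same-person pairs with high similarity, 10 different-person pairs with low.
same = [0.60 + 0.03 * i for i in range(10)]
different = [0.05 + 0.03 * i for i in range(10)]
scores = same + different
labels = [1] * 10 + [0] * 10

accuracy = kfold_verification_accuracy(scores, labels)
print(f"10-fold verification accuracy: {accuracy:.4f}")
```

Selecting the threshold on the training folds keeps the reported accuracy from being tuned on the same pairs it is measured on.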
- macOS with Xcode 15+
- uv (Python package manager)
- iPhone with iOS 17+ (for on-device deployment)
- Apple Developer account (free or paid)
git clone https://github.com/YOUR_USERNAME/ios-face-recognition-suite.git
cd ios-face-recognition-suite
# Check prerequisites and set up Python environment
./scripts/bootstrap.sh
# Set up code signing
cp App/Local.xcconfig.example App/Local.xcconfig
# Edit App/Local.xcconfig and set DEVELOPMENT_TEAM to your Apple Team ID

Finding your Team ID: Xcode → Settings → Accounts → select your team → Team ID

- Open App/FaceRecognitionApp.xcodeproj in Xcode
- Select your iPhone as the build target
- Build and Run (Cmd+R)
The app ships with pre-converted CoreML/TFLite/ONNX models bundled in the Swift packages — no model conversion needed for the first run.
If you want to rebuild models from source ONNX:
# Download source models
cd Converter && uv sync && uv run python convert.py --download
# Convert to all runtimes
uv run python convert.py --all
# (Optional) Quantize to INT8
cd ../Quantizer && uv sync && uv run python quantize.py --all
# (Optional) Evaluate accuracy
cd ../Evaluator && uv sync && uv run python evaluate.py --all

Or use the end-to-end build pipeline:
# Build specific version/detector
./scripts/run_e2e_build.sh -v 1 -d yunet
# List all available pipelines
./scripts/run_e2e_build.sh --list

cd App
cp fastlane/.env.example fastlane/.env
# Edit fastlane/.env with your App Store Connect API key
fastlane beta

App/ iOS app (SwiftUI, CoreData, benchmark UI)
Packages/
  FRPipelineCore/ Protocol definitions, alignment, similarity search
  FRPipelineVision/ v0: Vision framework detection + CoreML recognition
  FRPipelineCoreML/ v1/v3/v4/v5: CoreML detection + recognition
  FRTFLiteRuntime/ TFLite C API Swift wrapper
  FRPipelineTFLite/ v1/v2/v3: TFLite detection + recognition
  FRONNXRuntime/ ORT C++ API Swift wrapper
  FRPipelineORT/ v1/v2/v3: ONNX Runtime detection + recognition
Converter/ ONNX → CoreML/TFLite/ORT conversion
Quantizer/ INT8/FP16 post-training quantization
Evaluator/ WIDER FACE AP + LFW accuracy evaluation
Optimizer/ ONNX graph optimization (PReLU decompose, Gemm→Conv)
scripts/ E2E build pipeline orchestration
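The two graph rewrites named for the Optimizer (PReLU decompose, Gemm→Conv) rely on simple algebraic identities. The sketch below checks them numerically in plain Python. It assumes one common form of each rewrite: PReLU(x) = ReLU(x) − slope · ReLU(−x), and a Gemm layer expressed as a 1×1 convolution over a 1×1 feature map. The README does not spell out the exact rewrite used, so treat this as an illustration of why such transformations preserve the output, not as the Optimizer's implementation.

```python
import math


def relu(x):
    return max(0.0, x)


def prelu(x, slope):
    """Reference PReLU: identity for x >= 0, slope * x otherwise."""
    return x if x >= 0 else slope * x


def prelu_decomposed(x, slope):
    """PReLU built from two ReLUs, a negation, and a scale."""
    return relu(x) - slope * relu(-x)


def gemm(weights, bias, x):
    """Fully connected layer: y = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def conv1x1(weights, bias, feature_map):
    """1x1 convolution; feature_map[c][h][w], weights[out_c][in_c]."""
    channels = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    return [[[sum(weights[o][c] * feature_map[c][i][j] for c in range(channels)) + bias[o]
              for j in range(width)]
             for i in range(height)]
            for o in range(len(weights))]


# PReLU identity on a few sample inputs.
slope = 0.25
samples = [-2.0, -0.5, 0.0, 0.5, 2.0]
prelu_matches = all(
    math.isclose(prelu(x, slope), prelu_decomposed(x, slope), abs_tol=1e-12)
    for x in samples
)

# Gemm vs. 1x1 conv on the same vector, reshaped to C x 1 x 1.
weights = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
bias = [0.0, 1.0]
vector = [1.0, 2.0, 3.0]
gemm_out = gemm(weights, bias, vector)
conv_out = [plane[0][0] for plane in conv1x1(weights, bias, [[[v]] for v in vector])]

print(prelu_matches, gemm_out, conv_out)
```

Rewrites like these are typically applied so that a graph uses only operators a given runtime or accelerator handles well, while producing the same outputs.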
| Version | What it does |
|---|---|
| v0 | Baseline: Vision framework (detection) + CoreML (recognition) |
| v1 | Full custom: CoreML / TFLite / ORT detection + recognition |
| v2 | Pipeline optimization: buffer reuse, vDSP preprocessing |
| v3 | INT8 quantization with PadV2 optimization |
| v4 | CoreML ComputeUnits: CPU vs GPU vs ANE comparison |
| v5 | Model variants: YOLOv8n-Face, SCRFD 2.5G, EdgeFace-XS/S |
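To make the v3 row more concrete, here is a minimal sketch of per-tensor affine INT8 quantization, the basic mechanism behind post-training quantization. It is a generic illustration in plain Python. The actual scheme used by the Quantizer (per-channel vs. per-tensor, symmetric vs. asymmetric, calibration method) is not described in this README, so those choices are assumptions here.

```python
def quantization_params(values, qmin=-128, qmax=127):
    """Scale and zero point mapping [min, max] (extended to include 0) onto [qmin, qmax]."""
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale for all-zero tensors
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to clamped INT8 codes."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    """Map INT8 codes back to approximate floats."""
    return [(q - zero_point) * scale for q in codes]


weights = [-0.8, -0.1, 0.0, 0.3, 1.2]  # toy FP32 weights
scale, zero_point = quantization_params(weights)
codes = quantize(weights, scale, zero_point)
restored = dequantize(codes, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(codes)
print(f"scale={scale:.6f} zero_point={zero_point} max_error={max_error:.6f}")
```

The round-trip error per value is bounded by roughly one quantization step (`scale`), which is why accuracy usually degrades only slightly when the value range is well calibrated.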
| Stage | Model | Size (FP32) | Source |
|---|---|---|---|
| Detection | SCRFD 500M | 2.5 MB | InsightFace |
| Detection | SCRFD 10G | 16.9 MB | InsightFace |
| Detection | YuNet | 227 KB | OpenCV Zoo |
| Detection | YOLOv8n-Face | 6.2 MB | lindevs |
| Recognition | MobileFaceNet | 13.3 MB | InsightFace |
| Recognition | EdgeFace-XS | 1.8 MB | EdgeFace |
| Recognition | EdgeFace-S | 3.6 MB | EdgeFace |
MIT