Benchmarking ML models on Apple’s Metal Performance Shaders (MPS) backend for PyTorch — measuring training & inference performance across architectures.

🧪 Torch-MPS-Bench

Torch-MPS-Bench is a lightweight benchmarking suite for running deep learning model performance tests on Apple Silicon GPUs via Metal Performance Shaders (MPS). It compares CPU vs. MPS and FP32 vs. FP16 precision across multiple batch sizes, and produces CSV logs, plots, and Markdown reports.


🚀 Features

  • 🔹 Benchmark popular models (ResNet, DistilBERT, etc.) on CPU vs MPS
  • 🔹 Supports FP32 and FP16 precision
  • 🔹 Logs results to CSV with latency (P50/P90/P99) and throughput
  • 🔹 Generates plots for latency/throughput
  • 🔹 Auto-generates a Markdown report with best configs + CPU→MPS speedups
  • 🔹 Extensible — add your own models easily
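The P50/P90/P99 numbers in the CSV are latency percentiles over repeated timed runs. As a rough illustration (not the repo's exact code), a nearest-rank percentile over a list of per-iteration latencies can be computed like this:

```python
import math

def percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample >= pct% of all samples."""
    xs = sorted(samples_ms)
    rank = max(1, math.ceil(pct / 100 * len(xs)))
    return xs[rank - 1]

# Ten made-up per-iteration latencies (ms), for illustration only.
latencies = [6.1, 6.2, 6.3, 6.3, 6.4, 6.5, 6.5, 6.8, 7.0, 9.5]
print(percentile(latencies, 50), percentile(latencies, 90), percentile(latencies, 99))
# -> 6.4 7.0 9.5
```

Note how the P99 picks up the single slow outlier (9.5 ms) that the P50 hides, which is why the suite reports all three.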

📂 Repo Structure

```text
.
├── bench.py            # Run benchmarks (single model/config)
├── plot_results.py     # Generate latency/throughput plots
├── gen_report.py       # Create Markdown summary report
├── requirements.txt    # Python deps (pandas, torch, transformers, tabulate, matplotlib)
├── results/
│   ├── bench.csv       # Collected benchmark results
│   ├── plots/          # Auto-generated plots
│   └── summary.md      # Auto-generated Markdown report
└── README.md
```

⚙️ Setup

```bash
# Create env (Python 3.10+ recommended)
python -m venv .venv
source .venv/bin/activate

# Install deps
pip install -r requirements.txt
```
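To confirm the install can actually reach the GPU, a quick check that falls back to CPU when MPS is unavailable (e.g. Intel Macs, Linux) or PyTorch is missing:

```python
# Pick "mps" when the Metal backend is built and available, else fall back to "cpu".
try:
    import torch
    use_mps = torch.backends.mps.is_available()
except (ImportError, AttributeError):
    use_mps = False  # torch missing, or too old to expose the MPS backend

device = "mps" if use_mps else "cpu"
print(f"benchmarking on: {device}")
```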

▶️ Run Benchmarks

Run CPU vs MPS for ResNet50:

```bash
python bench.py --model resnet50 --device cpu --precision fp32 --batch 4 --out_csv results/bench.csv
python bench.py --model resnet50 --device mps --precision fp16 --batch 4 --out_csv results/bench.csv
```

Run DistilBERT (seq length 128):

```bash
python bench.py --model distilbert --device cpu --precision fp32 --batch 2 --seq_len 128 --out_csv results/bench.csv
python bench.py --model distilbert --device mps --precision fp16 --batch 2 --seq_len 128 --out_csv results/bench.csv
```
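Conceptually, each run warms the model up and then times individual iterations. A device-agnostic sketch of such a loop (bench.py's internals may differ; on MPS, the step should call `torch.mps.synchronize()` before returning so GPU work is finished when the clock is read):

```python
import time

def time_inference(run_step, warmup=5, iters=20):
    """Warm up, then record per-iteration latencies in milliseconds."""
    for _ in range(warmup):
        run_step()
    latencies_ms = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run_step()
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    return latencies_ms

# Stand-in workload so the sketch runs anywhere; swap in model(batch) for real use.
lats = time_inference(lambda: sum(range(10_000)))
print(len(lats))  # -> 20
```

Throughput then follows as `batch_size * iters / (sum(lats) / 1000)` samples per second.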

📊 Generate Plots

```bash
python plot_results.py --csv results/bench.csv --out results/plots
```

This produces:

  • results/plots/resnet50_latency_p50.png
  • results/plots/resnet50_throughput.png
  • etc.

📝 Generate Markdown Report

```bash
python gen_report.py --csv results/bench.csv --out results/summary.md --plots_dir results/plots
```

This creates results/summary.md with:

  • 🔹 Environment info (PyTorch, Python, OS)
  • 🔹 CPU→MPS speedup table
  • 🔹 Per-model best configs (latency/throughput)
  • 🔹 Compact results table
  • 🔹 Auto-embedded plots
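The speedup column is simply the ratio of CPU to MPS P50 latency. For instance, with the resnet50 numbers from the example report:

```python
# speedup_x = CPU P50 latency / MPS P50 latency (values from the example report)
cpu_p50_ms, mps_p50_ms = 25.20, 6.30
speedup = cpu_p50_ms / mps_p50_ms
print(f"{speedup:.2f}x")  # -> 4.00x
```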

📑 Example Report Snippet

```markdown
# 🧪 Torch-MPS-Bench — Summary

_Generated: 2025-08-18 19:00:00_

- **PyTorch**: 2.5.0
- **Python**: 3.10.14
- **System**: Apple M2 Pro

---

## CPU → MPS Latency Speedup (P50)
| model     | batch | cpu_p50_ms | mps_p50_ms | speedup_x | pair                  |
|-----------|-------|------------|------------|-----------|-----------------------|
| resnet50  | 4     | 25.20      | 6.30       | 4.00      | cpu-fp32 vs mps-fp16  |
| distilbert| 2     | 112.00     | 40.00      | 2.80      | cpu-fp32 vs mps-fp16  |

---

## Model: resnet50
**Best Latency (P50)**
- Device: `mps` Precision: `fp16` Batch: `4` P50: **6.30 ms**

**Best Throughput**
- Device: `mps` Precision: `fp16` Batch: `8` Throughput: **120.5 samples/s**

**All Runs**
| device | precision | batch | p50_ms | p90_ms | p99_ms | throughput_sps |
|--------|-----------|-------|--------|--------|--------|----------------|
| cpu    | fp32      | 4     | 25.2   | 26.1   | 27.9   | 39.7           |
| mps    | fp16      | 4     | 6.3    | 6.5    | 6.8    | 126.4          |

![resnet50 Latency (P50)](results/plots/resnet50_latency_p50.png)
![resnet50 Throughput](results/plots/resnet50_throughput.png)
```

🔧 Extending

  • Add more models → extend bench.py with HuggingFace or TorchVision APIs
  • Add more devices (CUDA, ROCm) → plug into same CSV schema
  • Add CI → run sanity benchmarks on CPU & upload report
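One common way to make the model list extensible is a small registry of builder functions keyed by `--model` name. This is a hypothetical pattern to illustrate the idea, not necessarily how bench.py is wired:

```python
MODEL_BUILDERS = {}

def register_model(name):
    """Register a zero-argument builder so the CLI can look models up by name."""
    def wrap(builder):
        MODEL_BUILDERS[name] = builder
        return builder
    return wrap

# Hypothetical example: adding a TorchVision model. The import lives inside the
# builder so registering models stays cheap and dependency-free until the model
# is actually requested.
@register_model("mobilenet_v3")
def build_mobilenet_v3():
    from torchvision.models import mobilenet_v3_small  # assumes torchvision is installed
    return mobilenet_v3_small(weights=None)

print(sorted(MODEL_BUILDERS))  # -> ['mobilenet_v3']
```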

📜 License

MIT — feel free to fork, extend, and contribute 🚀

