---
name: building
description: Build ExecuTorch from source — Python package, C++ runtime, runners, cross-compilation, and backend-specific builds. Use when compiling anything in the ExecuTorch repo, diagnosing build failures, or setting up platform-specific builds.
---

# Building ExecuTorch

## Step 1: Ensure Python environment (detect and fix automatically)

**Path A — conda (preferred):**
```bash
# Initialize conda for non-interactive shells (required in Claude Code / CI)
eval "$(conda shell.bash hook 2>/dev/null)"

# Check if the executorch conda env exists; create it if not
conda env list 2>/dev/null | grep executorch || \
  ls "$(conda info --base 2>/dev/null)/envs/" 2>/dev/null | grep executorch || \
  conda create -yn executorch python=3.12

# Activate
conda activate executorch
```

**Path B — no conda (fall back to venv):**
```bash
# Find a compatible Python (3.10–3.13). On macOS with only Homebrew Python 3.14+,
# install a compatible version first: brew install python@3.12
python3.12 -m venv .executorch-venv  # or python3.11, python3.10, python3.13
source .executorch-venv/bin/activate
pip install --upgrade pip
```

**Then verify (either path):**

Run `python --version` and `cmake --version`. Fix automatically:
- **Python not 3.10–3.13**: recreate the env with a correct Python version.
- **cmake missing or < 3.24**: run `pip install 'cmake>=3.24'` inside the env.
- **cmake >= 4.0**: works in practice, no action needed.

Parallel jobs: `$(sysctl -n hw.ncpu)` on macOS, `$(nproc)` on Linux.

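
The verification steps above can be sketched as one shell check. This is a minimal sketch, assuming the env from Path A or B is already activated; it only uses commands the section already relies on:

```bash
# Sketch: verify the active environment (assumes conda env or venv is activated).
PY="$(command -v python || command -v python3)"
PY_VER="$("$PY" -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
case "$PY_VER" in
  3.10|3.11|3.12|3.13) echo "Python $PY_VER: OK" ;;
  *) echo "Python $PY_VER: unsupported, recreate the env with 3.10-3.13" >&2 ;;
esac
cmake --version >/dev/null 2>&1 || echo "cmake missing: pip install 'cmake>=3.24'" >&2
JOBS="$(nproc 2>/dev/null || sysctl -n hw.ncpu)"  # parallel job count, both platforms
echo "parallel jobs: $JOBS"
```
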
## Step 2: Build

Route based on what the user asks for:
- User mentions **Android** → skip to [Cross-compilation: Android](#cross-compilation)
- User mentions **iOS** or **frameworks** → skip to [Cross-compilation: iOS](#cross-compilation)
- User mentions a **model name** (llama, whisper, etc.) → skip to [LLM / ASR model runner](#llm--asr-model-runner-simplest-path-for-running-models)
- User mentions **C++ runtime** or **cmake** → skip to [C++ runtime](#c-runtime-standalone)
- Otherwise → default to **Python package** below

### Python package (default)
```bash
conda activate executorch
./install_executorch.sh --editable  # editable install from source
```
This handles everything: submodules, deps, C++ build, Python install. Takes ~10 min on Apple Silicon.

For subsequent rebuilds (deps already present): `pip install -e . --no-build-isolation`

For minimal install (skip example deps): `./install_executorch.sh --minimal`

Enable additional backends:
```bash
CMAKE_ARGS="-DEXECUTORCH_BUILD_COREML=ON -DEXECUTORCH_BUILD_MPS=ON" ./install_executorch.sh --editable
```

Verify: `python -c "from executorch.exir import to_edge_transform_and_lower; print('OK')"`

### LLM / ASR model runner (simplest path for running models)

```bash
conda activate executorch
make <model>-<backend>
```

Available targets (run `make help` for full list):

| Target | Backend | macOS | Linux |
|--------|---------|-------|-------|
| `llama-cpu` | CPU | yes | yes |
| `llama-cuda` | CUDA | — | yes |
| `llama-cuda-debug` | CUDA (debug) | — | yes |
| `llava-cpu` | CPU | yes | yes |
| `whisper-cpu` | CPU | yes | yes |
| `whisper-metal` | Metal | yes | — |
| `whisper-cuda` | CUDA | — | yes |
| `parakeet-cpu` | CPU | yes | yes |
| `parakeet-metal` | Metal | yes | — |
| `parakeet-cuda` | CUDA | — | yes |
| `voxtral-cpu` | CPU | yes | yes |
| `voxtral-cuda` | CUDA | — | yes |
| `voxtral-metal` | Metal | yes | — |
| `voxtral_realtime-cpu` | CPU | yes | yes |
| `voxtral_realtime-cuda` | CUDA | — | yes |
| `voxtral_realtime-metal` | Metal | yes | — |
| `gemma3-cpu` | CPU | yes | yes |
| `gemma3-cuda` | CUDA | — | yes |
| `sortformer-cpu` | CPU | yes | yes |
| `sortformer-cuda` | CUDA | — | yes |
| `silero-vad-cpu` | CPU | yes | yes |
| `clean` | — | yes | yes |
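
The platform columns above can be enforced before invoking make. This is a sketch only; `pick_target` is a hypothetical helper, not something the repo's Makefile provides:

```bash
# Hypothetical helper (not in the repo): reject platform/backend combos
# marked "—" in the table above, otherwise print the make target name.
pick_target() {
  local model="$1" backend="$2" os
  os="$(uname -s)"
  if [ "$os" = "Darwin" ] && [ "$backend" = "cuda" ]; then
    echo "error: CUDA targets require Linux" >&2; return 1
  fi
  if [ "$os" = "Linux" ] && [ "$backend" = "metal" ]; then
    echo "error: Metal targets require macOS" >&2; return 1
  fi
  echo "${model}-${backend}"
}
```

Usage: `make "$(pick_target whisper cpu)"` runs the build; a mismatched combo fails before make starts.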

Output: `cmake-out/examples/models/<model>/<runner>`

### C++ runtime (standalone)

**With presets (recommended):**

| Platform | Command |
|----------|---------|
| macOS | `cmake -B cmake-out --preset macos` (uses Xcode generator — requires Xcode) |
| Linux | `cmake -B cmake-out --preset linux -DCMAKE_BUILD_TYPE=Release` |
| Windows | `cmake -B cmake-out --preset windows -T ClangCL` |

Then: `cmake --build cmake-out --config Release -j$(sysctl -n hw.ncpu)` (macOS) or `cmake --build cmake-out -j$(nproc)` (Linux)

**LLM libraries via workflow presets** (configure + build + install in one command):
```bash
cmake --workflow --preset llm-release        # CPU
cmake --workflow --preset llm-release-metal  # Metal (macOS)
cmake --workflow --preset llm-release-cuda   # CUDA (Linux/Windows)
```

**Manual CMake (custom flags):**
```bash
cmake -B cmake-out \
  -DCMAKE_BUILD_TYPE=Release \
  -DEXECUTORCH_BUILD_XNNPACK=ON \
  -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
  -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
  -DEXECUTORCH_BUILD_EXTENSION_FLAT_TENSOR=ON \
  -DEXECUTORCH_BUILD_EXTENSION_NAMED_DATA_MAP=ON \
  -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
  -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON
cmake --build cmake-out --parallel "$(nproc 2>/dev/null || sysctl -n hw.ncpu)"
```

Run `cmake --list-presets` to see all available presets.
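
The preset table and build commands above can be combined into a platform-detecting sketch. It prints the commands by default (pass `--run` to execute), and assumes nothing beyond the presets already listed:

```bash
# Sketch: pick the platform preset from the table above and build.
# Dry-run by default; pass --run to execute the commands.
case "$(uname -s)" in
  Darwin) CFG="cmake -B cmake-out --preset macos"
          BLD="cmake --build cmake-out --config Release -j$(sysctl -n hw.ncpu)" ;;
  Linux)  CFG="cmake -B cmake-out --preset linux -DCMAKE_BUILD_TYPE=Release"
          BLD="cmake --build cmake-out -j$(nproc)" ;;
  *)      CFG="cmake -B cmake-out --preset windows -T ClangCL"
          BLD="cmake --build cmake-out --config Release" ;;
esac
if [ "${1:-}" = "--run" ]; then $CFG && $BLD; else echo "$CFG"; echo "$BLD"; fi
```
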

### Cross-compilation

**iOS/macOS frameworks:**
```bash
./scripts/build_apple_frameworks.sh --coreml --mps --xnnpack
```
Link in Xcode with the `-all_load` linker flag.

**Android:**

Requires the `ANDROID_NDK` environment variable pointing at an NDK root (typically set by Android Studio or a standalone NDK install).
```bash
# Verify NDK is available
echo $ANDROID_NDK  # must point to NDK root, e.g. ~/Library/Android/sdk/ndk/<version>
export ANDROID_ABIS=arm64-v8a BUILD_AAR_DIR=aar-out
mkdir -p $BUILD_AAR_DIR && sh scripts/build_android_library.sh
```
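
To fail fast before the AAR build starts, the NDK check can be wrapped in a small guard. A sketch; `check_ndk` is a hypothetical name, not part of the build scripts:

```bash
# Hypothetical guard: verify ANDROID_NDK points at a real directory
# before running scripts/build_android_library.sh.
check_ndk() {
  if [ -z "${ANDROID_NDK:-}" ] || [ ! -d "$ANDROID_NDK" ]; then
    echo "error: set ANDROID_NDK to your NDK root first" >&2
    return 1
  fi
}
```
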

## Key build options

Most commonly needed flags (full list: `CMakeLists.txt`):

| Flag | What it enables |
|------|-----------------|
| `EXECUTORCH_BUILD_XNNPACK` | XNNPACK CPU backend |
| `EXECUTORCH_BUILD_COREML` | Core ML (macOS/iOS) |
| `EXECUTORCH_BUILD_MPS` | MPS GPU (macOS/iOS) |
| `EXECUTORCH_BUILD_METAL` | Metal compute (macOS, requires EXTENSION_TENSOR) |
| `EXECUTORCH_BUILD_CUDA` | CUDA GPU (Linux/Windows, requires EXTENSION_TENSOR) |
| `EXECUTORCH_BUILD_KERNELS_OPTIMIZED` | Optimized kernels |
| `EXECUTORCH_BUILD_KERNELS_QUANTIZED` | Quantized kernels |
| `EXECUTORCH_BUILD_EXTENSION_MODULE` | Module extension (requires DATA_LOADER + FLAT_TENSOR + NAMED_DATA_MAP) |
| `EXECUTORCH_BUILD_EXTENSION_LLM` | LLM extension |
| `EXECUTORCH_BUILD_TESTS` | Unit tests (`ctest --test-dir cmake-out --output-on-failure`) |
| `EXECUTORCH_BUILD_DEVTOOLS` | DevTools (Inspector, ETDump) |
| `EXECUTORCH_OPTIMIZE_SIZE` | Size-optimized build (`-Os`, no exceptions/RTTI) |
| `CMAKE_BUILD_TYPE` | `Release` or `Debug` (5–10x slower). Some presets (e.g. `llm-release`) set this; others require it explicitly. |

## Troubleshooting

| Symptom | Fix |
|---------|-----|
| Missing headers / `CMakeLists.txt not found` in third-party | `git submodule sync --recursive && git submodule update --init --recursive` |
| Mysterious failures after `git pull` or branch switch | `rm -rf cmake-out/ pip-out/ && git submodule sync && git submodule update --init --recursive` |
| `conda env list` PermissionError | Use `CONDA_NO_PLUGINS=true conda env list` or check the env dir directly |
| CMake >= 4.0 | Works in practice despite `< 4.0` in docs; only fix if the build actually fails |
| `externally-managed-environment` / PEP 668 error | You're using system Python, not conda. Activate the conda env first. |
| pip conflicts with torch versions | Fresh conda env; or `./install_executorch.sh --use-pt-pinned-commit` |
| Missing `Python.h` (Linux) | `sudo apt install python3.X-dev` |
| Missing operator registrations at runtime | Link kernel libs with `-Wl,-force_load,<lib>` (macOS) or `-Wl,--whole-archive <lib> -Wl,--no-whole-archive` (Linux) |
| `install_executorch.sh` fails on Intel Mac | No prebuilt PyTorch wheels; use `--use-pt-pinned-commit --minimal` |
| XNNPACK build errors about cpuinfo/pthreadpool | Ensure `EXECUTORCH_BUILD_CPUINFO=ON` and `EXECUTORCH_BUILD_PTHREADPOOL=ON` (both ON by default) |
| Duplicate kernel registration abort | Only link one `gen_operators_lib` per target |

## Build output

**From `./install_executorch.sh` (Python package):**

| Artifact | Location |
|----------|----------|
| Python package | `site-packages/executorch` |

**From CMake builds** (`cmake --install` with `CMAKE_INSTALL_PREFIX=cmake-out`):

| Artifact | Location |
|----------|----------|
| Core runtime | `cmake-out/lib/libexecutorch.a` |
| XNNPACK backend | `cmake-out/lib/libxnnpack_backend.a` |
| executor_runner | `cmake-out/executor_runner` (Ninja/Make) or `cmake-out/Release/executor_runner` (Xcode) |
| Model runners | `cmake-out/examples/models/<model>/<runner>` |

**From cross-compilation:**

| Artifact | Location |
|----------|----------|
| iOS frameworks | `cmake-out/*.xcframework` |
| Android AAR | `aar-out/` |

## Tips
- Always use `Release` for benchmarking; `Debug` is 5–10x slower
- `ccache` is auto-detected if installed (`brew install ccache`)
- Ninja is faster than Make (`-G Ninja`) — but `--preset macos` uses the Xcode generator
- For LLM workflows, `make <model>-<backend>` is the simplest path
- After `git pull`, clean and re-init submodules before rebuilding
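
The last tip can be captured as a small function for reuse (a sketch; the function name is illustrative, the commands are the ones from the troubleshooting table):

```bash
# Sketch: the post-"git pull" clean-rebuild routine from the tips above.
clean_rebuild_prep() {
  rm -rf cmake-out/ pip-out/
  git submodule sync --recursive
  git submodule update --init --recursive
}
```
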