This workspace automates building stock PyTorch with CUDA enabled for Python 3.8–3.12 on both Jetson Orin (Ampere, JetPack 6.x) and Jetson AGX Thor (Blackwell, JetPack 7.x). It codifies the upstream PyTorch instructions and the NVIDIA Developer Forum guidance for Orin builds and Thor/JetPack 7 builds. NVIDIA's official pip repository only offers a handful of prebuilt versions; with this workspace you can compile CUDA-enabled builds that match your requirements.
- JetPack 6.x (Orin) or JetPack 7.x (Thor) with CUDA under `/usr/local/cuda` and cuDNN already installed.
- At least 32 GB of free disk (more if you plan to keep all 3 wheels at once) and a large swap file (builds routinely spill past 16 GB of RAM).
- System packages:

  ```bash
  sudo apt update
  sudo apt install -y build-essential git cmake ninja-build \
      libopenblas-dev libopenmpi-dev openmpi-bin libatlas-base-dev libprotobuf-dev \
      protobuf-compiler libssl-dev zlib1g-dev libffi-dev
  ```

- `~/miniconda3` (already present on this machine) or any conda distribution. The scripts will create isolated envs per Python version.
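Before kicking off a multi-hour build, it is worth confirming disk and swap headroom up front. A minimal pre-flight sketch (Linux-only; reads `/proc/meminfo`, so the exact fields are an assumption about a standard Linux kernel):

```bash
# Free space on the volume holding the build tree.
df -h "$HOME" | tail -n1

# Current swap size; builds routinely need well beyond physical RAM.
awk '/SwapTotal/ {print "swap total:", $2, $3}' /proc/meminfo
```

If either number looks tight, grow swap first (see the troubleshooting notes below) rather than discovering an OOM three hours into the compile.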
Thor-specific sanity checks (taken from this devkit, JetPack 7.0 / Ubuntu 24.04):

```bash
uname -a
# Linux thor-taco 6.8.12-tegra ... aarch64 GNU/Linux

cat /etc/nv_tegra_release
# R38.2.2 ... BOARD: generic (AGX Thor)

nvidia-smi --query-gpu=name,compute_cap,driver_version,cuda_version --format=csv
# NVIDIA Thor, 11.0, 580.00, 13.0
```

The `build.sh` script auto-detects the compute capability (`TORCH_CUDA_ARCH_LIST`) via `nvidia-smi` when present, falling back to `/proc/device-tree/model`. On this Thor devkit it resolves to 11.0; on Orin it defaults to 8.7.
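The detection logic can be sketched roughly like this (an illustrative `detect_arch` helper, not the exact code in `build.sh`):

```bash
# Print the target compute capability: prefer nvidia-smi, then the
# device-tree model string, then the conservative Orin default.
detect_arch() {
  cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n1)
  if [ -n "$cap" ]; then
    echo "$cap"
  elif grep -qi thor /proc/device-tree/model 2>/dev/null; then
    echo "11.0"
  else
    echo "8.7"   # Orin (and fallback) default
  fi
}

detect_arch
```

On a machine with neither `nvidia-smi` nor a Jetson device tree, this falls through to `8.7`; exporting `TORCH_CUDA_ARCH_LIST` yourself always wins over any auto-detection.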
ℹ️ Jetson builds cannot currently use NVIDIA's binary NCCL. Following the forum advice above, the scripts default to `USE_NCCL=0`, `USE_DISTRIBUTED=0`, `USE_MKLDNN=0`, and `USE_NNPACK=0`. Override them if you have working alternatives.
- `build.sh` — clones PyTorch (once), prepares the requested Python env, and runs `python setup.py bdist_wheel` with Jetson-friendly defaults (auto-detected CUDA arch, NCCL disabled unless you opt in, etc.).
- `build-all.sh` — convenience wrapper that invokes `build.sh` for 3.10, 3.11, and 3.12 (or any list of versions you pass).
- `src/` — source tree managed by the scripts (`src/pytorch` is the git checkout).
- `logs/` — timestamped build logs per Python version.
- `wheels/` — collected `.whl` artefacts per Python version (`wheels/py310`, `wheels/py311`, ...).
```bash
cd ~/jetson-pytorch-builder
chmod +x build.sh build-all.sh

# Build all supported versions (3.8–3.12)
./build-all.sh

# OR build one at a time
./build.sh 3.11 v2.4.1
#          ^ Python version ^ optional PyTorch git ref/tag
```

Each run:

- Creates/updates `src/pytorch` (defaults to upstream tag `v2.4.0`; override with `PYTORCH_BRANCH=<tag>` or pass a second argument such as `./build.sh 3.12 main`).
- Creates a matching conda env (`torch-py310`, `torch-py311`, `torch-py312`) if it does not exist yet.
- Installs PyTorch's Python build requirements into the env.
- Cleans the repo tree (`git clean -fdx`) to avoid cross-version contamination.
- Compiles PyTorch with CUDA enabled, targeting the detected GPU (`TORCH_CUDA_ARCH_LIST` auto-detects to 8.7 for Orin, 11.0 for Thor; override via env var to cross-compile).
- Copies the newest `torch-*.whl` into `wheels/pyNNN`.
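The per-version naming scheme (conda env `torch-pyNNN`, wheel dir `wheels/pyNNN`) follows directly from the Python version string; a hypothetical sketch of the mapping the wrapper relies on:

```bash
# Map "3.10" -> "py310" and print the names each build run uses.
for ver in 3.10 3.11 3.12; do
  tag="py$(printf '%s' "$ver" | tr -d .)"
  echo "python $ver -> conda env torch-${tag}, wheel dir wheels/${tag}"
done
```

Keeping this mapping deterministic is what lets repeated runs reuse an existing env instead of recreating it.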
Successful builds print the wheel path at the end and log everything to `logs/pytorch-py<version>-<timestamp>.log`.
Note: the versions marked below are the ones that have been tested. Others may work but are untested/not yet updated.
- PyTorch 2.4.0
  - Python 3.8
  - Python 3.9
  - Python 3.10
  - Python 3.11
  - Python 3.12
- PyTorch 2.9.1 (partial)
  - Python 3.8
  - Python 3.9
  - Python 3.10
  - Python 3.11
  - Python 3.12
All relevant knobs can be changed through environment variables:
| Variable | Default | Meaning |
|---|---|---|
| `PYTORCH_BRANCH` | `v2.4.0` | Upstream tag/branch to check out (can also be passed as the second argument to `build.sh`). |
| `PYTORCH_REPO` | `https://github.com/pytorch/pytorch.git` | Clone source. |
| `TORCH_CUDA_ARCH_LIST` | auto (11.0 on Thor, 8.7 on Orin) | Target GPU architectures. Override to cross-compile. |
| `MAX_JOBS` | `$(nproc)` | Parallel compilation jobs. Tune to control RAM usage. |
| `CUDA_HOME` | `/usr/local/cuda` | CUDA root. |
| `USE_NCCL`, `USE_DISTRIBUTED`, `USE_MKLDNN`, `USE_NNPACK`, `USE_QNNPACK` | Jetson defaults set in `build.sh` | Feature toggles; override them if you have working alternatives. |
| `TORCH_VERSION_OVERRIDE` | auto from tag (e.g., `v2.4.0` → `2.4.0`) | Forces `TORCH_BUILD_VERSION` so the wheel filename/metadata advertises your custom build. Set it empty to keep upstream git-style versions, or supply your own (e.g., `2.4.0-jetson.1`). |
| `TORCH_BUILD_NUMBER_OVERRIDE` | `1` | Optional build number passed along when `TORCH_VERSION_OVERRIDE` is set. |
Example:

```bash
TORCH_CUDA_ARCH_LIST="8.7;8.9" USE_NCCL=1 MAX_JOBS=8 ./build.sh 3.12
```

Once a build finishes, install it inside any target environment (conda, system Python, etc.):

```bash
pip install ~/jetson-pytorch-builder/wheels/py312/torch-*.whl
```

Copy the wheel to other Jetson nodes as needed. Keep the logs handy for support/bug reports.
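Before copying a wheel between nodes, you can sanity-check that its `cp` tag matches the interpreter in the target environment. A small sketch (the wheel filename below is hypothetical):

```bash
# A torch wheel built for CPython 3.12 carries a "cp312" tag.
wheel="torch-2.4.0-cp312-cp312-linux_aarch64.whl"

# Tag of the currently active interpreter, e.g. cp310 for Python 3.10.
pytag="cp$(python3 -c 'import sys; print("%d%d" % sys.version_info[:2])')"

case "$wheel" in
  *"$pytag"*) echo "OK: $wheel matches $pytag" ;;
  *)          echo "mismatch: $wheel is not for $pytag" ;;
esac
```

pip refuses mismatched tags anyway, but checking first avoids shipping the wrong wheel to a remote Jetson.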
By default PyTorch's build system emits versions like `2.4.0a0+git<sha>`. This repo now auto-sets `TORCH_BUILD_VERSION` to the numeric part of your tag (e.g., `v2.4.0` → `2.4.0`), so the wheel name/metadata matches what torchvision/torchaudio expect. For non-tag refs (e.g., `main`), no override is applied unless you set it explicitly.
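The tag-to-version derivation is plain string handling; a sketch under the assumption that only refs shaped like `vX.Y.Z` trigger an override (the `derive_version` helper is illustrative, not the exact code in `build.sh`):

```bash
# "v2.4.0" -> "2.4.0"; a branch name like "main" yields no override.
derive_version() {
  case "$1" in
    v[0-9]*) printf '%s\n' "${1#v}" ;;   # strip the leading "v"
    *)       printf '\n' ;;              # empty: keep upstream versioning
  esac
}

derive_version v2.4.0   # -> 2.4.0
derive_version main     # -> (empty)
```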
Two ways to stay sane:

- **Set an explicit version for your wheel.**

  ```bash
  TORCH_VERSION_OVERRIDE="2.4.0-jetson.1" ./build.sh 3.11 v2.4.0
  ```

  The resulting wheel becomes `torch-2.4.0-jetson.1-...whl`, making it easy to match dependencies.

- **Install torchvision without re-resolving torch.**

  If you keep the default `2.4.0a0+git...` version, install the matching source release and skip dependency checks:

  ```bash
  pip install torchvision==0.19.0 --no-deps
  pip install torchaudio==2.4.0 --no-deps  # adjust to the PyTorch series you built
  ```

  This mirrors the PyTorch instructions for source builds where `torch` is already present.
- JetPack 7 ships CUDA 13.0 and driver 580; make sure host packages and `CUDA_HOME` point to `/usr/local/cuda-13.0` (symlinked by default). The script logs the resolved path for traceability.
- Blackwell support in PyTorch is still evolving; stick to PyTorch v2.4+ (default `v2.4.0`) or nightly `main` for proper `sm_110` kernels. You can change `PYTORCH_BRANCH` to `main` when you need bleeding-edge fixes.
- NCCL is still unavailable on Jetson, so distributed training remains disabled.
- If you parallelize with `MAX_JOBS > 8`, ensure Thor's LPDDR memory controller has enough headroom, or the build may thrash swap.
- Add swap with `sudo fallocate -l 32G /swapfile && sudo mkswap /swapfile ...` if the compiler OOMs.
- Ensure `nvcc --version` matches your JetPack CUDA (it should report 13.0 on Thor, 12.x on Orin). If not, export `CUDA_HOME` explicitly.
- `python setup.py clean` is implicitly handled by `git clean -fdx`; remove `build/` manually if you pause/resume by hand.
- Refer to the PyTorch source build doc and NVIDIA forum threads listed at the top for more edge-case fixes (e.g., building with TensorRT, CUTLASS tuning, FlashAttention patches, etc.).
Happy compiling!
Special notes: This repo contains code generated by Codex GPT-5.1 medium. I do not guarantee it will work on your machine. Merge requests are welcome.

Tested on:

- Jetson Thor: Linux thor-taco 6.8.12-tegra #1 SMP PREEMPT Thu Sep 25 15:19:42 PDT 2025 aarch64 aarch64 aarch64 GNU/Linux
- SoC: tegra264
- CUDA Arch BIN: 13.0
- L4T: 38.2.2
- JetPack: 7.0
- CUDA: 13.0.48
- cuDNN: 9.12.0
- TensorRT: 10.13.3.9