Description
Submitting and closing this issue to help anyone else searching for how to solve it. I'm including my error message, as that is where I was stuck with no results found on the web.
I have also captured exact step-by-step instructions in this ReadMe: https://github.com/DavidBurela/edgellm#edgellm
Install CUDA toolkit
You need to ensure you have the CUDA toolkit installed, as you need nvcc etc. on your PATH to compile correctly when you install via:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
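After installing the toolkit, it is worth confirming that nvcc is actually reachable before building (a quick sanity check, not part of the original steps; the command ships with any CUDA toolkit install):
nvcc --version
The reported release should match the toolkit version you choose in the next step.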
Ensure you install the correct version of CUDA toolkit
When I installed with cuBLAS support and tried to run, I would get this error:
the provided PTX was compiled with an unsupported toolchain.
I was able to pin down the root cause: the installed CUDA Toolkit version was newer than what my GPU drivers supported.
Run nvidia-smi and note what version of CUDA is supported in the top right.
Here my GPU drivers support 12.0, so I can install CUDA toolkit 12.0.1
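For reference, the value appears in the header line of the nvidia-smi output; the line below is only illustrative and your driver/CUDA numbers will differ:
NVIDIA-SMI 525.105.17    Driver Version: 525.105.17    CUDA Version: 12.0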
Download & install the correct version
Direct download and install
https://developer.nvidia.com/cuda-toolkit-archive
Conda
If you are using Conda, you can also install the toolkit directly into your environment:
conda create -n condaexample python=3.11  # use a later Python version if needed
conda activate condaexample
# Full list at https://anaconda.org/nvidia/cuda-toolkit
conda install -c "nvidia/label/cuda-12.1.1" cuda-toolkit
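If llama-cpp-python was already built against a mismatched toolkit, it may also need a clean rebuild once the correct toolkit is on your PATH. This is a suggested extra step rather than part of the original write-up; the flags are standard pip options:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install --force-reinstall --no-cache-dir llama-cpp-python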
Enable in code
from langchain.llms import LlamaCpp  # LlamaCpp wrapper around llama-cpp-python

# CPU only
model = LlamaCpp(model_path="./models/model.bin", verbose=True, n_threads=8)

# GPU: must specify the number of layers to offload into VRAM
model = LlamaCpp(model_path="./models/model.bin", verbose=True, n_threads=8, n_gpu_layers=20)
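As a quick sanity check (an illustrative call, assuming the LangChain LlamaCpp wrapper imported above), prompting the GPU-enabled model should show layers being offloaded in the verbose llama.cpp output:
print(model("Q: Name the planets in the solar system. A:"))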