Conda install cupy CUDA Runtime Version (locally installed) failed to load cudart64_12.dll #9127

Open
@desplenterkarel

Description

When CuPy is installed with conda, it fails to load cudart64_12.dll for the locally installed CUDA runtime.

Debugging this myself, I found that the DLL search path CuPy adds (Adding DLL search path: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2) is missing the \bin part.
This seems to be expected, since line 210 of cupy/cupy/_environment.py states that with conda the cuda_path itself should be used, and that path doesn't include \bin. [https://github.com/cupy/cupy/blob/66820586ee1c41013868a8de4977c84f29180bc8/cupy/_environment.py#L210]
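
To check where the runtime DLL actually lives, something like the following sketch can be used. It only looks at the file system; the helper name dirs_containing is made up for illustration and is not part of CuPy, and it assumes CUDA_PATH points at the toolkit root as in the output below.

```python
import os


def dirs_containing(filename, candidates):
    """Return the candidate directories that directly contain `filename`."""
    return [d for d in candidates if os.path.isfile(os.path.join(d, filename))]


if __name__ == "__main__":
    # CUDA_PATH is assumed to be set by the CUDA Toolkit installer.
    cuda_root = os.environ.get("CUDA_PATH")
    if cuda_root:
        candidates = [cuda_root, os.path.join(cuda_root, "bin")]
        # If only the \bin entry is listed, the toolkit root alone is not
        # enough as a DLL search path for cudart64_12.dll.
        print(dirs_containing("cudart64_12.dll", candidates))
    else:
        print("CUDA_PATH is not set")
```

On a local toolkit install I would expect only the \bin directory to be listed, which would match the debug output above.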

The question is: do I have something set up wrong with my paths?
And how can I fix the install?
My workaround for the moment is to add os.add_dll_directory(r'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin') to every CuPy script.
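
To avoid hard-coding the toolkit version in every script, the workaround could be wrapped in a small helper that derives the \bin directory from CUDA_PATH. This is only a sketch: the function names are hypothetical, it assumes CUDA_PATH points at the toolkit root, and it does nothing beyond calling os.add_dll_directory, which exists only on Windows (Python 3.8+).

```python
import os
from pathlib import PureWindowsPath


def cuda_bin_dir(cuda_path):
    """Return the bin subdirectory of a CUDA Toolkit root as a Windows path string."""
    return str(PureWindowsPath(cuda_path) / "bin")


def add_cuda_bin_to_dll_search_path(environ=None):
    """Add <CUDA_PATH>\\bin to the DLL search path; return the path, or None if CUDA_PATH is unset."""
    environ = os.environ if environ is None else environ
    cuda_path = environ.get("CUDA_PATH")
    if not cuda_path:
        return None
    bin_dir = cuda_bin_dir(cuda_path)
    # os.add_dll_directory is Windows-only, so guard the call to keep the
    # helper importable elsewhere. It raises if the directory is missing,
    # hence the isdir check.
    if hasattr(os, "add_dll_directory") and os.path.isdir(bin_dir):
        os.add_dll_directory(bin_dir)
    return bin_dir
```

Calling add_cuda_bin_to_dll_search_path() before cupy.show_config() (or before any other call that loads the CUDA runtime) should have the same effect as the hard-coded line above.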

To Reproduce

initial output

import cupy
cupy.show_config()

OS : Windows-10-10.0.19045-SP0
Python Version : 3.12.8
CuPy Version : 13.4.1
CuPy Platform : NVIDIA CUDA
NumPy Version : 1.26.4
SciPy Version : 1.13.1
Cython Build Version : 3.0.12
Cython Runtime Version : None
CUDA Root : C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2
nvcc PATH : C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\nvcc.EXE
CUDA Build Version : 12080
CUDA Driver Version : 12020
CUDA Runtime Version : 12080 (linked to CuPy) / RuntimeError("CuPy failed to load cudart64_12.dll: FileNotFoundError: Could not find module 'cudart64_12.dll' (or one of its dependencies). Try using the full path with constructor syntax.") (locally installed)
CUDA Extra Include Dirs : ['C:\Users\kedsplen\.conda\envs\Project_oorbel\Library\include']
cuBLAS Version : 120205
cuFFT Version : 11008
cuRAND Version : 10303
cuSOLVER Version : (11, 5, 2)
cuSPARSE Version : 12102
NVRTC Version : (12, 2)
Thrust Version : 200800
CUB Build Version : 200800
Jitify Build Version : <unknown>
cuDNN Build Version : None
cuDNN Version : None
NCCL Build Version : None
NCCL Runtime Version : None
cuTENSOR Version : None
cuSPARSELt Build Version : None
Device 0 Name : NVIDIA A40-48Q
Device 0 Compute Capability : 86
Device 0 PCI Bus ID : 0000:06:10.0

fixed output

import cupy
import os
os.add_dll_directory(r'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin')
cupy.show_config()

OS : Windows-10-10.0.19045-SP0
Python Version : 3.12.8
CuPy Version : 13.4.1
CuPy Platform : NVIDIA CUDA
NumPy Version : 1.26.4
SciPy Version : 1.13.1
Cython Build Version : 3.0.12
Cython Runtime Version : None
CUDA Root : C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2
nvcc PATH : C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\nvcc.EXE
CUDA Build Version : 12080
CUDA Driver Version : 12020
CUDA Runtime Version : 12080 (linked to CuPy) / 12020 (locally installed)
CUDA Extra Include Dirs : ['C:\Users\kedsplen\.conda\envs\Project_oorbel\Library\include']
cuBLAS Version : 120205
cuFFT Version : 11008
cuRAND Version : 10303
cuSOLVER Version : (11, 5, 2)
cuSPARSE Version : 12102
NVRTC Version : (12, 2)
Thrust Version : 200800
CUB Build Version : 200800
Jitify Build Version : <unknown>
cuDNN Build Version : None
cuDNN Version : None
NCCL Build Version : None
NCCL Runtime Version : None
cuTENSOR Version : None
cuSPARSELt Build Version : None
Device 0 Name : NVIDIA A40-48Q
Device 0 Compute Capability : 86
Device 0 PCI Bus ID : 0000:06:10.0

Installation

Conda-Forge (conda install ...)

Environment

```
name: Project_oorbel
channels:
  - https://software.repos.intel.com/python/conda
  - ccpi
  - conda-forge
  - defaults
dependencies:
  - cuda-python == 12.2.1
  - cuda-version == 12.2
  - cupy
```


```
>>> import cupy
[CUPY_DEBUG_LIBRARY_LOAD] CUDA_PATH: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2
[CUPY_DEBUG_LIBRARY_LOAD] Not wheel distribution (C:\Users\kedsplen\.conda\envs\Project_oorbel\Lib\site-packages\cupy\.data\lib not found)
[CUPY_DEBUG_LIBRARY_LOAD] Adding DLL search path: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2
[CUPY_DEBUG_LIBRARY_LOAD] Preloading triggered for library: cutensor
[CUPY_DEBUG_LIBRARY_LOAD] Not preloading cutensor as this is not a pip wheel installation
>>> cupy.show_config(_full=True)
[CUPY_DEBUG_LIBRARY_LOAD] Library "cusparse64_12.dll" loaded
[CUPY_DEBUG_LIBRARY_LOAD] cusparse64_12.dll (cusparseSpSM_createDescr): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] cusparse64_12.dll (cusparseSpSM_destroyDescr): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] cusparse64_12.dll (cusparseSpSM_bufferSize): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] cusparse64_12.dll (cusparseSpSM_analysis): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] cusparse64_12.dll (cusparseSpSM_solve): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] cusparse64_12.dll (cusparseSpMatSetAttribute): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] cusparse64_12.dll (cusparseCreateCsc): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] cusparse64_12.dll (cusparseSparseToDense_bufferSize): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] cusparse64_12.dll (cusparseSparseToDense): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] cusparse64_12.dll (cusparseDenseToSparse_bufferSize): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] cusparse64_12.dll (cusparseDenseToSparse_analysis): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] cusparse64_12.dll (cusparseDenseToSparse_convert): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] Library "nvrtc64_120_0.dll" loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcGetErrorString): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcVersion): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcCreateProgram): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcDestroyProgram): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcCompileProgram): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcGetPTXSize): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcGetPTX): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcGetCUBINSize): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcGetCUBIN): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcGetProgramLogSize): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcGetProgramLog): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcAddNameExpression): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcGetLoweredName): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcGetNumSupportedArchs): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcGetSupportedArchs): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcGetNVVMSize): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcGetNVVM): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] Not preloading cudnn as this is not a pip wheel installation
[CUPY_DEBUG_LIBRARY_LOAD] Not preloading nccl as this is not a pip wheel installation
OS                           : Windows-10-10.0.19045-SP0
Python Version               : 3.12.8
CuPy Version                 : 13.4.1
CuPy Platform                : NVIDIA CUDA
NumPy Version                : 1.26.4
SciPy Version                : 1.13.1
Cython Build Version         : 3.0.12
Cython Runtime Version       : None
CUDA Root                    : C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2
nvcc PATH                    : C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\nvcc.EXE
CUDA Build Version           : 12080
CUDA Driver Version          : 12020
CUDA Runtime Version         : 12080 (linked to CuPy) / RuntimeError("CuPy failed to load cudart64_12.dll: FileNotFoundError: Could not find module 'cudart64_12.dll' (or one of its dependencies). Try using the full path with constructor syntax.") (locally installed)
CUDA Extra Include Dirs      : ['C:\\Users\\kedsplen\\.conda\\envs\\Project_oorbel\\Library\\include']
cuBLAS Version               : 120205
cuFFT Version                : 11008
cuRAND Version               : 10303
cuSOLVER Version             : (11, 5, 2)
cuSPARSE Version             : 12102
NVRTC Version                : (12, 2)
Thrust Version               : 200800
CUB Build Version            : 200800
Jitify Build Version         : <unknown>
cuDNN Build Version          : None
cuDNN Version                : None
NCCL Build Version           : None
NCCL Runtime Version         : None
cuTENSOR Version             : None
cuSPARSELt Build Version     : None
Device 0 Name                : NVIDIA A40-48Q
Device 0 Compute Capability  : 86
Device 0 PCI Bus ID          : 0000:06:10.0
```


### Additional Information

_No response_
