Fix CUDA build failure on AutoDL cloud platforms #14005

Merged Jun 5, 2025 (1 commit)

@pockers21 (Contributor) commented Jun 4, 2025

Problem

As reported in issue #13893, the current CUDA build configuration in ci/run.sh fails on AutoDL cloud machines:

if [ ! -z ${GG_BUILD_CUDA} ]; then
    CMAKE_EXTRA="${CMAKE_EXTRA} -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=native"
fi

Root Cause

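The failure appears to come from CMAKE_CUDA_ARCHITECTURES=native, which makes CMake detect the architecture of the host GPU at configure time.
On cloud platforms such as AutoDL this detection seems to fail, likely because GPU introspection is restricted.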

Reproduction

This can be reproduced with a minimal test script:

#!/bin/bash

mkdir cmake_arch_test && cd cmake_arch_test

cat << 'EOF' > CMakeLists.txt
cmake_minimum_required(VERSION 3.18)
project(test_cuda_arch LANGUAGES CUDA)

message(STATUS "CMAKE_CUDA_ARCHITECTURES: ${CMAKE_CUDA_ARCHITECTURES}")
message(STATUS "CMAKE_CUDA_COMPILER_TOOLKIT_ROOT: ${CMAKE_CUDA_COMPILER_TOOLKIT_ROOT}")

add_executable(test_cuda test.cu)
EOF

cat << 'EOF' > test.cu
int main() { return 0; }
EOF

rm -rf build && mkdir build && cd build
cmake -DCMAKE_CUDA_ARCHITECTURES=native .. 2>&1 | grep -E "(CMAKE_CUDA_ARCHITECTURES|architecture|fatal|error)"

On AutoDL machines this script fails with the error nvcc fatal : Unsupported gpu architecture 'compute_', whereas on physical machines it runs without error.

Replace CMAKE_CUDA_ARCHITECTURES=native with nvidia-smi detection
as 'native' fails on autodl cloud environments.
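
The commit message above only names the approach, so here is a minimal sketch of what nvidia-smi based detection could look like in ci/run.sh. It is not the merged code. The helper names cap_to_arch and detect_cuda_arch are hypothetical, and the sketch assumes a driver recent enough for nvidia-smi to support the compute_cap query field.

```shell
#!/bin/bash

# Hypothetical helper: turn a compute capability such as "8.6" into the
# numeric form CMake expects ("86"). Returns 1 on empty or malformed input.
cap_to_arch() {
    local arch="${1//./}"          # drop the dot: "8.6" -> "86"
    arch="${arch//[[:space:]]/}"   # drop stray whitespace from the CSV output
    if [[ "$arch" =~ ^[0-9]+$ ]]; then
        echo "$arch"
    else
        return 1
    fi
}

# Hypothetical helper: ask nvidia-smi for the first visible GPU's capability.
# Assumption: the installed driver supports --query-gpu=compute_cap.
detect_cuda_arch() {
    command -v nvidia-smi >/dev/null 2>&1 || return 1
    local cap
    cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1)
    cap_to_arch "$cap"
}

if [ -n "${GG_BUILD_CUDA:-}" ]; then
    CMAKE_EXTRA="${CMAKE_EXTRA:-} -DGGML_CUDA=ON"
    if CUDA_ARCH=$(detect_cuda_arch); then
        # Pass an explicit architecture instead of relying on 'native'.
        CMAKE_EXTRA="${CMAKE_EXTRA} -DCMAKE_CUDA_ARCHITECTURES=${CUDA_ARCH}"
    else
        echo "Warning: could not detect the GPU architecture via nvidia-smi" >&2
    fi
fi
```

Validating the value before handing it to CMake matters here: an empty or non-numeric string would presumably reproduce the same kind of 'compute_' failure that this PR sets out to avoid.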
@pockers21 pockers21 requested a review from ggerganov as a code owner June 4, 2025 10:26
@github-actions github-actions bot added the devops improvements to build systems and github actions label Jun 4, 2025
@pockers21 (Contributor, Author) commented:

@ggerganov Hi! This PR has been waiting for review for a while. Could you please take a look when you have time?

@ggerganov ggerganov merged commit 146b88e into ggml-org:master Jun 5, 2025
2 checks passed
furyhawk pushed a commit to furyhawk/llama.cpp that referenced this pull request Jun 6, 2025
Co-authored-by: pockers21 <[email protected]>