Fix CUDA build failure on AutoDL cloud platforms #14005

pockers21 · 2025-06-04T10:26:11Z

Problem

As reported in issue #13893, the current CUDA build configuration in ci/run.sh fails on AutoDL cloud machines:

if [ ! -z ${GG_BUILD_CUDA} ]; then
    CMAKE_EXTRA="${CMAKE_EXTRA} -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=native"
fi

Root Cause

The issue appears to be that CMAKE_CUDA_ARCHITECTURES=native fails on these cloud platforms, likely because they restrict the GPU introspection that 'native' depends on.

Reproduction

This can be reproduced with a minimal test script:

#!/bin/bash

mkdir -p cmake_arch_test && cd cmake_arch_test

cat << 'EOF' > CMakeLists.txt
cmake_minimum_required(VERSION 3.18)
project(test_cuda_arch LANGUAGES CUDA)

message(STATUS "CMAKE_CUDA_ARCHITECTURES: ${CMAKE_CUDA_ARCHITECTURES}")
message(STATUS "CMAKE_CUDA_COMPILER_TOOLKIT_ROOT: ${CMAKE_CUDA_COMPILER_TOOLKIT_ROOT}")

add_executable(test_cuda test.cu)
EOF

cat << 'EOF' > test.cu
int main() { return 0; }
EOF

rm -rf build && mkdir build && cd build
cmake -DCMAKE_CUDA_ARCHITECTURES=native .. 2>&1 | grep -E "(CMAKE_CUDA_ARCHITECTURES|architecture|fatal|error)"

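On AutoDL machines this script fails with "nvcc fatal : Unsupported gpu architecture 'compute_'", while the same script works fine on physical machines. The missing number after "compute_" suggests that native detection produced an empty architecture value.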
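On a machine where the reproduction script fails, a quick check like the one below can show whether nvidia-smi still reports the GPU's compute capability, which is what makes it a plausible replacement for 'native'. This is a sketch: the function name report_gpu_compute_caps is hypothetical, and it assumes a driver recent enough to support the compute_cap query field.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic (not part of ci/run.sh): print the name and compute
# capability of every GPU nvidia-smi can see. Returns 1 if nvidia-smi is
# missing, the query fails (e.g. an older driver without compute_cap),
# or no GPUs are reported.
report_gpu_compute_caps() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found" >&2
        return 1
    fi

    local caps
    # compute_cap is printed as e.g. "8.6"; the assignment keeps nvidia-smi's exit status
    if ! caps=$(nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader 2>/dev/null) \
        || [ -z "$caps" ]; then
        echo "nvidia-smi did not report any compute capability" >&2
        return 1
    fi

    echo "$caps"
}

report_gpu_compute_caps
```

If this prints a line such as a GPU name followed by "8.6" while the CMake script still fails, the driver is visible and only the 'native' detection path is affected.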

This PR replaces CMAKE_CUDA_ARCHITECTURES=native in ci/run.sh with architecture detection based on nvidia-smi, since 'native' fails on AutoDL cloud environments.
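A minimal sketch of this kind of detection for ci/run.sh follows. It is not the code from the PR: the function names, the fallback architecture list, and the handling of multiple GPUs are assumptions made for illustration. It relies on nvidia-smi's compute_cap query field (reported as e.g. "8.6"), which older drivers may not support, and converts each value to the form CMAKE_CUDA_ARCHITECTURES expects (e.g. "86").

```bash
#!/usr/bin/env bash
# Sketch only: derive CMAKE_CUDA_ARCHITECTURES from nvidia-smi instead of "native".
# Function names and FALLBACK_CUDA_ARCHS are hypothetical, not taken from ci/run.sh.

# Hypothetical fallback used when nvidia-smi gives no usable answer;
# adjust it to the GPUs you actually build for.
FALLBACK_CUDA_ARCHS="75;80;86"

# Read compute capabilities from stdin (one per line, e.g. "8.6") and print a
# deduplicated, numerically sorted, semicolon-separated CMake list (e.g. "75;86").
compute_caps_to_cmake_archs() {
    local line arch
    local -a archs=()
    while IFS= read -r line || [ -n "$line" ]; do
        arch=${line//[[:space:]]/}   # strip whitespace
        arch=${arch//./}             # "8.6" -> "86"
        # keep purely numeric values only; skips blank lines and "[N/A]"-style text
        if [[ $arch =~ ^[0-9]+$ ]]; then
            archs+=("$arch")
        fi
    done
    # several identical GPUs would otherwise yield "86;86"
    printf '%s\n' "${archs[@]}" | sort -n -u | paste -sd ';' -
}

# Print the detected architecture list, or the fallback if detection fails.
detect_cuda_archs() {
    local archs=""
    if command -v nvidia-smi >/dev/null 2>&1; then
        archs=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null \
            | compute_caps_to_cmake_archs)
    fi
    if [ -z "$archs" ]; then
        echo "warning: could not detect GPU architectures, using ${FALLBACK_CUDA_ARCHS}" >&2
        archs=$FALLBACK_CUDA_ARCHS
    fi
    echo "$archs"
}

if [ -n "${GG_BUILD_CUDA}" ]; then
    # ';' inside the value is safe here: when CMAKE_EXTRA is later expanded
    # unquoted, word splitting only breaks on whitespace, not on ';'
    CMAKE_EXTRA="${CMAKE_EXTRA} -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=$(detect_cuda_archs)"
fi
```

Falling back to a fixed list keeps CI running on machines where nvidia-smi is unavailable or too old; aborting with an error in that case would be an equally reasonable choice if silent fallbacks are undesirable.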

pockers21 · 2025-06-05T13:02:07Z

@ggerganov Hi! This PR has been waiting for review for a while. Could you please take a look when you have time?

ci: fix CUDA build failure on autodl cloud machines

e27c80a

Replace CMAKE_CUDA_ARCHITECTURES=native with nvidia-smi detection as 'native' fails on autodl cloud environments.

pockers21 requested a review from ggerganov as a code owner June 4, 2025 10:26

github-actions bot added the devops label (improvements to build systems and github actions) Jun 4, 2025

ggerganov approved these changes Jun 5, 2025

ggerganov merged commit 146b88e into ggml-org:master Jun 5, 2025
2 checks passed
