Conversation

0cc4m (Collaborator) commented Sep 12, 2025

This changes the behaviour of the Vulkan backend to expose both iGPUs and dGPUs by default. The new iGPU device type (added in #15797) is used for iGPUs, and the logic of defaulting to dGPUs when available is handed over to the GGML API. (FYI @slaren)
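As a rough illustration of what this enables on the consumer side, here is a minimal sketch that enumerates devices through the ggml-backend device API and distinguishes the iGPU type. The `GGML_BACKEND_DEVICE_TYPE_IGPU` name is assumed from #15797; the other calls are the existing `ggml-backend.h` device API.

```c
// Minimal sketch: list all registered devices and their reported type.
// Assumes GGML_BACKEND_DEVICE_TYPE_IGPU is the iGPU type added in #15797.
#include <stdio.h>
#include "ggml-backend.h"

int main(void) {
    ggml_backend_load_all(); // load dynamically built backends, if any

    for (size_t i = 0; i < ggml_backend_dev_count(); i++) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        const char * kind = "other";
        switch (ggml_backend_dev_type(dev)) {
            case GGML_BACKEND_DEVICE_TYPE_CPU:  kind = "CPU";  break;
            case GGML_BACKEND_DEVICE_TYPE_GPU:  kind = "dGPU"; break;
            case GGML_BACKEND_DEVICE_TYPE_IGPU: kind = "iGPU"; break; // new device type (assumed name)
            default: break;
        }
        printf("%zu: %s (%s) [%s]\n", i,
               ggml_backend_dev_name(dev),
               ggml_backend_dev_description(dev),
               kind);
    }
    return 0;
}
```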

Additionally, the PCI ID is now returned, so GGML can avoid using the same physical device twice, e.g. once through CUDA and once through Vulkan. This should improve the behaviour of binaries that include many backends.
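De-duplication by PCI ID could look roughly like the sketch below. This is not the actual llama.cpp implementation, and the `device_id` field name on `ggml_backend_dev_props` is an assumption; check `ggml-backend.h` for the real name.

```c
// Hedged sketch: treat two devices as the same physical GPU if they report
// the same PCI ID, e.g. one seen through CUDA and one through Vulkan.
#include <stddef.h>
#include <string.h>
#include "ggml-backend.h"

// returns 1 if `dev` reports the same PCI ID as a device already in `used`
static int is_duplicate_device(ggml_backend_dev_t dev,
                               ggml_backend_dev_t * used, size_t n_used) {
    struct ggml_backend_dev_props props;
    ggml_backend_dev_get_props(dev, &props);
    if (props.device_id == NULL) {
        return 0; // backend did not report a PCI ID, cannot de-duplicate
    }
    for (size_t i = 0; i < n_used; i++) {
        struct ggml_backend_dev_props other;
        ggml_backend_dev_get_props(used[i], &other);
        if (other.device_id != NULL &&
            strcmp(props.device_id, other.device_id) == 0) {
            return 1; // same PCI ID, skip this device
        }
    }
    return 0;
}
```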

I also fixed the compiler warning about missing brackets for operator precedence.

0cc4m requested a review from jeffbolznv on September 12, 2025 at 07:41
github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels on Sep 12, 2025
slaren (Member) commented Sep 12, 2025

The device ID filtering works as expected for me under Windows.

ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
  Device 1: NVIDIA GeForce RTX 3080, compute capability 8.6, VMM: yes
ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 3080 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
ggml_vulkan: 1 = NVIDIA GeForce RTX 5090 (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
build: 6456 (f22656bbd) with clang version 18.1.8 for x86_64-pc-windows-msvc
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_load_from_file_impl: skipping device Vulkan0 (NVIDIA GeForce RTX 3080) with id 0000:01:00.0 - already using device CUDA1 (NVIDIA GeForce RTX 3080) with the same id
llama_model_load_from_file_impl: skipping device Vulkan1 (NVIDIA GeForce RTX 5090) with id 0000:02:00.0 - already using device CUDA0 (NVIDIA GeForce RTX 5090) with the same id
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 5090) (0000:02:00.0) - 30840 MiB free
llama_model_load_from_file_impl: using device CUDA1 (NVIDIA GeForce RTX 3080) (0000:01:00.0) - 9071 MiB free

0cc4m (Collaborator, Author) commented Sep 12, 2025

Currently this is just based on which backend gets loaded first, right? We might want to set an explicit order.

slaren (Member) commented Sep 12, 2025

Yes, the backends loaded first take precedence. I think it works fine at the moment; the backends are always loaded in the same order, so it is predictable.

0cc4m merged commit 304ac56 into master on Sep 12, 2025 (46 of 48 checks passed)
0cc4m deleted the 0cc4m/vulkan-igpu-pci-id branch on September 12, 2025 at 11:24