Commit ee1628b

Authored by 0cc4m and slaren
Basic Vulkan Multi-GPU implementation (ggml-org#5321)
* Initial Vulkan multi-gpu implementation Move most global variables into backend context * Add names to backend device functions * Add further missing cleanup code * Reduce code duplication in tensor split layer assignment * generalize LLAMA_SPLIT_LAYER for all backends, do not expose device count and memory in llama.h * Only do device info print in the beginning and initialize one backend for cpu assist Add missing cleanup code * Rework backend memory management to make sure devices and buffers get properly allocated and freed * Rename cpu assist free function --------- Co-authored-by: slaren <[email protected]>
1 parent ed0bf32 commit ee1628b

File tree

5 files changed: +1589 −1168 lines changed

common/common.cpp (+6 −2)

@@ -46,6 +46,10 @@
 #define GGML_USE_CUBLAS_SYCL
 #endif
 
+#if (defined(GGML_USE_CUBLAS) || defined(GGML_USE_SYCL)) || defined(GGML_USE_VULKAN)
+#define GGML_USE_CUBLAS_SYCL_VULKAN
+#endif
+
 int32_t get_num_physical_cores() {
 #ifdef __linux__
     // enumerate the set of thread siblings, num entries is num cores
@@ -660,8 +664,8 @@ bool gpt_params_parse_ex(int argc, char ** argv, gpt_params & params) {
                 params.tensor_split[i] = 0.0f;
             }
         }
-#ifndef GGML_USE_CUBLAS_SYCL
-        fprintf(stderr, "warning: llama.cpp was compiled without cuBLAS/SYCL. Setting a tensor split has no effect.\n");
+#ifndef GGML_USE_CUBLAS_SYCL_VULKAN
+        fprintf(stderr, "warning: llama.cpp was compiled without cuBLAS/SYCL/Vulkan. Setting a tensor split has no effect.\n");
 #endif // GGML_USE_CUBLAS_SYCL
         } else if (arg == "--no-mmap") {
             params.use_mmap = false;
