1 parent c2e3b35 commit cb91ec8
include/llama.h
@@ -267,9 +267,9 @@ extern "C" {
         enum llama_split_mode split_mode; // how to split the model across multiple GPUs

         // main_gpu interpretation depends on split_mode:
-        // LLAMA_SPLIT_NONE: the GPU that is used for the entire model
-        // LLAMA_SPLIT_ROW: the GPU that is used for small tensors and intermediate results
-        // LLAMA_SPLIT_LAYER: ignored
+        // LLAMA_SPLIT_MODE_NONE: the GPU that is used for the entire model
+        // LLAMA_SPLIT_MODE_ROW: the GPU that is used for small tensors and intermediate results
+        // LLAMA_SPLIT_MODE_LAYER: ignored

         int32_t main_gpu;

         // proportion of the model (layers or rows) to offload to each GPU, size: llama_max_devices()
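For context, a minimal sketch (not part of this commit) of how the renamed constants are used when loading a model through the C API; the "model.gguf" path is a placeholder.

#include "llama.h"

int main(void) {
    struct llama_model_params params = llama_model_default_params();

    // Split the model by layers across all available GPUs;
    // main_gpu is ignored in this mode.
    params.split_mode = LLAMA_SPLIT_MODE_LAYER;

    // Alternatively, with LLAMA_SPLIT_MODE_NONE main_gpu selects the single
    // GPU that holds the entire model; with LLAMA_SPLIT_MODE_ROW it selects
    // the GPU used for small tensors and intermediate results:
    // params.split_mode = LLAMA_SPLIT_MODE_NONE;
    // params.main_gpu   = 0;

    struct llama_model * model = llama_load_model_from_file("model.gguf", params);
    if (model == NULL) {
        return 1;
    }

    llama_free_model(model);
    return 0;
}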