vulkan: initialize devices properly for LLAMA_SPLIT_MODE_NONE #7552


Merged 1 commit into ggml-org:master on May 28, 2024

Conversation

@Adriankhl Adriankhl (Contributor) commented May 27, 2024

Before this change, split mode "none" does not work with the Vulkan backend because the backend is not properly initialized before offloading:

.\bin\main.exe -m "C:\Users\adriankhl\git\models\Meta-Llama-3-8B-Instruct.Q5_K_M.gguf" --prompt "hello" -sm "none"

gives an error.

This is because the Vulkan backend is initialized here
https://github.com/ggerganov/llama.cpp/blob/d6ef0e77dd25f54fb5856af47e3926cf6f36c281/ggml-vulkan.cpp#L6014-L6024
which is called from llama_get_device_count, and llama_get_device_count is not called when split mode is "none".

This PR makes llm_load_tensors call llama_get_device_count even when split mode is "none".

Something worth thinking about: is relying on llama_get_device_count to initialize the Vulkan backend a good idea? It feels a bit cryptic.

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 553 iterations 🚀

  • Concurrent users: 8, duration: 10m
  • HTTP request: avg=8457.35ms p(95)=21491.72ms fails=, finish reason: stop=507 truncated=46
  • Prompt processing (pp): avg=101.31tk/s p(95)=464.1tk/s
  • Token generation (tg): avg=35.07tk/s p(95)=50.48tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=fix_vulkan_device commit=bd00902cda298ef8a595ca78eb6360546a010263

[Benchmark time-series charts omitted: prompt_tokens_seconds, predicted_tokens_seconds, kv_cache_usage_ratio, requests_processing]

@slaren slaren (Member) commented May 27, 2024

Should be fixed on the vulkan backend instead.

@Adriankhl Adriankhl force-pushed the fix_vulkan_device branch from bd00902 to c61e8e9 Compare May 27, 2024 07:57
@Adriankhl Adriankhl changed the title ggml: initialize vulkan devices properly for LLAMA_SPLIT_MODE_NONE vulkan: initialize devices properly for LLAMA_SPLIT_MODE_NONE May 27, 2024
@Adriankhl Adriankhl force-pushed the fix_vulkan_device branch from c61e8e9 to ee84384 Compare May 27, 2024 07:59
@Adriankhl Adriankhl (Contributor, Author)

> Should be fixed on the vulkan backend instead.

Ok, changed it.

@github-actions github-actions bot added the Vulkan Issues specific to the Vulkan backend label May 27, 2024
@mofosyne mofosyne added Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix bugfix fixes an issue or bug labels May 27, 2024
@slaren slaren requested a review from 0cc4m May 27, 2024 12:04
@0cc4m 0cc4m (Collaborator) left a comment

Thank you.

@0cc4m 0cc4m merged commit 56411a9 into ggml-org:master May 28, 2024
65 checks passed