Bug: Garbled output with --split-mode layer on asymmetric multi-GPU setup (V100 32G+16G) #1500
Description
What happened?
Hello,
I am encountering an issue where llama-server produces garbled/invalid output when using --split-mode layer on a dual-GPU setup with asymmetric VRAM. However, using --split-mode graph works perfectly with the same hardware and model.
Environment:
OS: Linux (Ubuntu)
GPUs: 2x NVIDIA V100 SXM2
GPU 0: 16GB VRAM
GPU 1: 32GB VRAM
CUDA Version: 12.8
Model:
Path: /home/xx/models/qwen35_gguf/unsloth/Qwen3___5-35B-A3B-GGUF/Qwen3.5-35B-A3B-UD-Q6_K_XL.gguf
Type: GGUF (Q6_K_XL)
Steps to Reproduce:
Working Command (--split-mode graph):
Running the following command produces normal, coherent text output.
./llama-server -m "/home/xx/models/qwen35_gguf/unsloth/Qwen3___5-35B-A3B-GGUF/Qwen3.5-35B-A3B-UD-Q6_K_XL.gguf" --jinja -ngl 99 --threads 50 --ctx-size 32684 --temp 0.6 --min-p 0.0 --top-p 0.95 --top-k 20 --presence-penalty 1.0 --host 0.0.0.0 --split-mode graph
========================================================================
Failing Command (--split-mode layer):
Running the following command starts the server successfully, but the generated output is completely garbled (mojibake).
./llama-server -m "/home/xx/models/qwen35_gguf/unsloth/Qwen3___5-35B-A3B-GGUF/Qwen3.5-35B-A3B-UD-Q6_K_XL.gguf" --jinja -ngl 99 --threads 50 --ctx-size 32684 --temp 0.6 --min-p 0.0 --top-p 0.95 --top-k 20 --presence-penalty 1.0 --host 0.0.0.0 --split-mode layer
Actual Output (Garbled):
When using --split-mode layer, the response looks like this:
Additional Context:
The VRAM configuration is asymmetric (GPU 0: 16GB, GPU 1: 32GB). I suspect the layer-splitting logic miscalculates memory usage or tensor distribution across the unevenly sized GPUs when --split-mode layer is selected.
The server does not crash; it simply generates invalid tokens.
-ngl 99 is used to offload all layers to GPU.
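One way to probe the suspicion above would be to replace the automatic split with an explicit one proportional to VRAM and see whether the output is still garbled. The helper below is only a sketch: tensor_split_ratio and gcd are hypothetical names, not part of llama-server, and it assumes this build accepts a --tensor-split option taking comma-separated proportions, as upstream llama.cpp does. The memory sizes (in MiB) are passed in by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of llama-server.
# Reduces per-GPU memory sizes to the smallest integer ratio,
# e.g. for use as a comma-separated --tensor-split value (assumed flag).

# Greatest common divisor of two non-negative integers (Euclid's algorithm).
gcd() {
    local a=$1 b=$2 t
    while [ "$b" -ne 0 ]; do
        t=$((a % b))
        a=$b
        b=$t
    done
    echo "$a"
}

# Usage: tensor_split_ratio SIZE_GPU0 SIZE_GPU1 [...]
# Sizes must be positive integers in the same unit, listed in device order.
tensor_split_ratio() {
    local g=0 m out=""
    # Fold the GCD over all sizes.
    for m in "$@"; do
        g=$(gcd "$m" "$g")
    done
    # Divide each size by the common divisor and join with commas.
    for m in "$@"; do
        out="${out:+$out,}$((m / g))"
    done
    echo "$out"
}
```

With the sizes given in device order (GPU 0 first), tensor_split_ratio 16384 32768 yields 1,2, which could be appended to the failing command as --split-mode layer --tensor-split 1,2. If the output is still garbled with an explicit split, the automatic memory estimate is probably not the cause.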
Thank you.
Name and Version
version: 4347 (233225d)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0 for x86_64-linux-gnu
What operating system are you seeing the problem on?
No response