Bug: Garbled output with --split-mode layer on asymmetric multi-GPU setup (V100 32G+16G) #1500

@netqer

Description

What happened?

Hello,

I am encountering an issue where llama-server produces garbled/invalid output when using --split-mode layer on a dual-GPU setup with asymmetric VRAM. However, using --split-mode graph works perfectly with the same hardware and model.

Environment:

OS: Linux (Ubuntu)
GPUs: 2x NVIDIA V100 SXM2
GPU 0: 16GB VRAM
GPU 1: 32GB VRAM
CUDA Version: 12.8
Model:

Path: /home/xx/models/qwen35_gguf/unsloth/Qwen3___5-35B-A3B-GGUF/Qwen3.5-35B-A3B-UD-Q6_K_XL.gguf
Type: GGUF (Q6_K_XL)
Steps to Reproduce:

Working Command (--split-mode graph):
Running the following command produces normal, coherent text output.

./llama-server -m "/home/xx/models/qwen35_gguf/unsloth/Qwen3___5-35B-A3B-GGUF/Qwen3.5-35B-A3B-UD-Q6_K_XL.gguf" --jinja -ngl 99 --threads 50 --ctx-size 32684 --temp 0.6 --min-p 0.0 --top-p 0.95 --top-k 20 --presence-penalty 1.0 --host 0.0.0.0 --split-mode graph

========================================================================

Failing Command (--split-mode layer):
Running the following command starts the server successfully, but the generated output is completely garbled (mojibake).

./llama-server -m "/home/xx/models/qwen35_gguf/unsloth/Qwen3___5-35B-A3B-GGUF/Qwen3.5-35B-A3B-UD-Q6_K_XL.gguf" --jinja -ngl 99 --threads 50 --ctx-size 32684 --temp 0.6 --min-p 0.0 --top-p 0.95 --top-k 20 --presence-penalty 1.0 --host 0.0.0.0 --split-mode layer
Actual Output (Garbled):
When using --split-mode layer, the response looks like this:

%#,&151&-".)4.35,2-2#,!*#+2'43%(13"#&)-20#*50,)*"%&'1#,%(&4+2%.#(5.-5!&'2-352+!32&'"2,05.&+3(1#

Expected Output:
Normal natural language text, similar to what is produced when using --split-mode graph.

Additional Context:

The VRAM configuration is asymmetric (32GB + 16GB). I suspect the layer-splitting logic might be miscalculating memory usage or tensor distribution across the uneven GPUs when --split-mode layer is selected.
The server does not crash; it simply generates invalid tokens.
-ngl 99 is used to offload all layers to GPU.
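As a diagnostic (not a fix), it may help to narrow down whether the automatic layer distribution is at fault. The sketch below uses the standard llama.cpp options --tensor-split (explicit per-GPU split ratio) and the CUDA environment variable CUDA_VISIBLE_DEVICES; the 1,2 ratio is an assumption matching the 16GB/32GB cards, and the remaining flags are taken from the failing command above.

```shell
MODEL="/home/xx/models/qwen35_gguf/unsloth/Qwen3___5-35B-A3B-GGUF/Qwen3.5-35B-A3B-UD-Q6_K_XL.gguf"

# 1) Pin the layer split explicitly to the VRAM ratio (GPU 0 ~1/3, GPU 1 ~2/3).
#    If output becomes coherent, the automatic split calculation is suspect.
./llama-server -m "$MODEL" --jinja -ngl 99 --ctx-size 32684 \
  --split-mode layer --tensor-split 1,2

# 2) Rule out a per-device problem by running on each GPU alone.
#    Clean output on both cards would point at the multi-GPU layer path.
CUDA_VISIBLE_DEVICES=0 ./llama-server -m "$MODEL" --jinja -ngl 99
CUDA_VISIBLE_DEVICES=1 ./llama-server -m "$MODEL" --jinja -ngl 99
```

If --tensor-split changes the behavior, including that result in the report would likely help maintainers localize the bug.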

Thank you.

Name and Version

version: 4347 (233225d)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0 for x86_64-linux-gnu

What operating system are you seeing the problem on?

No response

Relevant log output

Metadata

Labels: wontfix (This will not be worked on)