Description
Currently, the llama.cpp loader has a "See more options" sub-window on the model page, which includes a batch_size slider. However, llama-server also has a parameter called ubatch-size, and setting it to a higher value together with batch_size can have a huge impact on prompt-processing performance for mixed GPU-CPU inference with MoE models. Right now users can set ubatch-size via the extra-flags field (like "ubatch-size=XXXX"). It would be great to either expose ubatch-size as a slider, like batch_size is today, for ease of access, OR to set ubatch-size to the same value as the batch_size slider.
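To illustrate the second option, here is a minimal sketch of how ubatch-size could default to the batch_size slider value when the llama-server command line is assembled. The function and variable names are hypothetical and only meant to show the idea, not the actual loader code:

```python
def build_server_command(model_path: str, batch_size: int,
                         ubatch_size: int | None = None) -> list[str]:
    """Hypothetical helper: assemble a llama-server invocation.

    If no explicit ubatch_size is given, fall back to batch_size so that
    prompt processing uses the larger physical batch size as well.
    """
    return [
        "llama-server",
        "--model", model_path,
        "--batch-size", str(batch_size),
        # --ubatch-size is llama-server's physical (per-step) batch size;
        # defaulting it to batch_size is the behaviour proposed in this issue.
        "--ubatch-size", str(ubatch_size if ubatch_size is not None else batch_size),
    ]
```

For example, `build_server_command("GLM-4.5-Air-Q4.gguf", batch_size=2048)` would pass both `--batch-size 2048` and `--ubatch-size 2048` to llama-server.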
For example, here are my results showing how large a performance improvement is possible by changing this option:
- batch_size slider set to 2048, no ubatch-size in extra-flags: around 170-180 tokens/s prompt processing
- batch_size slider set to 2048, "ubatch-size=2048" in extra-flags: around 530-540 tokens/s prompt processing
This is almost a 3x performance gain.
The above results are from a system with 32GB VRAM + 128GB RAM (DDR5) and a consumer i7 CPU. The model is GLM-4.5-Air at Q4 quantization (it should also work with 96GB RAM), with most experts offloaded to the CPU so that the important tensors fit in VRAM. Tests were run on a multi-turn chat of almost 15k tokens.
I've seen other people on Discord servers and on Reddit report similar performance gains when setting both batch_size and ubatch-size on somewhat weaker systems (e.g. RTX 3090 + 64GB DDR4 RAM, which still gave at least around a 2x speedup).
Additional Context
As mentioned above, the current workaround is to use the extra-flags field and write the llama-server parameter "ubatch-size=XXXX".
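For reference, with the batch size used in the tests above, the extra-flags field would simply contain (2048 here is just the value from my tests):

```
ubatch-size=2048
```

which llama-server then receives as --ubatch-size 2048.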