Add slider for ubatch-size for llama.cpp loader in "See more options" #7309

@GodEmperor785

Description

Currently, the llama.cpp loader has a "See more options" sub-window on the model page, which includes a batch_size slider. llama-server also has a parameter called ubatch-size, and setting it to a higher value can have a huge impact on prompt processing performance in mixed GPU-CPU inference for MoE models when used together with batch_size. Right now users can only set ubatch-size via the extra-flags field (like "ubatch-size=XXXX"). It would be great to either expose ubatch-size as a slider next to batch_size, for ease of access, or set ubatch-size to the same value as the batch_size slider.

For example, here are my results showing how big a performance improvement this option can give:

  • batch_size slider set to 2048, no ubatch-size in extra-flags = around 170-180 tokens/s prompt processing
  • batch_size slider set to 2048, with "ubatch-size=2048" in extra-flags = 530-540 tokens/s prompt processing

This is roughly a 3x gain in prompt processing speed.
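
As a quick check of that figure, the ratio can be computed from the two reported ranges. This is only arithmetic on the numbers above; the variable names are made up for illustration.

```python
# Prompt-processing throughput ranges reported above, in tokens/s.
baseline = (170, 180)     # batch_size=2048, ubatch-size not set
with_ubatch = (530, 540)  # batch_size=2048, ubatch-size=2048

# Most conservative and most optimistic ratios across the two ranges.
low = with_ubatch[0] / baseline[1]
high = with_ubatch[1] / baseline[0]
print(f"speedup: {low:.2f}x to {high:.2f}x")
```

So the gain sits between a bit under 3x and a bit over 3x, depending on which ends of the ranges are compared.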

The above results are for a system with 32GB VRAM + 128GB RAM (DDR5) and a consumer i7 CPU. The model is GLM-4.5-Air with a Q4 quant (it should also work with 96GB RAM), with most experts offloaded to CPU so the important stuff fits in VRAM. Tests were run on a multi-turn chat almost 15k tokens long.
I've seen other people on Discord servers and on Reddit report similar performance gains when setting both batch_size and ubatch-size on slightly weaker systems (like RTX 3090 + 64GB DDR4 RAM, which still gave at least around a 2x speedup).

Additional Context

As mentioned above, the current workaround is to use the "extra-flags" field and write the llama-server parameter as "ubatch-size=XXXX".
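
To illustrate the second option from the description (defaulting ubatch-size to the batch_size value when the user has not set it), here is a minimal sketch. The function `build_llama_server_args` is hypothetical and is not the web UI's actual loader code; the only things taken from llama-server are its `--model`, `--batch-size` and `--ubatch-size` flags.

```python
from typing import List, Optional


def build_llama_server_args(
    model_path: str,
    batch_size: int,
    ubatch_size: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: build a llama-server command line.

    If ubatch_size is not given, fall back to batch_size, which is
    the "same value as the batch_size slider" behaviour proposed above.
    """
    if ubatch_size is None:
        ubatch_size = batch_size
    return [
        "llama-server",
        "--model", model_path,
        "--batch-size", str(batch_size),
        "--ubatch-size", str(ubatch_size),
    ]


# Example: only the batch_size slider is set, ubatch-size follows it.
print(" ".join(build_llama_server_args("model.gguf", 2048)))
```

A dedicated slider would simply pass an explicit `ubatch_size` instead of relying on the fallback.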

Labels: enhancement (New feature or request)