Eval bug: Failed to prepare attention ubatches -> crash #23803

Description

@ZisIsNotZis

Name and Version

version: 2696 (549b9d8)
built with GNU 13.3.0 for Linux x86_64

Operating systems

Linux

GGML backends

CUDA

Hardware

i7-6850K + 4090

Models

unsloth/Qwen3.6-35B-A3B-UD-IQ4_XS

Problem description & steps to reproduce

I'm running llama-server with the following command:

llama-server -m ~/hf/Qwen3.6-35B-A3B-UD-IQ4_XS.gguf -c 262144 -ctk q5_1 -ctv q5_1 -ctkd q5_1 -ctvd q5_1 --temp .6 --top-p .8 --top-k 20 --presence-penalty 0 --min-p 0 --spec-type draft-mtp --spec-draft-n-max 1 --chat-template-kwargs '{"enable_thinking":false}'

After running for a long time, the server crashes with "failed to prepare attention ubatches", followed by GGML_ASSERT(logits != nullptr) in common/sampling.cpp (see the log output in this report). This is probably hard to reproduce.

First Bad Commit

Unknown. The crash may be related to KV cache quantization (-ctk/-ctv q5_1) or MTP drafting (--spec-type draft-mtp): no crash occurred previously, when I was using neither of them.

Relevant log output

1482.46.827.375 I slot      release: id  0 | task 232083 | stop processing: n_tokens = 30946, truncated = 0
1482.46.938.685 I slot print_timing: id  3 | task 232040 | prompt eval time =     897.72 ms /   467 tokens (    1.92 ms per token,   520.21 tokens per second)
1482.46.938.692 I slot print_timing: id  3 | task 232040 |        eval time =   24572.14 ms /   189 tokens (  130.01 ms per token,     7.69 tokens per second)
1482.46.938.693 I slot print_timing: id  3 | task 232040 |       total time =   25469.85 ms /   656 tokens
1482.46.938.694 I slot print_timing: id  3 | task 232040 |    graphs reused =     224999
1482.46.938.696 I slot print_timing: id  3 | task 232040 | draft acceptance = 0.09942 (   17 accepted /   171 generated)
1482.46.938.719 I statistics        draft-mtp: #calls(b,g,a) =  587 227521 229853, #gen drafts = 229853, #acc drafts = 15459, #gen tokens = 229853, #acc tokens = 15459, dur(b,g,a) = 0.541, 475880.177, 224.089 ms
1482.46.940.473 I slot      release: id  3 | task 232040 | stop processing: n_tokens = 28639, truncated = 0
1482.47.635.662 I srv  params_from_: Chat format: peg-native
1482.47.700.063 I slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.996 (> 0.100 thold), f_keep = 1.000
1482.47.702.471 I reasoning-budget: activated, budget=2147483647 tokens
1482.47.702.475 I reasoning-budget: deactivated (natural end)
1482.47.702.606 I slot launch_slot_: id  0 | task 232227 | processing task, is_child = 0
1482.47.702.608 I slot slot_save_an: id  3 | task -1 | saving idle slot to prompt cache
1482.47.705.745 W srv   prompt_save:  - saving prompt with length 28639, total state size = 294.640 MiB (draft: 21.522 MiB)
1482.47.746.913 I srv  params_from_: Chat format: peg-native
1482.48.190.373 I slot prompt_clear: id  3 | task -1 | clearing prompt with 28639 tokens
1482.48.199.226 I srv        update:  - cache state: 2 prompts, 1152.091 MiB (limits: 8192.000 MiB, 262144 tokens, 290750 est)
1482.48.199.230 I srv        update:    - prompt 0x570dee03f930:   12251 tokens, checkpoints:  3,   374.292 MiB
1482.48.199.231 I srv        update:    - prompt 0x570def5c9810:   28639 tokens, checkpoints:  6,   777.800 MiB
1482.48.199.238 I slot get_availabl: id  3 | task -1 | selected slot by LRU, t_last = 285083052679
1482.48.199.239 I srv  get_availabl: updating prompt cache
1482.48.199.241 I srv          load:  - looking for better prompt, base f_keep = -1.000, sim = 0.000
1482.48.199.265 I srv          load:  - found better prompt with f_keep = 1.000, sim = 0.455
1482.51.174.710 I srv        update:  - cache state: 1 prompts, 374.292 MiB (limits: 8192.000 MiB, 262144 tokens, 268133 est)
1482.51.174.714 I srv        update:    - prompt 0x570dee03f930:   12251 tokens, checkpoints:  3,   374.292 MiB
1482.51.174.716 I srv  get_availabl: prompt cache update took 2975.48 ms
1482.51.176.866 I reasoning-budget: activated, budget=2147483647 tokens
1482.51.176.868 I reasoning-budget: deactivated (natural end)
1482.51.176.991 I slot launch_slot_: id  3 | task 232229 | processing task, is_child = 0
1482.51.182.122 W slot update_slots: id  0 | task 232227 | n_past = 30946, slot.prompt.tokens.size() = 30946, seq_id = 0, pos_min = 30945, n_swa = 0
1482.51.182.126 I slot update_slots: id  0 | task 232227 | Checking checkpoint with [30803, 30803] against 30945...
1482.51.489.586 W slot update_slots: id  0 | task 232227 | restored context checkpoint (pos_min = 30803, pos_max = 30803, n_tokens = 30804, n_past = 30804, size = 85.962 MiB)
1482.51.490.670 W slot update_slots: id  3 | task 232229 | n_past = 28639, slot.prompt.tokens.size() = 28639, seq_id = 3, pos_min = 28638, n_swa = 0
1482.51.490.672 I slot update_slots: id  3 | task 232229 | Checking checkpoint with [28446, 28446] against 28638...
1482.51.780.399 W slot update_slots: id  3 | task 232229 | restored context checkpoint (pos_min = 28446, pos_max = 28446, n_tokens = 28447, n_past = 28447, size = 84.191 MiB)
1482.53.305.403 I slot print_timing: id  1 | task 231735 | n_decoded =    511, tg =   6.19 t/s
1482.53.347.697 I slot create_check: id  0 | task 232227 | created context checkpoint 8 of 32 (pos_min = 31067, pos_max = 31067, n_tokens = 31068, size = 86.161 MiB)
1482.54.782.492 I slot print_timing: id  3 | task 232229 | prompt processing, n_tokens =   3820, progress = 0.51, t =   3.29 s / 1160.45 tokens per second
1482.56.230.309 I slot print_timing: id  3 | task 232229 | prompt processing, n_tokens =   5862, progress = 0.54, t =   4.74 s / 1236.80 tokens per second
1482.57.681.309 I slot print_timing: id  1 | task 231735 | n_decoded =    514, tg =   5.91 t/s
1482.57.692.369 I slot print_timing: id  3 | task 232229 | prompt processing, n_tokens =   7904, progress = 0.58, t =   6.20 s / 1274.49 tokens per second
1482.59.149.018 I slot print_timing: id  3 | task 232229 | prompt processing, n_tokens =   9946, progress = 0.61, t =   7.66 s / 1298.71 tokens per second
1482.59.149.906 I slot update_slots: id  3 | task 232229 | 8192 tokens since last checkpoint at 28447, creating new checkpoint during processing at position 40435
1482.59.186.028 I slot create_check: id  3 | task 232229 | created context checkpoint 7 of 32 (pos_min = 38392, pos_max = 38392, n_tokens = 38393, size = 91.665 MiB)
1483.00.665.714 I slot print_timing: id  3 | task 232229 | prompt processing, n_tokens =  11988, progress = 0.64, t =   9.18 s / 1306.59 tokens per second
1483.02.144.956 I slot print_timing: id  1 | task 231735 | n_decoded =    517, tg =   5.66 t/s
1483.02.155.904 I slot print_timing: id  3 | task 232229 | prompt processing, n_tokens =  14030, progress = 0.67, t =  10.67 s / 1315.49 tokens per second
1483.03.660.862 I slot print_timing: id  3 | task 232229 | prompt processing, n_tokens =  16072, progress = 0.71, t =  12.17 s / 1320.60 tokens per second
1483.05.173.565 I slot print_timing: id  1 | task 231735 | n_decoded =    519, tg =   5.50 t/s
1483.05.186.901 I slot print_timing: id  3 | task 232229 | prompt processing, n_tokens =  18114, progress = 0.74, t =  13.70 s / 1322.55 tokens per second
1483.05.191.289 E init_batch: failed to prepare attention ubatches
1483.05.191.308 W decode: failed to find a memory slot for batch of size 2048
1483.05.191.311 W srv  update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 1024, ret = 1
1483.05.194.031 E init_batch: failed to prepare attention ubatches
1483.05.194.043 W decode: failed to find a memory slot for batch of size 1024
1483.05.194.045 W srv  update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 512, ret = 1
1483.05.196.518 E init_batch: failed to prepare attention ubatches
1483.05.196.526 W decode: failed to find a memory slot for batch of size 512
1483.05.196.528 W srv  update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 256, ret = 1
1483.05.198.950 E init_batch: failed to prepare attention ubatches
1483.05.198.960 W decode: failed to find a memory slot for batch of size 256
1483.05.198.962 W srv  update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 128, ret = 1
1483.05.201.299 E init_batch: failed to prepare attention ubatches
1483.05.201.306 W decode: failed to find a memory slot for batch of size 128
1483.05.201.308 W srv  update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 64, ret = 1
1483.05.203.609 E init_batch: failed to prepare attention ubatches
1483.05.203.617 W decode: failed to find a memory slot for batch of size 64
1483.05.203.619 W srv  update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 32, ret = 1
1483.05.206.751 E init_batch: failed to prepare attention ubatches
1483.05.206.765 W decode: failed to find a memory slot for batch of size 32
1483.05.206.768 W srv  update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 16, ret = 1
1483.05.209.109 E init_batch: failed to prepare attention ubatches
1483.05.209.118 W decode: failed to find a memory slot for batch of size 16
1483.05.209.119 W srv  update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 8, ret = 1
1483.05.211.503 E init_batch: failed to prepare attention ubatches
1483.05.211.523 W decode: failed to find a memory slot for batch of size 8
1483.05.211.525 W srv  update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 4, ret = 1
1483.05.214.941 E init_batch: failed to prepare attention ubatches
1483.05.214.953 W decode: failed to find a memory slot for batch of size 4
1483.05.214.956 W srv  update_slots: failed to find free space in the KV cache, retrying with smaller batch size, i = 0, n_batch = 2, ret = 1
/home/z/llama.cpp/common/sampling.cpp:154: GGML_ASSERT(logits != nullptr) failed
1483.05.247.093 E get_logits_ith: invalid logits id 2, reason: batch.logits[2] != true
[New LWP 1685059]
[New LWP 1685058]
[New LWP 1685057]
[New LWP 1685056]
[New LWP 1685055]
[New LWP 1685054]
[New LWP 1685053]
[New LWP 1685052]
[New LWP 1685051]
[New LWP 1685050]
[New LWP 1685049]
[New LWP 1685048]
[New LWP 1685047]
[New LWP 1685046]
[New LWP 1685045]
[New LWP 1685041]
[New LWP 1685040]

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.ubuntu.com>
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
warning: Could not load shared library symbols for 3 libraries, e.g. /usr/local/cuda-13.2/lib64/libcudart.so.13.
Use the "info sharedlibrary" command to see the complete listing.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x000079c9e0310813 in __GI___wait4 (pid=1949300, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
warning: 30	../sysdeps/unix/sysv/linux/wait4.c: No such file or directory
#0  0x000079c9e0310813 in __GI___wait4 (pid=1949300, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30	in ../sysdeps/unix/sysv/linux/wait4.c
#1  0x000079c9e055a073 in ggml_print_backtrace () from /home/z/llama.cpp/build/bin/libggml-base.so.0
#2  0x000079c9e055a223 in ggml_abort () from /home/z/llama.cpp/build/bin/libggml-base.so.0
#3  0x000079c9dfeb60bf in common_sampler_sample(common_sampler*, llama_context*, int, bool) () from /home/z/llama.cpp/build/bin/libllama-common.so.0
#4  0x000079c9dfeb62b0 in common_sampler_sample_and_accept_n(common_sampler*, llama_context*, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, bool) () from /home/z/llama.cpp/build/bin/libllama-common.so.0
#5  0x000079c9e0b72031 in server_context_impl::update_slots() () from /home/z/llama.cpp/build/bin/libllama-server-impl.so
#6  0x000079c9e0c03091 in server_queue::start_loop(long) () from /home/z/llama.cpp/build/bin/libllama-server-impl.so
#7  0x000079c9e0acc08b in llama_server(int, char**) () from /home/z/llama.cpp/build/bin/libllama-server-impl.so
#8  0x000079c9e022a1ca in __libc_start_call_main (main=main@entry=0x570db0407270, argc=argc@entry=29, argv=argv@entry=0x7ffe53142128) at ../sysdeps/nptl/libc_start_call_main.h:58
warning: 58	../sysdeps/nptl/libc_start_call_main.h: No such file or directory
#9  0x000079c9e022a28b in __libc_start_main_impl (main=0x570db0407270, argc=29, argv=0x7ffe53142128, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe53142118) at ../csu/libc-start.c:360
warning: 360	../csu/libc-start.c: No such file or directory
#10 0x0000570db04072a5 in ?? ()
[Inferior 1 (process 1685038) detached]
Aborted (core dumped)
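
For readers skimming the log: the warnings before the assertion show the batch size being halved after each failed decode (2048, 1024, ..., 4), with the last retry announced at n_batch = 2 right before the abort. The sketch below only models that halving pattern as it appears in the log; it is not the server's actual code. `halving_retry` and `try_decode` are hypothetical names, and the assumption that the attempt at n_batch = 2 returned success is a guess, since the log does not show what that attempt returned.

```python
from typing import Callable, List, Tuple


def halving_retry(n_batch: int, try_decode: Callable[[int], int]) -> Tuple[List[int], bool]:
    """Call try_decode with n_batch, halving n_batch after every non-zero return.

    Returns the list of batch sizes attempted and whether any attempt succeeded.
    try_decode is a hypothetical stand-in: 0 means success, 1 means
    "no KV cache slot found" (the ret = 1 seen in the log).
    """
    attempted: List[int] = []
    while n_batch >= 1:
        attempted.append(n_batch)
        if try_decode(n_batch) == 0:
            return attempted, True
        n_batch //= 2  # retry with a smaller batch, as the warnings report
    return attempted, False


# Stand-in decode: fails for every size the log reports as failing (>= 4).
# Success at size 2 is an assumption made only for this illustration.
def fake_decode(n: int) -> int:
    return 1 if n >= 4 else 0


sizes, ok = halving_retry(2048, fake_decode)
print(sizes, ok)
```

If the retry at n_batch = 2 did go through with fewer tokens than the sampler expected, that might explain the later "invalid logits id 2" error, but this is speculation and would need to be confirmed against the server code.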
