Eval bug: std::runtime_error Invalid diff: #13876

stargate426 · 2025-05-28T22:51:50Z

Name and Version

ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
version: 5519 (a682474)
built with cc (Debian 12.2.0-14+deb12u1) 12.2.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

CUDA

Hardware

Ryzen 5 3600 + RTX 5090

Models

Qwen3 32B q5

Problem description & steps to reproduce

./llama-server -m ~/llm/models/Qwen3-32B-Q5_K_S.gguf -c 16384 -ngl 999 --host 0.0.0.0 --port 5000 --jinja --api-key

This is how I run the program, the issue happens every so often and I can't (in the limited attempts I tried) replicate it with llama-cli

First Bad Commit

No response

Relevant log output

terminate called after throwing an instance of 'std::runtime_error'
  what():  Invalid diff: '<think>Okay, the user mentioned that Docker is taking up a lot of space and they want to delete unused volumes. Now they're saying that something else might be using all the storage and they don't know if it's Docker. I need to help them figure out what's consuming their disk space.

The text was updated successfully, but these errors were encountered:

n9Mtq4 · 2025-05-29T02:42:02Z

I'm getting the same thing. On unsloth's q8 of both qwen3 32b and 30b-a3b.

I've bisected it to e121edc from PR #13771.

One weird thing I noticed is that it may be using the wrong chat template:
on e121edc I see srv params_from_: Chat format: Hermes 2 in the log but one commit before on 2f099b5 I see srv params_from_: Chat format: Content-only.

ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA RTX PRO 6000 Blackwell Workstation Edition, compute capability 12.0, VMM: yes
Device 1: NVIDIA RTX A6000, compute capability 8.6, VMM: yes
version: 5528 (53ae306)
built with cc (GCC) 15.1.1 20250425 for x86_64-pc-linux-gnu

Build command cmake -B build -DBUILD_SHARED_LIBS=ON -DLLAMA_CURL=OFF -DGGML_CUDA=ON -DGGML_CUDA_F16=ON -DGGML_CUDA_USE_GRAPHS=ON ; cmake --build build --config Release --parallel 32

I'm able to fairly consistently reproduce it with this prompt and this llama-server command

Evaluate
\[
\max_{\{x_1, x_2, x_3, x_4, x_5, x_6, x_7\} = \{1, 2, 3, 4, 5, 6, 7\}} \int_{\int_{x_1}^{x_2} x_3}^{\int_{x_4}^{x_5} x_6} x_7 \, dx
\]

llama-server
--port 9999
--jinja
--model /mnt/F8/ggufs/qwen3-32b/Qwen3-32B-Q8_0.gguf
--ctx-size 40960
--flash-attn
--slots
--samplers "top_k;top_p;min_p;temperature;dry;typ_p;xtc"
--temp 0.6
--min-p 0.0
--top-k 20
--top-p 0.95
-ngl 99
-ngld 99
--no-mmap
--model-draft /mnt/F8/ggufs/qwen3-1.7b/Qwen3-1.7B-Q4_K_M.gguf
--draft-max 32
--draft-min 2
--draft-p-min 0.8
--device CUDA0
--device-draft CUDA0

teleprint-me · 2025-05-29T07:54:36Z

Have you tried commit 03f582a? Scratch that. It takes some time, but I'm able to reproduce it after taking quite a few turns. I don't get the diff error anymore though. Trying to see if these are related to iss #13877.

00:26:47 | ~/.bin/cpp/llama.cpp
 git:(master | θ) λ gdb --quiet -ex='break main' -ex=run --args llama-server --port 8080 --n-gpu-layers 99 --ctx-size 16384 --pooling mean --slots --jinja -fa 
-m /mnt/valerie/models/Qwen/Qwen3-1.7B/ggml-model-f16.gguf
# ...
prompt eval time =     503.52 ms /   425 tokens (    1.18 ms per token,   844.06 tokens per second)
       eval time =   10701.92 ms /   477 tokens (   22.44 ms per token,    44.57 tokens per second)
      total time =   11205.44 ms /   902 tokens
terminate called after throwing an instance of 'std::runtime_error'
  what():  </think>

Thread 1 "llama-server" received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44	     return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007ffff53af813 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:89
#2  0x00007ffff5355dc0 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff533d57a in __GI_abort () at abort.c:73
#4  0x00007ffff5697bf8 in __gnu_cxx::__verbose_terminate_handler () at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/vterminate.cc:95
#5  0x00007ffff56b1c1a in __cxxabiv1::__terminate (handler=<optimized out>) at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:48
#6  0x00007ffff56975db in std::terminate () at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:58
#7  0x00007ffff56b1ed6 in __cxxabiv1::__cxa_throw (obj=<optimized out>, tinfo=0x555555b1d260 <typeinfo for std::runtime_error@GLIBCXX_3.4>, dest=0x7ffff56c99b0 <std::runtime_error::~runtime_error()>) at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_throw.cc:98
#8  0x00005555558aabfa in common_chat_parse (input="<think>\nOkay, the user wants me to read the weather.py file. Let me think about how to approach this.\n\nFirst, I need to figure out the path structure. The user mentioned that the file is small and sel"..., is_partial=false, syntax=...)
    at /home/austin/.bin/cpp/llama.cpp/common/chat.cpp:1923
#9  0x0000555555647b2f in server_slot::update_chat_msg (this=0x555555f91e70, diffs=std::vector of length 0, capacity 0) at /home/austin/.bin/cpp/llama.cpp/tools/server/server.cpp:1414
#10 0x00005555556532ff in server_context::send_final_response (this=0x7fffffffc110, slot=...) at /home/austin/.bin/cpp/llama.cpp/tools/server/server.cpp:2521
#11 0x0000555555659056 in server_context::update_slots (this=0x7fffffffc110) at /home/austin/.bin/cpp/llama.cpp/tools/server/server.cpp:3498
#12 0x00005555556003ff in operator() (__closure=0x7fffffffd6f0) at /home/austin/.bin/cpp/llama.cpp/tools/server/server.cpp:4929
#13 0x000055555560e1e8 in std::__invoke_impl<void, main(int, char**)::<lambda()>&>(std::__invoke_other, struct {...} &) (__f=...) at /usr/include/c++/15.1.1/bits/invoke.h:63
#14 0x000055555560c446 in std::__invoke_r<void, main(int, char**)::<lambda()>&>(struct {...} &) (__fn=...) at /usr/include/c++/15.1.1/bits/invoke.h:113
#15 0x0000555555608520 in std::_Function_handler<void(), main(int, char**)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/15.1.1/bits/std_function.h:292
#16 0x000055555565fa98 in std::function<void()>::operator() (this=0x7fffffffd6f0) at /usr/include/c++/15.1.1/bits/std_function.h:593
#17 0x000055555564a4ff in server_queue::start_loop (this=0x7fffffffd5d0) at /home/austin/.bin/cpp/llama.cpp/tools/server/server.cpp:1685
#18 0x0000555555602d6c in main (argc=14, argv=0x7fffffffd9a8) at /home/austin/.bin/cpp/llama.cpp/tools/server/server.cpp:4954
(gdb) Quit
(gdb) quit
A debugging session is active.

	Inferior 1 [process 82107] will be killed.

Quit anyway? (y or n) y

Can you provide backtraces to reveal where the crash occurred? I think this might be related to chat and chat-parser in common.

n9Mtq4 · 2025-05-29T10:53:07Z

I get the same Invalid diff error on 03f582a. AFAIK I get the same error from e121edc..53ae306. I haven't tried anything post 53ae306 and I haven't checked every commit, but from the bisect, every commit in that range I did try had the issue.

Here's the trace on 03f582a.

0x00007ffff10a774c in ?? () from /usr/lib/libc.so.6
#0  0x00007ffff10a774c in ?? () from /usr/lib/libc.so.6
#1  0x00007ffff104ddc0 in raise () from /usr/lib/libc.so.6
#2  0x00007ffff103557a in abort () from /usr/lib/libc.so.6
#3  0x00007ffff1297bf8 in __gnu_cxx::__verbose_terminate_handler () at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/vterminate.cc:95
#4  0x00007ffff12b1c1a in __cxxabiv1::__terminate (handler=<optimized out>) at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:48
#5  0x00007ffff12975db in std::terminate () at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:58
#6  0x00007ffff12b1ed6 in __cxxabiv1::__cxa_throw (obj=<optimized out>, tinfo=0x555555843400 <typeinfo for std::runtime_error@GLIBCXX_3.4>, dest=0x7ffff12c99b0 <std::runtime_error::~runtime_error()>) at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_throw.cc:98
#7  0x000055555558c822 in string_diff(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) [clone .cold] ()
#8  0x00005555556dc023 in common_chat_msg_diff::compute_diffs(common_chat_msg const&, common_chat_msg const&) ()
#9  0x000055555560bf53 in server_slot::update_chat_msg(std::vector<common_chat_msg_diff, std::allocator<common_chat_msg_diff> >&) ()
#10 0x000055555560c72f in server_context::send_partial_response(server_slot&, completion_token_output const&) ()
#11 0x000055555560d002 in server_context::process_token(completion_token_output&, server_slot&) ()
#12 0x000055555561d197 in server_context::update_slots() ()
#13 0x00005555555eaac5 in server_queue::start_loop() ()
#14 0x00005555555ab2b2 in main ()

teleprint-me · 2025-05-29T20:20:11Z

I think this is an issue with the chat parser.

Eval bug: Uncaught exception [json.exception.parse_error.101] during tool use crashes llama-server #13825
Eval bug: uncaught std::runtime_exception thrown in llama-server during tool use #13812
Eval bug: llama-server.exe silently crashes (ucrtbased.dll) after 2-3 requests in a dialogue #13877

These all are related to commits 03f582a..e121edc which was introduced in PR #12379 authored by @ochafik. PR #13786 was supposed to fix it, but I didn't have time to really test it and I'm noticing regressions once again.

All of these crashes point towards common/chat and common/chat-parser. PR #13786 weakened the diff so it wouldn't be so strict because this was the original issue with the original draft in PR #12379 - This is why I asked if you tried commit 03f582a specifically. That way, I can see if the regression is really resolved or not.

The bt here shows that computing the diff between the input and output streams yields anything and if it does, then it crashes because the input must match the previous history before commit 03f582a.

I'm wondering why it's popping up again after this commit which is reported by OP here in a682474. Just crossing the t's and dotting the i's, so-to-speak.

I have to work today, so I won't be able to dig into this in depth until later.

hronoas · 2025-05-30T13:53:31Z

Have the same bug. I tried many versions and found the last working one: b5486.
In b5488, which includes e121edc, the bug appears.

ochafik · 2025-05-30T23:05:45Z

Hi @stargate426 , thanks for reporting this!

Could you please provide a full --verbose log of when the issue happens? It would contain both the request to repro this, and the full error message (the Invalid diff: log above is incomplete)

@teleprint-me The exception w/ </think> is likely a different issue: #13812 (comment) (fix in progress: #13931)

steampunque · 2025-05-31T01:40:16Z

I also started seeing hard server invalid diff crashes at b5478 and above i.e. :

terminate called after throwing an instance of 'std::runtime_error'
  what():  Invalid diff:

b5477 was fine. I don't make use of the openai endpoint. It is crashing in a beam search inference algorithm I added to my downstream server. As I recall it was a lot of work to rebase my downstream server on b5478 so I can't rule out a mistake on my part but given that others are seeing this I think there is a good chance one of the routines changed/added to common in b5478 is the root cause of the problem.

n9Mtq4 · 2025-05-31T02:43:32Z

@ochafik here's a full log log.txt.gz

I also discovered that while downgrading to version: 5487 (2f099b5) fixes the issue with qwen3, I'm getting the same terminate called after throwing an instance of 'std::runtime_error' what(): Invalid diff: '... with DeepSeek-R1-0528-Qwen3-8B on that commit. So the bug might go back further.

stargate426 added the bug-unconfirmed label May 28, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Eval bug: std::runtime_error Invalid diff: #13876

Eval bug: std::runtime_error Invalid diff: #13876

stargate426 commented May 28, 2025

n9Mtq4 commented May 29, 2025

Uh oh!

teleprint-me commented May 29, 2025 •

edited

Loading

Uh oh!

n9Mtq4 commented May 29, 2025 •

edited

Loading

Uh oh!

teleprint-me commented May 29, 2025 •

edited

Loading

Uh oh!

hronoas commented May 30, 2025

Uh oh!

ochafik commented May 30, 2025

Uh oh!

steampunque commented May 31, 2025

Uh oh!

n9Mtq4 commented May 31, 2025

Uh oh!

Eval bug: std::runtime_error Invalid diff: #13876

Eval bug: std::runtime_error Invalid diff: #13876

Comments

stargate426 commented May 28, 2025

Name and Version

Operating systems

GGML backends

Hardware

Models

Problem description & steps to reproduce

First Bad Commit

Relevant log output

n9Mtq4 commented May 29, 2025

Uh oh!

teleprint-me commented May 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

n9Mtq4 commented May 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

teleprint-me commented May 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hronoas commented May 30, 2025

Uh oh!

ochafik commented May 30, 2025

Uh oh!

steampunque commented May 31, 2025

Uh oh!

n9Mtq4 commented May 31, 2025

Uh oh!

teleprint-me commented May 29, 2025 •

edited

Loading

n9Mtq4 commented May 29, 2025 •

edited

Loading

teleprint-me commented May 29, 2025 •

edited

Loading