Eval bug: std::runtime_error Invalid diff: #13876

Open
stargate426 opened this issue May 28, 2025 · 8 comments

@stargate426

Name and Version

ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
version: 5519 (a682474)
built with cc (Debian 12.2.0-14+deb12u1) 12.2.0 for x86_64-linux-gnu

Operating systems

Linux

GGML backends

CUDA

Hardware

Ryzen 5 3600 + RTX 5090

Models

Qwen3 32B q5

Problem description & steps to reproduce

./llama-server -m ~/llm/models/Qwen3-32B-Q5_K_S.gguf -c 16384 -ngl 999 --host 0.0.0.0 --port 5000 --jinja --api-key

This is how I run the program. The issue happens every so often, and in the limited attempts I made I couldn't reproduce it with llama-cli.

First Bad Commit

No response

Relevant log output

terminate called after throwing an instance of 'std::runtime_error'
  what():  Invalid diff: '<think>Okay, the user mentioned that Docker is taking up a lot of space and they want to delete unused volumes. Now they're saying that something else might be using all the storage and they don't know if it's Docker. I need to help them figure out what's consuming their disk space.
@n9Mtq4

n9Mtq4 commented May 29, 2025

I'm getting the same thing. On unsloth's q8 of both qwen3 32b and 30b-a3b.

I've bisected it to e121edc from PR #13771.

One weird thing I noticed is that it may be using the wrong chat template:
on e121edc I see srv params_from_: Chat format: Hermes 2 in the log, but one commit before, on 2f099b5, I see srv params_from_: Chat format: Content-only.

ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA RTX PRO 6000 Blackwell Workstation Edition, compute capability 12.0, VMM: yes
Device 1: NVIDIA RTX A6000, compute capability 8.6, VMM: yes
version: 5528 (53ae306)
built with cc (GCC) 15.1.1 20250425 for x86_64-pc-linux-gnu

Build command: cmake -B build -DBUILD_SHARED_LIBS=ON -DLLAMA_CURL=OFF -DGGML_CUDA=ON -DGGML_CUDA_F16=ON -DGGML_CUDA_USE_GRAPHS=ON ; cmake --build build --config Release --parallel 32

I'm able to reproduce it fairly consistently with this prompt and this llama-server command:

Evaluate
\[
\max_{\{x_1, x_2, x_3, x_4, x_5, x_6, x_7\} = \{1, 2, 3, 4, 5, 6, 7\}} \int_{\int_{x_1}^{x_2} x_3}^{\int_{x_4}^{x_5} x_6} x_7 \, dx
\]
llama-server \
  --port 9999 \
  --jinja \
  --model /mnt/F8/ggufs/qwen3-32b/Qwen3-32B-Q8_0.gguf \
  --ctx-size 40960 \
  --flash-attn \
  --slots \
  --samplers "top_k;top_p;min_p;temperature;dry;typ_p;xtc" \
  --temp 0.6 \
  --min-p 0.0 \
  --top-k 20 \
  --top-p 0.95 \
  -ngl 99 \
  -ngld 99 \
  --no-mmap \
  --model-draft /mnt/F8/ggufs/qwen3-1.7b/Qwen3-1.7B-Q4_K_M.gguf \
  --draft-max 32 \
  --draft-min 2 \
  --draft-p-min 0.8 \
  --device CUDA0 \
  --device-draft CUDA0

@teleprint-me
Contributor

teleprint-me commented May 29, 2025

Have you tried commit 03f582a? Scratch that. It takes some time, but I'm able to reproduce it after taking quite a few turns. I don't get the diff error anymore, though. Trying to see if these are related to issue #13877.

00:26:47 | ~/.bin/cpp/llama.cpp
 git:(master | θ) λ gdb --quiet -ex='break main' -ex=run --args llama-server --port 8080 --n-gpu-layers 99 --ctx-size 16384 --pooling mean --slots --jinja -fa 
-m /mnt/valerie/models/Qwen/Qwen3-1.7B/ggml-model-f16.gguf
# ...
prompt eval time =     503.52 ms /   425 tokens (    1.18 ms per token,   844.06 tokens per second)
       eval time =   10701.92 ms /   477 tokens (   22.44 ms per token,    44.57 tokens per second)
      total time =   11205.44 ms /   902 tokens
terminate called after throwing an instance of 'std::runtime_error'
  what():  </think>

Thread 1 "llama-server" received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44	     return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007ffff53af813 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:89
#2  0x00007ffff5355dc0 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff533d57a in __GI_abort () at abort.c:73
#4  0x00007ffff5697bf8 in __gnu_cxx::__verbose_terminate_handler () at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/vterminate.cc:95
#5  0x00007ffff56b1c1a in __cxxabiv1::__terminate (handler=<optimized out>) at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:48
#6  0x00007ffff56975db in std::terminate () at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:58
#7  0x00007ffff56b1ed6 in __cxxabiv1::__cxa_throw (obj=<optimized out>, tinfo=0x555555b1d260 <typeinfo for std::runtime_error@GLIBCXX_3.4>, dest=0x7ffff56c99b0 <std::runtime_error::~runtime_error()>) at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_throw.cc:98
#8  0x00005555558aabfa in common_chat_parse (input="<think>\nOkay, the user wants me to read the weather.py file. Let me think about how to approach this.\n\nFirst, I need to figure out the path structure. The user mentioned that the file is small and sel"..., is_partial=false, syntax=...)
    at /home/austin/.bin/cpp/llama.cpp/common/chat.cpp:1923
#9  0x0000555555647b2f in server_slot::update_chat_msg (this=0x555555f91e70, diffs=std::vector of length 0, capacity 0) at /home/austin/.bin/cpp/llama.cpp/tools/server/server.cpp:1414
#10 0x00005555556532ff in server_context::send_final_response (this=0x7fffffffc110, slot=...) at /home/austin/.bin/cpp/llama.cpp/tools/server/server.cpp:2521
#11 0x0000555555659056 in server_context::update_slots (this=0x7fffffffc110) at /home/austin/.bin/cpp/llama.cpp/tools/server/server.cpp:3498
#12 0x00005555556003ff in operator() (__closure=0x7fffffffd6f0) at /home/austin/.bin/cpp/llama.cpp/tools/server/server.cpp:4929
#13 0x000055555560e1e8 in std::__invoke_impl<void, main(int, char**)::<lambda()>&>(std::__invoke_other, struct {...} &) (__f=...) at /usr/include/c++/15.1.1/bits/invoke.h:63
#14 0x000055555560c446 in std::__invoke_r<void, main(int, char**)::<lambda()>&>(struct {...} &) (__fn=...) at /usr/include/c++/15.1.1/bits/invoke.h:113
#15 0x0000555555608520 in std::_Function_handler<void(), main(int, char**)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/15.1.1/bits/std_function.h:292
#16 0x000055555565fa98 in std::function<void()>::operator() (this=0x7fffffffd6f0) at /usr/include/c++/15.1.1/bits/std_function.h:593
#17 0x000055555564a4ff in server_queue::start_loop (this=0x7fffffffd5d0) at /home/austin/.bin/cpp/llama.cpp/tools/server/server.cpp:1685
#18 0x0000555555602d6c in main (argc=14, argv=0x7fffffffd9a8) at /home/austin/.bin/cpp/llama.cpp/tools/server/server.cpp:4954
(gdb) Quit
(gdb) quit
A debugging session is active.

	Inferior 1 [process 82107] will be killed.

Quit anyway? (y or n) y

Can you provide backtraces to reveal where the crash occurred? I think this might be related to chat and chat-parser in common.

@n9Mtq4

n9Mtq4 commented May 29, 2025

I get the same Invalid diff error on 03f582a. As far as I can tell, I get the same error throughout e121edc..53ae306. I haven't tried anything after 53ae306 and I haven't checked every commit, but every commit in that range that I did try during the bisect had the issue.

Here's the trace on 03f582a.

0x00007ffff10a774c in ?? () from /usr/lib/libc.so.6
#0  0x00007ffff10a774c in ?? () from /usr/lib/libc.so.6
#1  0x00007ffff104ddc0 in raise () from /usr/lib/libc.so.6
#2  0x00007ffff103557a in abort () from /usr/lib/libc.so.6
#3  0x00007ffff1297bf8 in __gnu_cxx::__verbose_terminate_handler () at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/vterminate.cc:95
#4  0x00007ffff12b1c1a in __cxxabiv1::__terminate (handler=<optimized out>) at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:48
#5  0x00007ffff12975db in std::terminate () at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:58
#6  0x00007ffff12b1ed6 in __cxxabiv1::__cxa_throw (obj=<optimized out>, tinfo=0x555555843400 <typeinfo for std::runtime_error@GLIBCXX_3.4>, dest=0x7ffff12c99b0 <std::runtime_error::~runtime_error()>) at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/eh_throw.cc:98
#7  0x000055555558c822 in string_diff(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) [clone .cold] ()
#8  0x00005555556dc023 in common_chat_msg_diff::compute_diffs(common_chat_msg const&, common_chat_msg const&) ()
#9  0x000055555560bf53 in server_slot::update_chat_msg(std::vector<common_chat_msg_diff, std::allocator<common_chat_msg_diff> >&) ()
#10 0x000055555560c72f in server_context::send_partial_response(server_slot&, completion_token_output const&) ()
#11 0x000055555560d002 in server_context::process_token(completion_token_output&, server_slot&) ()
#12 0x000055555561d197 in server_context::update_slots() ()
#13 0x00005555555eaac5 in server_queue::start_loop() ()
#14 0x00005555555ab2b2 in main ()

@teleprint-me
Contributor

teleprint-me commented May 29, 2025

I think this is an issue with the chat parser.

These all are related to commits 03f582a..e121edc which was introduced in PR #12379 authored by @ochafik. PR #13786 was supposed to fix it, but I didn't have time to really test it and I'm noticing regressions once again.

All of these crashes point towards common/chat and common/chat-parser. PR #13786 weakened the diff so it wouldn't be so strict, because strictness was the original issue with the first draft of PR #12379. This is why I asked if you had tried commit 03f582a specifically: that way I can see whether the regression is really resolved or not.

The backtrace here shows the crash occurring while computing the diff between the previous and current message content: the diff throws whenever the new content does not match the previous history, a requirement dating from before commit 03f582a.
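For illustration, here is a minimal C++ sketch of that invariant. The function name comes from the backtraces above, but the exact behavior is an assumption inferred from them, not the actual llama.cpp implementation: during streaming, each updated message has to extend the previous one, and when the previous content is no longer a prefix of the new content, throwing is the only safe option, which is what surfaces as the Invalid diff crash.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical sketch of the prefix invariant that string_diff() appears to
// enforce, judging by the backtraces above. Not the real llama.cpp code.
static std::string string_diff(const std::string & prev, const std::string & cur) {
    // During streaming, `cur` must be `prev` plus newly generated text.
    if (cur.compare(0, prev.size(), prev) != 0) {
        // The previous content is not a prefix of the new content:
        // there is no meaningful incremental diff, so bail out hard.
        throw std::runtime_error("Invalid diff: '" + cur + "'");
    }
    // Otherwise the diff is simply the newly streamed suffix.
    return cur.substr(prev.size());
}
```

Under this reading, anything that rewrites already-streamed content (for example, a chat parser re-interpreting a partially emitted <think> block once the closing tag arrives) would violate the prefix assumption and trigger the throw.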

I'm wondering why it's popping up again after that commit, since the OP reports it here on a682474. Just crossing the t's and dotting the i's, so to speak.

I have to work today, so I won't be able to dig into this in depth until later.

@hronoas

hronoas commented May 30, 2025

Have the same bug. I tried many versions and found the last working one: b5486.
In b5488, which includes e121edc, the bug appears.

@ochafik
Collaborator

ochafik commented May 30, 2025

Hi @stargate426 , thanks for reporting this!

Could you please provide a full --verbose log of when the issue happens? It would contain both the request to repro this, and the full error message (the Invalid diff: log above is incomplete)

@teleprint-me The exception w/ </think> is likely a different issue: #13812 (comment) (fix in progress: #13931)

@steampunque

I also started seeing hard server Invalid diff crashes at b5478 and above, i.e.:

terminate called after throwing an instance of 'std::runtime_error'
  what():  Invalid diff:

b5477 was fine. I don't make use of the OpenAI endpoint; it is crashing in a beam-search inference algorithm I added to my downstream server. As I recall, it was a lot of work to rebase my downstream server on b5478, so I can't rule out a mistake on my part, but given that others are seeing this, I think there is a good chance one of the routines changed or added to common in b5478 is the root cause of the problem.

@n9Mtq4
Copy link

n9Mtq4 commented May 31, 2025

@ochafik here's a full log: log.txt.gz

I also discovered that while downgrading to version 5487 (2f099b5) fixes the issue with Qwen3, I'm getting the same terminate called after throwing an instance of 'std::runtime_error' what(): Invalid diff: '... with DeepSeek-R1-0528-Qwen3-8B on that commit. So the bug might go back further.
