What happened?
After a recent update, I noticed that my local LLM does a full prompt reprocessing more often when I am just continuing a conversation.
Here is what I saw in the llama.cpp console:
155.00.988.881 W slot update_slots: id 0 | task 37846 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
I also noticed that an auto-complete feature for the next prompt was introduced recently, and I wonder if it is related. Either way, I would like to know how to disable that feature, even if disabling it doesn't help. I don't need it, and I want to see whether disabling it fixes the issue.
What did you expect to happen?
No full prompt reprocessing when just continuing a conversation.
Client information
Run qwen to enter the interactive CLI, then run the /about command.
$ qwen /about
Qwen Code v0.18.5
Model: qwen3.6-27b
Fast Model: not set
Auth: openai
Platform: win32 x64 (10.0.26200)
Node.js: v22.22.2
Session: 738e8cbf-0d10-42a5-9b54-91310e77bbdf
Git commit: 2937b09cf
LSP: disabled
Login information
n/a
Anything else we need to know?
No response