more full prompt reprocessing in recent update? #5736

Description

@fantasyz

What happened?

Since a recent update, I have noticed that my local LLM does a full prompt reprocessing more often when I am just continuing a conversation.

Here is what I saw in the llama.cpp console:

155.00.988.881 W slot update_slots: id  0 | task 37846 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
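
To put a number on "more frequent", one option is to count these warnings in a saved copy of the server output. The following is a minimal sketch, not part of llama.cpp or Qwen Code: the function name `count_full_reprocessing` is hypothetical, and the regular expression is based only on the single log line quoted above, so other llama.cpp versions may format the line differently.

```python
import re
from collections import Counter

# Pattern derived from the one log line quoted in this issue (assumption:
# other llama.cpp builds may word or space this line differently).
FULL_REPROCESS = re.compile(
    r"slot update_slots: id\s+(?P<slot>\d+) \| task (?P<task>\d+) \| "
    r"forcing full prompt re-processing"
)


def count_full_reprocessing(lines):
    """Scan an iterable of log lines for full re-processing warnings.

    Returns a tuple (total, per_slot, tasks) where per_slot is a Counter
    keyed by slot id and tasks is the list of matching task ids in order.
    """
    per_slot = Counter()
    tasks = []
    for line in lines:
        match = FULL_REPROCESS.search(line)
        if match:
            per_slot[int(match.group("slot"))] += 1
            tasks.append(int(match.group("task")))
    return len(tasks), per_slot, tasks
```

For example, if the server output is redirected to a file (the name `llama-server.log` is only an illustration), `count_full_reprocessing(open("llama-server.log", encoding="utf-8"))` returns the totals for that run. Comparing the counts for similar sessions before and after the update, or with the suspected feature disabled, would show whether the frequency really changed.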

I also noticed that a feature that auto-completes the next prompt was introduced recently, and I wonder whether it is related. Either way, I would like to know how to disable that feature, even if disabling it doesn't help. I don't need it, and I want to see whether the issue goes away with it disabled.

What did you expect to happen?

No full prompt reprocessing when just continuing a conversation.

Client information

Run qwen to enter the interactive CLI, then run the /about command.

$ qwen /about
Qwen Code v0.18.5
Model: qwen3.6-27b
Fast Model: not set
Auth: openai
Platform: win32 x64 (10.0.26200)
Node.js: v22.22.2
Session: 738e8cbf-0d10-42a5-9b54-91310e77bbdf
Git commit: 2937b09cf
LSP: disabled

Login information

n/a

Anything else we need to know?

No response
