Support for Activated LoRA (https://github.com/ggml-org/llama.cpp/issues/15212) #15213
Replies: 1 comment
Reply (excerpt): initial support and next steps. The current implementation focuses on enabling/disabling the adapter based on the presence of the invocation sequence, splitting the prefill batch at that point and setting the adapter scale accordingly.
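To make that mechanism concrete, here is a minimal sketch of the split against the current llama.cpp adapter API (`llama_set_adapter_lora`, `llama_clear_adapter_lora`, `llama_batch_get_one`). The helper names, the naive invocation-sequence scan, and the scale values are assumptions for illustration, not the actual patch.

```cpp
// Minimal sketch (not the actual implementation): split the prefill at the
// end of the invocation sequence and toggle the adapter scale around it.
// Error handling is omitted; llama_decode infers positions for the batch.
#include "llama.h"

#include <vector>

// Index of the first token *after* the invocation sequence, or -1 if the
// sequence does not occur in the prompt. (Naive scan, for illustration.)
static int find_invocation_end(const std::vector<llama_token> & prompt,
                               const std::vector<llama_token> & invocation) {
    if (invocation.empty() || prompt.size() < invocation.size()) {
        return -1;
    }
    for (size_t i = 0; i + invocation.size() <= prompt.size(); ++i) {
        bool match = true;
        for (size_t j = 0; j < invocation.size(); ++j) {
            if (prompt[i + j] != invocation[j]) { match = false; break; }
        }
        if (match) {
            return (int) (i + invocation.size());
        }
    }
    return -1;
}

static void prefill_with_alora(llama_context * ctx,
                               llama_adapter_lora * adapter,
                               std::vector<llama_token> & prompt,
                               const std::vector<llama_token> & invocation) {
    const int split = find_invocation_end(prompt, invocation);

    if (split < 0) {
        // No invocation sequence: the whole prefill runs on the base model.
        llama_clear_adapter_lora(ctx);
        llama_decode(ctx, llama_batch_get_one(prompt.data(), (int32_t) prompt.size()));
        return;
    }

    // Tokens up to and including the invocation sequence: adapter disabled,
    // so their KV entries match the base model's and stay reusable.
    llama_clear_adapter_lora(ctx);
    llama_decode(ctx, llama_batch_get_one(prompt.data(), split));

    // Tokens after the invocation sequence: adapter active.
    llama_set_adapter_lora(ctx, adapter, 1.0f);
    llama_decode(ctx, llama_batch_get_one(prompt.data() + split, (int32_t) prompt.size() - split));
}
```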
Apologies if this is slightly out of order: I have created issue #15212 requesting support for Activated LoRA adapters (see the issue for details and motivation). These adapters are invoked by including an invocation sequence in the prompt, and they only affect the weights for tokens that come after the invocation sequence. This means the adapter can reuse the KV cache from the base model, leading to huge improvements in time to first token (TTFT) compared to hot-swapping LoRA adapters, especially if you apply the adapter deep into a multi-turn interaction with the model. Appreciate any feedback or thoughts on this!
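A back-of-the-envelope illustration of the TTFT argument, with assumed token counts rather than measurements:

```cpp
// Illustrative arithmetic only: prefill work when an adapter is engaged
// mid-conversation. Token counts are made up for the example.
#include <cstdio>

int main() {
    const int n_cached     = 4000; // tokens already in the KV cache
    const int n_invocation = 16;   // length of the invocation sequence

    // Hot-swapping a LoRA adapter changes the weights at every position, so
    // the cached entries no longer match and the whole context is re-prefilled.
    const int hot_swap_prefill = n_cached + n_invocation;

    // An Activated LoRA leaves tokens before the invocation sequence on the
    // base weights, so the cache is reused and only the suffix is prefilled.
    const int alora_prefill = n_invocation;

    std::printf("hot-swap prefill: %d tokens\n", hot_swap_prefill); // 4016
    std::printf("aLoRA prefill:    %d tokens\n", alora_prefill);    //   16
    return 0;
}
```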
Our plan would be to start this integration work ourselves and submit a PR for this feature in the near future, building on the existing support for hot-swapping LoRA adapters.
This complements existing PRs to both Huggingface PEFT (huggingface/peft#2609) and vLLM (vllm-project/vllm#19710).
cc @gabe-l-hart