Name and Version
$ llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3070, compute capability 8.6, VMM: yes
version: 7149 (134e6940c)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-cli
Command line
llama-cli -m ./qwen2.5-1.5b-instruct.gguf --lora ./qwen-lora.gguf
Problem description & steps to reproduce
I noticed this issue when using the --lora switch to run a fine-tuned LoRA alongside a base model. The LoRA didn't seem to take effect: asking the model questions with it enabled produced the same results as not using the LoRA at all. If I use --jinja, however, the fine-tuned LoRA does take effect.
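To illustrate, these are the two invocations I compared (same model and LoRA files as in the command line above; --jinja is the only difference):

# LoRA appears to have no effect on the output:
llama-cli -m ./qwen2.5-1.5b-instruct.gguf --lora ./qwen-lora.gguf
# LoRA takes effect as expected:
llama-cli -m ./qwen2.5-1.5b-instruct.gguf --lora ./qwen-lora.gguf --jinja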
The documentation for llama-cli indicates that --jinja is the default behaviour. However, the following pull request suggests that --jinja is only enabled by default when running in server mode, not in the CLI:
If I can figure out how the auto-generated documentation works, I'll also submit a pull request to correct the llama-cli README.
First Bad Commit
No response