What happened?
Self hosting Qwen3-Coder-480B-A35B-Instruct was a success, using something like this, but tool calls don't work:
Without a jinja template
lama-box --host 0.0.0.0 --embeddings --gpu-layers 63 --parallel 4 --ctx-size 8192 --port 40003 --model /mnt/nfs/models/Qwen3-Coder-480B-A35B-Instruct-Q4_K_M-00001-of-00006.gguf --alias Qwen3-Coder-480B-A35B-Instruct --no-mmap --no-warmup --tensor-split 182335,182335 --ctx-size 262144 --flash-attn --parallel 4 --mmap --mlock --verbose --top-k 20 --temp 0.7 --top-p 0.8 --repeat-penalty 1.05 --min-p 0.00
With a jinja template
lama-box --host 0.0.0.0 --embeddings --gpu-layers 63 --parallel 4 --ctx-size 8192 --port 40003 --model /mnt/nfs/models/Qwen3-Coder-480B-A35B-Instruct-Q4_K_M-00001-of-00006.gguf --alias Qwen3-Coder-480B-A35B-Instruct --no-mmap --no-warmup --tensor-split 182335,182335 --ctx-size 262144 --flash-attn --parallel 4 --mmap --mlock --verbose --top-k 20 --temp 0.7 --top-p 0.8 --repeat-penalty 1.05 --jinja --min-p 0.00 --chat-template-file /mnt/nfs/models/chat_template.jinja
Qwen Code appears to call tools, but it doesn't actually execute them. It just says something like [calling tool x with argument y and reasoning z].
What did you expect to happen?
I'd expect the tool calls to work.
Client information
Details
Login information
No response
Anything else we need to know?
I tried various quantizations and jinja files
What happened?
Self hosting Qwen3-Coder-480B-A35B-Instruct was a success, using something like this, but tool calls don't work:
Without a jinja template
With a jinja template
Qwen Code appears to call tools, but it doesn't actually execute them. It just says something like [calling tool x with argument y and reasoning z].
What did you expect to happen?
I'd expect the tool calls to work.
Client information
Details
Login information
No response
Anything else we need to know?
I tried various quantizations and jinja files