Summary
QMD currently supports only local GGUF models (via node-llama-cpp) for embeddings. It would be great to also support external API-based embedding providers (OpenAI-compatible endpoints, Ollama, etc.) as an alternative.
Use Case
Many users already have powerful embedding models running externally:
- Ollama hosting bge-m3 (1024-dim) on a NAS/server
- OpenAI text-embedding-3-small/large
- Self-hosted TEI (Text Embeddings Inference) endpoints
These often provide better embedding quality than the local 300M/0.6B GGUF models, especially for non-English content (CJK languages).
Current Workaround
There is no real workaround today. Users who need high-quality embeddings plus reranking currently cannot use QMD because:
- Local embeddinggemma-300M has limited CJK coverage
- Qwen3-Embedding-0.6B is better but still weaker than bge-m3 (1024-dim)
- No way to point QMD at an external embedding API
Proposed Solution
Add environment variables or equivalent config options, for example:
# Use Ollama OpenAI-compatible API for embeddings
export QMD_EMBED_PROVIDER="openai"
export QMD_EMBED_API_URL="http://192.168.1.13:11434/v1"
export QMD_EMBED_MODEL="bge-m3"
export QMD_EMBED_DIMENSIONS=1024
This would allow QMD to call external embedding APIs while keeping reranking and query expansion local (which is a great design choice).
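To make the proposal concrete, here is a rough sketch of what the embedding side could look like. It is not based on QMD's actual internals: every function and type name below is hypothetical, `QMD_EMBED_API_KEY` is an extra variable I am assuming for hosted providers, and the request/response shape is the standard OpenAI-style `POST /embeddings` call (`{ model, input }` in, `data[].embedding` out). In this sketch `QMD_EMBED_DIMENSIONS` is only used to validate the returned vectors and is not sent to the server.

```typescript
// Hypothetical config shape. The QMD_EMBED_* names come from the proposal above,
// except QMD_EMBED_API_KEY, which is an assumption for providers that need a key.
interface EmbedConfig {
  provider: string;
  apiUrl: string;
  model: string;
  dimensions: number | undefined;
  apiKey: string | undefined;
}

type Env = Record<string, string | undefined>;

// Returns null when no external provider is configured, so the caller can
// fall back to the existing local GGUF path.
function readEmbedConfig(env: Env): EmbedConfig | null {
  const provider = env["QMD_EMBED_PROVIDER"];
  if (!provider) return null;
  const apiUrl = env["QMD_EMBED_API_URL"];
  const model = env["QMD_EMBED_MODEL"];
  if (!apiUrl || !model) {
    throw new Error("QMD_EMBED_API_URL and QMD_EMBED_MODEL are required when QMD_EMBED_PROVIDER is set");
  }
  const rawDims = env["QMD_EMBED_DIMENSIONS"];
  const dimensions = rawDims ? Number(rawDims) : undefined;
  if (dimensions !== undefined && (!Number.isInteger(dimensions) || dimensions <= 0)) {
    throw new Error("QMD_EMBED_DIMENSIONS must be a positive integer");
  }
  // Strip trailing slashes so "/embeddings" can be appended safely.
  return { provider, apiUrl: apiUrl.replace(/\/+$/, ""), model, dimensions, apiKey: env["QMD_EMBED_API_KEY"] };
}

interface EmbeddingRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds an OpenAI-style embeddings request without sending it.
function buildEmbeddingRequest(config: EmbedConfig, texts: string[]): EmbeddingRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (config.apiKey) headers["Authorization"] = `Bearer ${config.apiKey}`;
  return {
    url: `${config.apiUrl}/embeddings`,
    init: { method: "POST", headers, body: JSON.stringify({ model: config.model, input: texts }) },
  };
}

// Extracts vectors in input order and checks their length against the configured dimension.
function parseEmbeddingResponse(config: EmbedConfig, payload: unknown): number[][] {
  const data = (payload as { data?: { index: number; embedding: number[] }[] }).data;
  if (!Array.isArray(data)) throw new Error("Unexpected embedding response shape");
  const vectors = [...data].sort((a, b) => a.index - b.index).map((d) => d.embedding);
  for (const v of vectors) {
    if (config.dimensions !== undefined && v.length !== config.dimensions) {
      throw new Error(`Expected ${config.dimensions}-dim vectors, got ${v.length}`);
    }
  }
  return vectors;
}

// Minimal fetch-like signature so the HTTP client can be swapped out in tests.
type Fetcher = (
  url: string,
  init: EmbeddingRequest["init"],
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function embed(config: EmbedConfig, texts: string[], fetcher: Fetcher): Promise<number[][]> {
  const { url, init } = buildEmbeddingRequest(config, texts);
  const res = await fetcher(url, init);
  if (!res.ok) throw new Error(`Embedding request failed with HTTP ${res.status}`);
  return parseEmbeddingResponse(config, await res.json());
}
```

With the variables from the example set, something like `embed(readEmbedConfig(process.env)!, ["some text"], fetch)` would hit `http://192.168.1.13:11434/v1/embeddings`. Validating the dimension up front seems worthwhile because vectors from a different model (or a different size) cannot be mixed with an index built from the local models, so switching providers would presumably require re-embedding.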
Environment
- QMD version: 2.1.0
- OS: Ubuntu 24.04 (x64)
- Node: 22.22.0
Thanks for the great tool! The hybrid search + query expansion + reranking pipeline is excellent.