Add OpenAI-compatible remote embedding and reranking
Support offloading embedding and reranking to remote OpenAI-compatible
servers (vLLM, Ollama, LM Studio, OpenAI) while preserving local query
expansion and tokenization via a hybrid routing layer.
- RemoteLLM: HTTP client with circuit breaker, dimension validation,
batch splitting, auth headers, configurable timeouts
- HybridLLM: routes embed/rerank → remote, generate/expand → local
- LLM interface: add embedBatch, embedModelName; generalize singleton
and session management from LlamaCpp to LLM
- Config: QMD_EMBED_API_URL / QMD_EMBED_API_MODEL env vars or YAML models section
- Skip nomic/Qwen3 text formatting prefixes for remote models
- 36 unit tests + 30 integration tests against live vLLM
Related: #489, #427, #446, #511
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
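
The routing rule described for HybridLLM (embed/rerank → remote, generate/expand → local) can be sketched as plain delegation. This is a minimal illustration, not the code from this commit: the `LLM` interface below is a simplified, hypothetical shape, and every method signature is an assumption. Only the names `HybridLLM`, `embedBatch`, and `embedModelName` come from the commit message.

```typescript
// Hypothetical, simplified interface: all signatures here are assumptions.
interface LLM {
  readonly embedModelName: string;
  embedBatch(texts: string[]): Promise<number[][]>;
  rerank(query: string, docs: string[]): Promise<number[]>;
  generate(prompt: string): Promise<string>;
  expand(query: string): Promise<string[]>;
}

// Routes embed/rerank to the remote backend and generate/expand to the local one.
class HybridLLM implements LLM {
  private readonly remote: LLM;
  private readonly local: LLM;

  constructor(remote: LLM, local: LLM) {
    this.remote = remote;
    this.local = local;
  }

  // Embeddings come from the remote backend, so report its model name.
  get embedModelName(): string {
    return this.remote.embedModelName;
  }

  embedBatch(texts: string[]): Promise<number[][]> {
    return this.remote.embedBatch(texts);
  }

  rerank(query: string, docs: string[]): Promise<number[]> {
    return this.remote.rerank(query, docs);
  }

  generate(prompt: string): Promise<string> {
    return this.local.generate(prompt);
  }

  expand(query: string): Promise<string[]> {
    return this.local.expand(query);
  }
}
```

Because callers only see the `LLM` interface, they do not need to know which backend serves a given call, which is presumably why the singleton and session management were generalized from LlamaCpp to LLM.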
README.md (35 additions, 0 deletions)
@@ -939,6 +939,41 @@ Uses node-llama-cpp's `createRankingContext()` and `rankAndSort()` API for cross
 Used for generating query variations via `LlamaChatSession`.
 
+### Remote Embedding & Reranking
+
+QMD can offload embedding and reranking to a remote OpenAI-compatible server (vLLM, Ollama, LM Studio, OpenAI, etc.) while keeping query expansion local.
+
+**Environment variables** (presence of `QMD_EMBED_API_URL` activates remote mode):
+
+| Variable | Required | Description |
+|----------|----------|-------------|
+| `QMD_EMBED_API_URL` | Yes | Base URL, e.g. `http://gpu-host:8000/v1` |
+| `QMD_EMBED_API_MODEL` | Yes | Model name, e.g. `BAAI/bge-m3` |
+| `QMD_EMBED_API_KEY` | No | Bearer token for auth |
+| `QMD_RERANK_API_URL` | No | Rerank endpoint (defaults to embed URL) |
+| `QMD_RERANK_API_MODEL` | No | Rerank model name |
+| `QMD_RERANK_API_KEY` | No | Rerank auth (defaults to embed key) |
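
The defaults in the table above can be sketched as a small resolver. This is an illustrative sketch, not QMD's actual config loader: `resolveRemoteConfig`, `buildEmbeddingRequest`, and the config types are hypothetical names, throwing on a missing model is an assumed behavior, and appending `/embeddings` to the base URL follows the OpenAI embeddings API convention rather than anything confirmed in this diff.

```typescript
// Env-like map; in Node you could pass process.env.
type Env = Record<string, string | undefined>;

// Hypothetical config shapes for illustration only.
interface EndpointConfig {
  url: string;
  model: string | undefined;
  apiKey: string | undefined;
}

interface RemoteConfig {
  embed: EndpointConfig;
  rerank: EndpointConfig;
}

// Returns null when QMD_EMBED_API_URL is absent (remote mode inactive).
function resolveRemoteConfig(env: Env): RemoteConfig | null {
  const url = env["QMD_EMBED_API_URL"];
  if (!url) return null;

  const model = env["QMD_EMBED_API_MODEL"];
  if (!model) {
    // Assumption: a missing required variable is reported as an error.
    throw new Error("QMD_EMBED_API_MODEL is required when QMD_EMBED_API_URL is set");
  }
  const apiKey = env["QMD_EMBED_API_KEY"] || undefined;

  return {
    embed: { url, model, apiKey },
    rerank: {
      // Rerank URL and key fall back to the embed values, per the table.
      url: env["QMD_RERANK_API_URL"] || url,
      model: env["QMD_RERANK_API_MODEL"] || undefined,
      apiKey: env["QMD_RERANK_API_KEY"] || apiKey,
    },
  };
}

// Builds (but does not send) an OpenAI-style embeddings request.
function buildEmbeddingRequest(cfg: EndpointConfig, texts: string[]) {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (cfg.apiKey) headers["Authorization"] = `Bearer ${cfg.apiKey}`;
  return {
    url: `${cfg.url.replace(/\/+$/, "")}/embeddings`,
    method: "POST",
    headers,
    body: JSON.stringify({ model: cfg.model, input: texts }),
  };
}
```

With only `QMD_EMBED_API_URL` and `QMD_EMBED_API_MODEL` set, the rerank endpoint resolves to the same base URL and key as the embed endpoint, and the rerank model stays unset.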