vllm langchain: Add Document Retriever Support (#687)
* vllm langchain: Add Document Retriever Support
Include SearchedDoc in the /v1/chat/completions endpoint so that it accepts
document data retrieved from the retriever service and passes it to the LLM
for answer generation.
Signed-off-by: Yeoh, Hoong Tee <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* vllm: Update README documentation
Signed-off-by: Yeoh, Hoong Tee <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Yeoh, Hoong Tee <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Users can set the following model parameters as needed:

- max_new_tokens: maximum number of new tokens to generate
- streaming (true/false): return the text response in streaming or non-streaming mode
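If you script these requests, building the body from shell variables avoids hand-editing JSON. The following is a minimal sketch that assumes only the `query`, `max_new_tokens` and `streaming` fields used in the examples. The helper name `build_chat_payload` is hypothetical and is not provided by the microservice.

```bash
# Hypothetical helper (not part of the microservice): build a query-mode
# request body for /v1/chat/completions from a question, a max_new_tokens
# value and a streaming flag.
build_chat_payload() {
  local query=$1 max_new_tokens=${2:-17} streaming=${3:-false}

  # max_new_tokens must be a positive integer (digits only, no leading zero).
  if ! [[ $max_new_tokens =~ ^[1-9][0-9]*$ ]]; then
    echo "max_new_tokens must be a positive integer" >&2
    return 1
  fi

  # streaming is sent as a JSON boolean, so only true or false is accepted.
  if [[ $streaming != true && $streaming != false ]]; then
    echo "streaming must be true or false" >&2
    return 1
  fi

  # Escape backslashes first, then double quotes, so the question stays a
  # valid JSON string (single-line input assumed).
  query=$(printf '%s' "$query" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')

  printf '{"query":"%s","max_new_tokens":%s,"streaming":%s}\n' \
    "$query" "$max_new_tokens" "$streaming"
}

# Example: a non-streaming request that generates at most 17 new tokens.
# curl http://${your_ip}:9000/v1/chat/completions \
#   -X POST \
#   -d "$(build_chat_payload 'What is Deep Learning?' 17 false)" \
#   -H 'Content-Type: application/json'
```

You can add sampling options such as top_p or temperature to the printf format in the same way.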

```bash
# 1. Non-streaming mode
curl http://${your_ip}:9000/v1/chat/completions \
  -X POST \
  -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_p":0.95,"temperature":0.01,"streaming":false}' \
  -H 'Content-Type: application/json'

# 2. Streaming mode
curl http://${your_ip}:9000/v1/chat/completions \
  -X POST \
  -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
  -H 'Content-Type: application/json'

# 3. Custom chat template with streaming mode
curl http://${your_ip}:9000/v1/chat/completions \
  -X POST \
  -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true, "chat_template":"### You are a helpful, respectful and honest assistant to help the user with questions.\n### Context: {context}\n### Question: {question}\n### Answer:"}' \
  -H 'Content-Type: application/json'
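
# Optional sketch: for longer templates, the request body can live in a shell
# variable built with a quoted heredoc ('EOF'). Like the single quotes above,
# this stops the shell from touching the JSON "\n" escapes and the
# {context}/{question} placeholders. "payload" is just an illustrative name.
payload=$(cat <<'EOF'
{"query":"What is Deep Learning?","max_new_tokens":17,"streaming":true,
 "chat_template":"### You are a helpful, respectful and honest assistant to help the user with questions.\n### Context: {context}\n### Question: {question}\n### Answer:"}
EOF
)
# Send it with the same curl options, passing -d "$payload" instead of the inline JSON.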

# 4. Chat with SearchedDoc (retrieval context)
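# Optional sketch: build the SearchedDoc body from a question plus any number of
# retrieved passages. build_searched_doc_payload is a hypothetical helper, not
# part of the microservice. It only emits the "initial_query" and
# "retrieved_docs" fields used in the request below and assumes single-line passages.
build_searched_doc_payload() {
  local query=$1 doc docs="" sep=""
  shift
  for doc in "$@"; do
    # Escape backslashes, then double quotes, so each passage stays a valid JSON string.
    doc=$(printf '%s' "$doc" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    docs+="${sep}{\"text\":\"${doc}\"}"
    sep=","
  done
  query=$(printf '%s' "$query" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"initial_query":"%s","retrieved_docs":[%s]}\n' "$query" "$docs"
}
# e.g. -d "$(build_searched_doc_payload 'What is Deep Learning?' 'Deep Learning is a ...')"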
curl http://${your_ip}:9000/v1/chat/completions \
  -X POST \
  -d '{"initial_query":"What is Deep Learning?","retrieved_docs":[{"text":"Deep Learning is a ..."},{"text":"Deep Learning is b ..."}]}' \
  -H 'Content-Type: application/json'
```