Commit 631d841
Support vLLM/vLLM-on-Ray/Ray Serve for ChatQnA (#428)
* support vllm for chatqna
* add vllm-on-ray into ChatQnA
* support ray serve in ChatQnA
* fix conflict
* refine readme
* add UT for chatqna vllm
* add UT for ChatQnA Ray Serve
* add UT for chatqna vllm ray
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* add vllm for chatqna on xeon
* fix bug for vllm chatqna cpu
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* add ut for chatqna vllm

Signed-off-by: Xinyao Wang <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 665c46f commit 631d841

10 files changed: +1949 −5 lines changed

ChatQnA/docker/gaudi/README.md

Lines changed: 103 additions & 2 deletions
@@ -33,10 +33,56 @@ docker build --no-cache -t opea/reranking-tei:latest --build-arg https_proxy=$ht
### 5. Build LLM Image

You can use different LLM serving solutions; choose one of the following four options.

#### 5.1 Use TGI

```bash
docker build --no-cache -t opea/llm-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile .
```

#### 5.2 Use vLLM

Build the vLLM Docker image.

```bash
docker build --no-cache -t vllm:hpu --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/vllm/docker/Dockerfile.hpu .
```

Build the microservice Docker image.

```bash
docker build --no-cache -t opea/llm-vllm:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/vllm/docker/Dockerfile.microservice .
```

#### 5.3 Use vLLM-on-Ray

Build the vLLM-on-Ray Docker image.

```bash
docker build --no-cache -t vllm_ray:habana --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/vllm-ray/docker/Dockerfile.vllmray .
```

Build the microservice Docker image.

```bash
docker build --no-cache -t opea/llm-vllm-ray:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/vllm-ray/docker/Dockerfile.microservice .
```

#### 5.4 Use Ray Serve

Build the Ray Serve Docker image.

```bash
docker build --no-cache -t ray_serve:habana --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/ray_serve/docker/Dockerfile.rayserve .
```

Build the microservice Docker image.

```bash
docker build --no-cache -t opea/llm-ray:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/ray_serve/docker/Dockerfile.microservice .
```
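Whichever option you choose, it is worth confirming that the images were actually tagged before moving on. A quick, optional sanity check (not part of the original steps; it just greps `docker images` for the names used above):

```bash
# The serving image (vllm:hpu, vllm_ray:habana, or ray_serve:habana) and the
# matching opea/llm-* microservice image should both appear after the builds.
docker images | grep -E 'opea/llm-|vllm|ray_serve'
```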
### 6. Build Dataprep Image

```bash
@@ -113,7 +159,7 @@ Then run the command `docker images`, you will have the following 8 Docker Image
1. `opea/embedding-tei:latest`
2. `opea/retriever-redis:latest`
3. `opea/reranking-tei:latest`
- 4. `opea/llm-tgi:latest`
+ 4. `opea/llm-tgi:latest` or `opea/llm-vllm:latest` or `opea/llm-vllm-ray:latest` or `opea/llm-ray:latest`
5. `opea/tei-gaudi:latest`
6. `opea/dataprep-redis:latest`
7. `opea/chatqna:latest` or `opea/chatqna-guardrails:latest`
@@ -140,9 +186,14 @@ export https_proxy=${your_http_proxy}
export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
export RERANK_MODEL_ID="BAAI/bge-reranker-base"
export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export LLM_MODEL_ID_NAME="neural-chat-7b-v3-3"
export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:8090"
export TEI_RERANKING_ENDPOINT="http://${host_ip}:8808"
export TGI_LLM_ENDPOINT="http://${host_ip}:8008"
export vLLM_LLM_ENDPOINT="http://${host_ip}:8008"
export vLLM_RAY_LLM_ENDPOINT="http://${host_ip}:8008"
export RAY_Serve_LLM_ENDPOINT="http://${host_ip}:8008"
export LLM_SERVICE_PORT=9000
export REDIS_URL="redis://${host_ip}:6379"
export INDEX_NAME="rag-redis"
export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
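These exports assume `host_ip` has already been set to the machine's external IP address (see the note in the hunk below). A minimal sketch of one way to set it on a single-NIC Linux host; adjust or hard-code the address if your setup differs:

```bash
# Use the first non-loopback address reported by the host.
export host_ip=$(hostname -I | awk '{print $1}')
```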
@@ -171,9 +222,32 @@ Note: Please replace `host_ip` with your external IP address, do **NOT** use
```bash
cd GenAIExamples/ChatQnA/docker/gaudi/
```

If you use TGI as the LLM backend:

```bash
docker compose -f docker_compose.yaml up -d
```

If you use vLLM as the LLM backend:

```bash
docker compose -f docker_compose_vllm.yaml up -d
```

If you use vLLM-on-Ray as the LLM backend:

```bash
docker compose -f docker_compose_vllm_ray.yaml up -d
```

If you use Ray Serve as the LLM backend:

```bash
docker compose -f docker_compose_ray_serve.yaml up -d
```
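Whichever backend you start, a quick way to confirm the stack came up is to list the services from the compose file you used; a minimal check, shown here for the Ray Serve variant:

```bash
# Every service defined in the chosen compose file should report an "Up" status;
# inspect the logs of anything that is restarting or has exited.
docker compose -f docker_compose_ray_serve.yaml ps
```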
If you want to enable the guardrails microservice in the pipeline, use the following command instead:

```bash
@@ -238,15 +312,42 @@ curl http://${host_ip}:8000/v1/reranking \
  -H 'Content-Type: application/json'
```

- 6. TGI Service
+ 6. LLM backend Service

```bash
#TGI Service
curl http://${host_ip}:8008/generate \
  -X POST \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":64, "do_sample": true}}' \
  -H 'Content-Type: application/json'
```

```bash
#vLLM Service
curl http://${host_ip}:8008/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
  "model": "'"${LLM_MODEL_ID}"'",
  "prompt": "What is Deep Learning?",
  "max_tokens": 32,
  "temperature": 0
  }'
```

```bash
#vLLM-on-Ray Service
curl http://${host_ip}:8008/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "'"${LLM_MODEL_ID}"'", "messages": [{"role": "user", "content": "What is Deep Learning?"}]}'
```

```bash
#Ray Serve Service
curl http://${host_ip}:8008/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "'"${LLM_MODEL_ID_NAME}"'", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens": 32}'
```
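The raw responses are JSON; if you only want the generated text from the OpenAI-compatible chat endpoints, piping through `jq` works. A sketch, assuming `jq` is installed and one of the chat-completions backends (vLLM-on-Ray or Ray Serve) is serving on port 8008:

```bash
# Print only the assistant's reply from a /v1/chat/completions response.
curl -s http://${host_ip}:8008/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "'"${LLM_MODEL_ID}"'", "messages": [{"role": "user", "content": "What is Deep Learning?"}]}' \
  | jq -r '.choices[0].message.content'
```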
7. LLM Microservice

```bash

Lines changed: 202 additions & 0 deletions
@@ -0,0 +1,202 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

version: "3.8"

services:
  redis-vector-db:
    image: redis/redis-stack:7.2.0-v9
    container_name: redis-vector-db
    ports:
      - "6379:6379"
      - "8001:8001"
  dataprep-redis-service:
    image: opea/dataprep-redis:latest
    container_name: dataprep-redis-server
    depends_on:
      - redis-vector-db
    ports:
      - "6007:6007"
      - "6008:6008"
      - "6009:6009"
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      REDIS_URL: ${REDIS_URL}
      INDEX_NAME: ${INDEX_NAME}
  tei-embedding-service:
    image: opea/tei-gaudi:latest
    container_name: tei-embedding-gaudi-server
    ports:
      - "8090:80"
    volumes:
      - "./data:/data"
    runtime: habana
    cap_add:
      - SYS_NICE
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      HABANA_VISIBLE_DEVICES: all
      OMPI_MCA_btl_vader_single_copy_mechanism: none
      MAX_WARMUP_SEQUENCE_LENGTH: 512
    command: --model-id ${EMBEDDING_MODEL_ID}
  embedding:
    image: opea/embedding-tei:latest
    container_name: embedding-tei-server
    depends_on:
      - tei-embedding-service
    ports:
      - "6000:6000"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TEI_EMBEDDING_ENDPOINT: ${TEI_EMBEDDING_ENDPOINT}
      LANGCHAIN_API_KEY: ${LANGCHAIN_API_KEY}
      LANGCHAIN_TRACING_V2: ${LANGCHAIN_TRACING_V2}
      LANGCHAIN_PROJECT: "opea-embedding-service"
    restart: unless-stopped
  retriever:
    image: opea/retriever-redis:latest
    container_name: retriever-redis-server
    depends_on:
      - redis-vector-db
    ports:
      - "7000:7000"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      REDIS_URL: ${REDIS_URL}
      INDEX_NAME: ${INDEX_NAME}
      LANGCHAIN_API_KEY: ${LANGCHAIN_API_KEY}
      LANGCHAIN_TRACING_V2: ${LANGCHAIN_TRACING_V2}
      LANGCHAIN_PROJECT: "opea-retriever-service"
    restart: unless-stopped
  tei-reranking-service:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.2
    container_name: tei-reranking-gaudi-server
    ports:
      - "8808:80"
    volumes:
      - "./data:/data"
    shm_size: 1g
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      HF_HUB_DISABLE_PROGRESS_BARS: 1
      HF_HUB_ENABLE_HF_TRANSFER: 0
    command: --model-id ${RERANK_MODEL_ID} --auto-truncate
  reranking:
    image: opea/reranking-tei:latest
    container_name: reranking-tei-gaudi-server
    depends_on:
      - tei-reranking-service
    ports:
      - "8000:8000"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TEI_RERANKING_ENDPOINT: ${TEI_RERANKING_ENDPOINT}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      HF_HUB_DISABLE_PROGRESS_BARS: 1
      HF_HUB_ENABLE_HF_TRANSFER: 0
      LANGCHAIN_API_KEY: ${LANGCHAIN_API_KEY}
      LANGCHAIN_TRACING_V2: ${LANGCHAIN_TRACING_V2}
      LANGCHAIN_PROJECT: "opea-reranking-service"
    restart: unless-stopped
  ray-service:
    image: ray_serve:habana
    container_name: ray-gaudi-server
    ports:
      - "8008:80"
    volumes:
      - "./data:/data"
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      HABANA_VISIBLE_DEVICES: all
      OMPI_MCA_btl_vader_single_copy_mechanism: none
      LLM_MODEL: ${LLM_MODEL_ID}
      TRUST_REMOTE_CODE: True
    runtime: habana
    cap_add:
      - SYS_NICE
    ipc: host
    command: /bin/bash -c "ray start --head && python api_server_openai.py --port_number 80 --model_id_or_path $LLM_MODEL --chat_processor ChatModelLlama --num_cpus_per_worker 8 --num_hpus_per_worker 1"
  llm:
    image: opea/llm-ray:latest
    container_name: llm-ray-gaudi-server
    depends_on:
      - ray-service
    ports:
      - "9000:9000"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      RAY_Serve_ENDPOINT: ${RAY_Serve_LLM_ENDPOINT}
      HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
      LLM_MODEL: ${LLM_MODEL_ID}
    restart: unless-stopped
  chaqna-gaudi-backend-server:
    image: opea/chatqna:latest
    container_name: chatqna-gaudi-backend-server
    depends_on:
      - redis-vector-db
      - tei-embedding-service
      - embedding
      - retriever
      - tei-reranking-service
      - reranking
      - ray-service
      - llm
    ports:
      - "8888:8888"
    environment:
      - no_proxy=${no_proxy}
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
      - EMBEDDING_SERVICE_HOST_IP=${EMBEDDING_SERVICE_HOST_IP}
      - RETRIEVER_SERVICE_HOST_IP=${RETRIEVER_SERVICE_HOST_IP}
      - RERANK_SERVICE_HOST_IP=${RERANK_SERVICE_HOST_IP}
      - LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
      - LLM_SERVICE_PORT=${LLM_SERVICE_PORT}
    ipc: host
    restart: always
  chaqna-gaudi-ui-server:
    image: opea/chatqna-ui:latest
    container_name: chatqna-gaudi-ui-server
    depends_on:
      - chaqna-gaudi-backend-server
    ports:
      - "5173:5173"
    environment:
      - no_proxy=${no_proxy}
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - CHAT_BASE_URL=${BACKEND_SERVICE_ENDPOINT}
      - UPLOAD_FILE_BASE_URL=${DATAPREP_SERVICE_ENDPOINT}
      - GET_FILE=${DATAPREP_GET_FILE_ENDPOINT}
      - DELETE_FILE=${DATAPREP_DELETE_FILE_ENDPOINT}
    ipc: host
    restart: always

networks:
  default:
    driver: bridge
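If this is the `docker_compose_ray_serve.yaml` file referenced in the README above, a minimal smoke test after `docker compose up -d` is to tail the Ray Serve container's logs and hit its OpenAI-compatible endpoint directly. A sketch, assuming the environment variables from the README are exported on the host:

```bash
# The Ray Serve container is named ray-gaudi-server and publishes host port 8008
# (container port 80) in this compose file.
docker logs ray-gaudi-server 2>&1 | tail -n 20

curl http://${host_ip}:8008/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "'"${LLM_MODEL_ID_NAME}"'", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens": 32}'
```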
