
Commit 2e41dcf

Support Llama index for vLLM native (#692)
Signed-off-by: zhenwei-intel <[email protected]>
1 parent 391c4a5 commit 2e41dcf

17 files changed (+1032, −6 lines)

.github/workflows/docker/compose/llms-compose-cd.yaml

Lines changed: 5 additions & 1 deletion
@@ -4,8 +4,12 @@
 services:
   llm-native:
     build:
-      dockerfile: comps/llms/text-generation/native/Dockerfile
+      dockerfile: comps/llms/text-generation/native/langchain/Dockerfile
     image: ${REGISTRY:-opea}/llm-native:${TAG:-latest}
+  llm-native-llamaindex:
+    build:
+      dockerfile: comps/llms/text-generation/native/llama_index/Dockerfile
+    image: ${REGISTRY:-opea}/llm-native-llamaindex:${TAG:-latest}
   vllm-openvino:
     build:
       context: vllm-openvino
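For local verification, the same compose file can drive the image builds that CI performs. A minimal sketch, assuming it is run from the repository root and that the default `REGISTRY`/`TAG` values above are acceptable:

```bash
# Build only the two native LLM images declared above; other services in the
# compose file are left alone. REGISTRY and TAG fall back to "opea" / "latest".
docker compose -f .github/workflows/docker/compose/llms-compose-cd.yaml \
  build llm-native llm-native-llamaindex
```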
comps/llms/text-generation/native/langchain/Dockerfile

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+# HABANA environment
+# FROM vault.habana.ai/gaudi-docker/1.16.1/ubuntu22.04/habanalabs/pytorch-installer-2.2.2:latest as hpu
+FROM opea/habanalabs:1.16.1-pytorch-installer-2.2.2 as hpu
+
+ENV LANG=en_US.UTF-8
+ARG REPO=https://github.com/huggingface/optimum-habana.git
+ARG REPO_VER=v1.12.1
+
+RUN apt-get update && apt-get install -y --no-install-recommends --fix-missing \
+    git-lfs \
+    libgl1-mesa-glx \
+    libjemalloc-dev
+
+RUN useradd -m -s /bin/bash user && \
+    mkdir -p /home/user && \
+    chown -R user /home/user/
+
+USER user
+
+RUN git lfs install
+
+COPY comps /home/user/comps
+
+RUN pip install --no-cache-dir --upgrade-strategy eager optimum[habana] && \
+    pip install --no-cache-dir git+https://github.com/HabanaAI/[email protected]
+
+RUN git clone ${REPO} /home/user/optimum-habana && \
+    cd /home/user/optimum-habana && git checkout ${REPO_VER} && \
+    cd examples/text-generation && pip install --no-cache-dir -r requirements.txt && \
+    cd /home/user/comps/llms/text-generation/native/langchain && \
+    pip install --no-cache-dir -r requirements.txt && \
+    pip install --no-cache-dir --upgrade --force-reinstall pydantic
+
+ENV PYTHONPATH=/root:/home/user
+
+WORKDIR /home/user/comps/llms/text-generation/native/langchain
+
+ENTRYPOINT ["python", "llm.py"]
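A container built from this Dockerfile launches `llm.py` directly through the `ENTRYPOINT`. A minimal run sketch for a Gaudi host, mirroring the flags used in the llama_index README further down (image tag, port, and model name are illustrative, not prescriptive):

```bash
# Run the langchain-based native LLM service on Habana hardware.
# The image tag matches the build command in the README change below.
docker run -d --runtime=habana --name="llm-native-server" -p 9000:9000 \
  -e https_proxy=$https_proxy -e http_proxy=$http_proxy \
  -e TOKENIZERS_PARALLELISM=false -e HABANA_VISIBLE_DEVICES=all \
  -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
  -e LLM_NATIVE_MODEL="Qwen/Qwen2-7B-Instruct" \
  --cap-add=sys_nice --ipc=host opea/llm-native:latest
```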

comps/llms/text-generation/native/README.md renamed to comps/llms/text-generation/native/langchain/README.md

Lines changed: 3 additions & 2 deletions
@@ -17,8 +17,9 @@ export LLM_NATIVE_MODEL="Qwen/Qwen2-7B-Instruct"
 ### 1.2 Build Docker Image
 
 ```bash
-cd ../../../../
-docker build -t opea/llm-native:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/native/Dockerfile .
+cd ../../../../../
+docker build -t opea/llm-native:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/native/langchain/Dockerfile .
 ```
 
 To start a docker container, you have two options:

comps/llms/text-generation/native/Dockerfile renamed to comps/llms/text-generation/native/llama_index/Dockerfile

Lines changed: 2 additions & 2 deletions
@@ -30,11 +30,11 @@ RUN pip install --no-cache-dir --upgrade-strategy eager optimum[habana] && \
 RUN git clone ${REPO} /home/user/optimum-habana && \
     cd /home/user/optimum-habana && git checkout ${REPO_VER} && \
     cd examples/text-generation && pip install --no-cache-dir -r requirements.txt && \
-    cd /home/user/comps/llms/text-generation/native && pip install --no-cache-dir -r requirements.txt && \
+    cd /home/user/comps/llms/text-generation/native/llama_index && pip install --no-cache-dir -r requirements.txt && \
     pip install --no-cache-dir --upgrade --force-reinstall pydantic
 
 ENV PYTHONPATH=/root:/home/user
 
-WORKDIR /home/user/comps/llms/text-generation/native
+WORKDIR /home/user/comps/llms/text-generation/native/llama_index
 
 ENTRYPOINT ["python", "llm.py"]
comps/llms/text-generation/native/llama_index/README.md

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+# LLM Native Microservice
+
+The LLM Native microservice uses [optimum-habana](https://github.com/huggingface/optimum-habana) for model initialization and warm-up, focusing solely on large language models (LLMs). It operates without serving frameworks such as TGI or vLLM, using PyTorch directly for inference, and supports only non-streaming responses. This streamlined approach optimizes performance on Habana hardware.
+
+## 🚀1. Start Microservice
+
+If you start the LLM microservice with Docker, the `docker_compose_llm.yaml` file will automatically start the Native LLM service in a container.
+
+### 1.1 Setup Environment Variables
+
+To start the Native LLM service, you first need to set the following environment variable.
+
+```bash
+export LLM_NATIVE_MODEL="Qwen/Qwen2-7B-Instruct"
+```
+
+### 1.2 Build Docker Image
+
+```bash
+cd ../../../../../
+docker build -t opea/llm-native:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/native/llama_index/Dockerfile .
+```
+
+To start a Docker container, you have two options:
+
+- A. Run Docker with CLI
+- B. Run Docker with Docker Compose
+
+You can choose one as needed.
+
+### 1.3 Run Docker with CLI (Option A)
+
+```bash
+docker run -d --runtime=habana --name="llm-native-server" -p 9000:9000 -e https_proxy=$https_proxy -e http_proxy=$http_proxy -e TOKENIZERS_PARALLELISM=false -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host -e LLM_NATIVE_MODEL=${LLM_NATIVE_MODEL} opea/llm-native:latest
+```
+
+### 1.4 Run Docker with Docker Compose (Option B)
+
+```bash
+docker compose -f docker_compose_llm.yaml up -d
+```
+
+## 🚀2. Consume LLM Service
+
+### 2.1 Check Service Status
+
+```bash
+curl http://${your_ip}:9000/v1/health_check \
+  -X GET \
+  -H 'Content-Type: application/json'
+```
+
+### 2.2 Consume LLM Service
+
+```bash
+curl http://${your_ip}:9000/v1/chat/completions \
+  -X POST \
+  -d '{"query":"What is Deep Learning?"}' \
+  -H 'Content-Type: application/json'
+```
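The README above tags the image as `opea/llm-native:latest`, while the CD compose entry added in this commit expects `opea/llm-native-llamaindex:${TAG:-latest}`. A minimal sketch of building the llama_index variant under the compose-style name instead, assuming the command is run from the repository root:

```bash
# Build the llama_index variant with the image name used by the CD compose file.
docker build -t opea/llm-native-llamaindex:latest \
  --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy \
  -f comps/llms/text-generation/native/llama_index/Dockerfile .
```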
