Commit b6ff201

jinjunzh, chensuyue, and pre-commit-ci[bot] authored and committed
add milvus microservice (opea-project#158)
* Use common security content for OPEA projects (opea-project#151)

* add python coverage

Signed-off-by: chensuyue <[email protected]>

* docs update

Signed-off-by: chensuyue <[email protected]>

* Revert "add python coverage"

This reverts commit 69615b1.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: chensuyue <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jinjunzh <[email protected]>

* add milvus microservice

Signed-off-by: jinjunzh <[email protected]>

* fix the typo

Signed-off-by: jinjunzh <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: jinjunzh <[email protected]>

---------

Signed-off-by: chensuyue <[email protected]>
Signed-off-by: jinjunzh <[email protected]>
Co-authored-by: chen, suyue <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 2cd5eab commit b6ff201

17 files changed: +1391 -0 lines changed

comps/dataprep/milvus/README.md

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
# Dataprep Microservice with Milvus

# 🚀Start Microservice with Python

## Install Requirements

```bash
pip install -r requirements.txt
```

## Start Milvus Server

Please refer to this [readme](../../../vectorstores/langchain/milvus/README.md).

## Setup Environment Variables

```bash
export http_proxy=${your_http_proxy}
export https_proxy=${your_http_proxy}
export MILVUS=${your_milvus_host_ip}
export MILVUS_PORT=19530
export COLLECTION_NAME=${your_collection_name}
export TEI_EMBEDDING_ENDPOINT=${your_tei_endpoint}
```

## Start Document Preparation Microservice for Milvus with Python Script

Start the document preparation microservice for Milvus with the command below.

```bash
python prepare_doc_milvus.py
```

# 🚀Start Microservice with Docker

## Build Docker Image

```bash
cd ../../../../
docker build -t opea/dataprep-milvus:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/dataprep/milvus/docker/Dockerfile .
```

## Run Docker with CLI

```bash
docker run -d --name="dataprep-milvus-server" -p 6010:6010 --ipc=host -v /your_document_path/:/home/user/doc -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e TEI_EMBEDDING_ENDPOINT=${your_tei_endpoint} -e MILVUS=${your_milvus_host_ip} opea/dataprep-milvus:latest
```

# Invoke Microservice

Once the document preparation microservice for Milvus is running, use the command below to invoke it. The service converts the document to embeddings and saves them to the database.

```bash
curl -X POST -H "Content-Type: application/json" -d '{"path":"/home/user/doc/your_document_name"}' http://localhost:6010/v1/dataprep
```
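
The same request can also be issued from Python. The following is a minimal sketch using only the standard library, assuming the service is reachable at `http://localhost:6010` as in the curl command above. The helper names `build_dataprep_request` and `ingest` are hypothetical and are not part of this commit.

```python
import json
import urllib.request


def build_dataprep_request(doc_path: str, base_url: str = "http://localhost:6010") -> urllib.request.Request:
    """Build a POST request that mirrors the curl command (hypothetical helper)."""
    payload = json.dumps({"path": doc_path}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/dataprep",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ingest(doc_path: str, base_url: str = "http://localhost:6010", timeout: float = 300.0) -> int:
    """Send the request and return the HTTP status code (hypothetical helper)."""
    request = build_dataprep_request(doc_path, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `ingest("/home/user/doc/your_document_name")` sends the request. When the service runs in Docker, the path is resolved inside the container, so it has to point into the mounted `/home/user/doc` volume.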

comps/dataprep/milvus/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

comps/dataprep/milvus/config.py

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import os

# Embedding model
EMBED_MODEL = os.getenv("EMBED_MODEL", "maidalun1020/bce-embedding-base_v1")
# Embedding endpoints
EMBEDDING_ENDPOINT = os.getenv("TEI_EMBEDDING_ENDPOINT", "")
# MILVUS configuration
MILVUS_HOST = os.getenv("MILVUS", "localhost")
MILVUS_PORT = int(os.getenv("MILVUS_PORT", 19530))
COLLECTION_NAME = os.getenv("COLLECTION_NAME", "rag_milvus")
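
To make the fallback behaviour of these settings explicit, here is a minimal sketch that resolves the same three Milvus variables from an arbitrary mapping, so the defaults can be checked without touching the real environment. `MilvusSettings` and `load_milvus_settings` are hypothetical names introduced for illustration; the commit itself reads the values at module import time.

```python
import os
from typing import Mapping, NamedTuple


class MilvusSettings(NamedTuple):
    """Hypothetical container for the Milvus connection settings."""

    host: str
    port: int
    collection_name: str


def load_milvus_settings(env: Mapping[str, str] = os.environ) -> MilvusSettings:
    """Resolve settings with the same variable names and defaults as config.py."""
    return MilvusSettings(
        host=env.get("MILVUS", "localhost"),
        # Environment values are strings, so the port is converted explicitly.
        port=int(env.get("MILVUS_PORT", "19530")),
        collection_name=env.get("COLLECTION_NAME", "rag_milvus"),
    )
```

Passing an empty mapping yields the defaults, while a mapping such as `{"MILVUS": "10.0.0.5", "MILVUS_PORT": "19531"}` overrides the host and port only.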
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@

# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

FROM python:3.11-slim

ENV LANG C.UTF-8

RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missing \
    build-essential \
    libgl1-mesa-glx \
    libjemalloc-dev \
    vim

RUN useradd -m -s /bin/bash user && \
    mkdir -p /home/user && \
    chown -R user /home/user/

USER user

COPY comps /home/user/comps

RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r /home/user/comps/dataprep/milvus/requirements.txt

ENV PYTHONPATH=$PYTHONPATH:/home/user

WORKDIR /home/user/comps/dataprep/milvus

ENTRYPOINT ["python", "prepare_doc_milvus.py"]

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import os
import sys

from config import COLLECTION_NAME, EMBED_MODEL, EMBEDDING_ENDPOINT, MILVUS_HOST, MILVUS_PORT
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceBgeEmbeddings, HuggingFaceEmbeddings, HuggingFaceHubEmbeddings
from langchain_milvus.vectorstores import Milvus

from comps.cores.mega.micro_service import opea_microservices, register_microservice
from comps.cores.proto.docarray import DocPath
from comps.cores.telemetry.opea_telemetry import opea_telemetry

current_script_path = os.path.dirname(os.path.abspath(__file__))
parent_dir = os.path.dirname(current_script_path)
sys.path.append(parent_dir)
from utils import document_loader


@register_microservice(
    name="opea_service@prepare_doc_milvus",
    endpoint="/v1/dataprep",
    host="0.0.0.0",
    port=6010,
    input_datatype=DocPath,
    output_datatype=None,
)
# @opea_telemetry
def ingest_documents(doc_path: DocPath):
    """Ingest document to Milvus."""
    doc_path = doc_path.path
    print(f"Parsing document {doc_path}.")

    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=100, add_start_index=True)
    content = document_loader(doc_path)
    chunks = text_splitter.split_text(content)

    print("Done preprocessing. Created ", len(chunks), " chunks of the original pdf")
    # Create vectorstore
    if EMBEDDING_ENDPOINT:
        # create embeddings using TEI endpoint service
        embedder = HuggingFaceHubEmbeddings(model=EMBEDDING_ENDPOINT)
    else:
        # create embeddings using local embedding model
        embedder = HuggingFaceBgeEmbeddings(model_name=EMBED_MODEL)

    # Batch size
    batch_size = 32
    num_chunks = len(chunks)
    for i in range(0, num_chunks, batch_size):
        batch_chunks = chunks[i : i + batch_size]
        batch_texts = batch_chunks

        _ = Milvus.from_texts(
            texts=batch_texts,
            embedding=embedder,
            collection_name=COLLECTION_NAME,
            connection_args={"host": MILVUS_HOST, "port": MILVUS_PORT},
        )
        print(f"Processed batch {i//batch_size + 1}/{(num_chunks-1)//batch_size + 1}")


if __name__ == "__main__":
    opea_microservices["opea_service@prepare_doc_milvus"].start()
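
The ingestion loop above slices the chunk list into batches of 32 and reports progress with a ceiling division. The following is a minimal standalone sketch of that slicing, with the Milvus call left out so it can be run without a server. `iter_batches` is a hypothetical helper introduced for illustration and is not part of this commit.

```python
from typing import Iterator, List, Tuple


def iter_batches(chunks: List[str], batch_size: int = 32) -> Iterator[Tuple[int, int, List[str]]]:
    """Yield (batch_number, total_batches, batch) with the same arithmetic as the loop above."""
    num_chunks = len(chunks)
    # Ceiling division; evaluates to 0 for an empty list, in which case nothing is yielded.
    total_batches = (num_chunks - 1) // batch_size + 1
    for i in range(0, num_chunks, batch_size):
        yield i // batch_size + 1, total_batches, chunks[i : i + batch_size]
```

With 70 chunks this produces three batches of 32, 32, and 6 items, so the last batch is simply shorter and no padding is needed.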
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
beautifulsoup4
docarray[full]
easyocr
fastapi
frontend==0.0.3
huggingface_hub
langchain
langchain-community
langchain_milvus
numpy
opentelemetry-api
opentelemetry-exporter-otlp
opentelemetry-sdk
pandas
Pillow
pydantic==2.7.3
pymilvus==2.4.3
pymupdf==1.24.5
python-docx==0.8.11
sentence_transformers
shortuuid
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
# Retriever Microservice with Milvus

# 🚀Start Microservice with Python

## Install Requirements

```bash
pip install -r requirements.txt
```

## Start Milvus Server

Please refer to this [readme](../../../vectorstores/langchain/milvus/README.md).

## Setup Environment Variables

```bash
export http_proxy=${your_http_proxy}
export https_proxy=${your_http_proxy}
export MILVUS=${your_milvus_host_ip}
export MILVUS_PORT=19530
export COLLECTION_NAME=${your_collection_name}
export TEI_EMBEDDING_ENDPOINT=${your_tei_endpoint}
```

## Start Retriever Service

```bash
export TEI_EMBEDDING_ENDPOINT="http://${your_ip}:6060"
python retriever_milvus.py
```

# 🚀Start Microservice with Docker

## Build Docker Image

```bash
cd ../../
docker build -t opea/retriever-milvus:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/retrievers/langchain/milvus/docker/Dockerfile .
```

## Run Docker with CLI

```bash
docker run -d --name="retriever-milvus-server" -p 7000:7000 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e TEI_EMBEDDING_ENDPOINT=${your_tei_endpoint} -e MILVUS=${your_milvus_host_ip} opea/retriever-milvus:latest
```

# 🚀Consume Retriever Service

## Check Service Status

```bash
curl http://${your_ip}:7000/v1/health_check \
  -X GET \
  -H 'Content-Type: application/json'
```

## Consume Retriever Service

To consume the Retriever Microservice, you can generate a mock embedding vector of length 768 with Python.

```bash
your_embedding=$(python -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
curl http://${your_ip}:7000/v1/retrieval \
  -X POST \
  -d "{\"text\":\"What is the revenue of Nike in 2023?\",\"embedding\":${your_embedding}}" \
  -H 'Content-Type: application/json'
```
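
The request body used above can also be assembled in Python, which avoids the shell quoting around the embedding list. This is a minimal sketch under the same assumptions as the curl example: a 768-dimensional mock vector and a JSON body with `text` and `embedding` fields. The helpers `mock_embedding` and `build_retrieval_payload` are hypothetical names, not part of this commit.

```python
import json
import random
from typing import List, Optional


def mock_embedding(dim: int = 768, seed: Optional[int] = None) -> List[float]:
    """Return a random vector with components in [-1, 1] (hypothetical helper)."""
    rng = random.Random(seed)
    return [rng.uniform(-1, 1) for _ in range(dim)]


def build_retrieval_payload(text: str, embedding: List[float]) -> str:
    """Serialize the JSON body expected by the /v1/retrieval call shown above."""
    return json.dumps({"text": text, "embedding": embedding})
```

The resulting string can be sent as the POST body to `http://${your_ip}:7000/v1/retrieval` with a `Content-Type: application/json` header. A random vector only exercises the endpoint; meaningful results require an embedding produced by the same model used at ingestion time.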
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

import os

# Embedding model
EMBED_MODEL = os.getenv("EMBED_MODEL", "maidalun1020/bce-embedding-base_v1")
# Embedding endpoints
EMBED_ENDPOINT = os.getenv("TEI_EMBEDDING_ENDPOINT", "")
# MILVUS configuration
MILVUS_HOST = os.getenv("MILVUS", "localhost")
MILVUS_PORT = int(os.getenv("MILVUS_PORT", 19530))
COLLECTION_NAME = os.getenv("COLLECTION_NAME", "rag_milvus")
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@

# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

FROM python:3.11-slim

ENV LANG C.UTF-8

RUN apt-get update -y && apt-get install -y --no-install-recommends --fix-missing \
    build-essential \
    libgl1-mesa-glx \
    libjemalloc-dev \
    vim

RUN useradd -m -s /bin/bash user && \
    mkdir -p /home/user && \
    chown -R user /home/user/

USER user

COPY comps /home/user/comps

RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r /home/user/comps/retrievers/langchain/milvus/requirements.txt

ENV PYTHONPATH=$PYTHONPATH:/home/user

WORKDIR /home/user/comps/retrievers/langchain/milvus

ENTRYPOINT ["python", "retriever_milvus.py"]

