Commit 987870f

Add docs for all 3 use cases of ChatQnA examples and change models for switch case (#360)
* add all 3 usecases of ChatQnA examples and change models for switch case. Signed-off-by: zhlsunshine <[email protected]>
* change the model to openlm-research/open_llama_3b. Signed-off-by: zhlsunshine <[email protected]>
* change use cases readme. Signed-off-by: zhlsunshine <[email protected]>
* fix doc error based on comments. Signed-off-by: zhlsunshine <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 39fb55e commit 987870f

3 files changed: +225 additions, -2 deletions

microservices-connector/README.md

Lines changed: 2 additions & 1 deletion
```diff
@@ -3,7 +3,8 @@
 This repo defines the GenAI Microservice Connector(GMC) for OPEA projects. GMC can be used to compose and adjust GenAI pipelines dynamically
 on kubernetes. It can leverage the microservices provided by [GenAIComps](https://github.com/opea-project/GenAIComps) and external services to compose GenAI pipelines. External services might be running in a public cloud or on-prem by providing an URL and access details such as an API key and ensuring there is network connectivity. It also allows users to adjust the pipeline on the fly like switching to a different Large language Model(LLM), adding new functions into the chain(like adding guardrails),etc. GMC supports different types of steps in the pipeline, like sequential, parallel and conditional.
 
-Please refer this [usage_guide](./usage_guide.md) for sample use cases.
+Please refer to [usage_guide](./usage_guide.md) for sample use cases.
+Please refer to [chatqna_use_cases](./config/samples/ChatQnA/use_cases.md) for more ChatQnA use cases.
 
 ## Description
 
```

microservices-connector/config/samples/ChatQnA/chatQnA_switch_xeon.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -120,5 +120,5 @@ spec:
       serviceName: tgi-service-llama
       config:
         endpoint: /generate
-        MODEL_ID: HuggingFaceH4/mistral-7b-grok
+        MODEL_ID: openlm-research/open_llama_3b
       isDownstreamService: true
```
microservices-connector/config/samples/ChatQnA/use_cases.md

Lines changed: 222 additions & 0 deletions
# ChatQnA Use Cases in Kubernetes Cluster via GMC

This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline components on Intel Xeon servers and Gaudi machines.

The ChatQnA service leverages a Kubernetes operator called genai-microservices-connector (GMC). GMC supports connecting microservices into pipelines based on the specification in a pipeline yaml file, and also lets the user dynamically control which model is used in a service such as an LLM or embedder. The underlying pipeline language also supports using external services that may be running elsewhere in a public or private cloud.
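As a rough illustration, a single step in such a pipeline yaml file might look like the fragment below. The field names are the ones that appear in the `chatQnA_switch_xeon.yaml` diff earlier in this commit; a complete custom resource additionally defines metadata, routing, and the remaining steps (embedding, retriever, rerank, and so on).

```yaml
# Fragment only: one LLM step of a GMC pipeline, using fields shown in chatQnA_switch_xeon.yaml.
- name: Tgi
  internalService:
    serviceName: tgi-service-llama              # the microservice GMC starts and wires into the pipeline
    config:
      endpoint: /generate                       # path the router calls on this service
      MODEL_ID: openlm-research/open_llama_3b   # model served here; change it to switch models
    isDownstreamService: true
```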
Install GMC in your Kubernetes cluster, if you have not already done so, by following the steps in the "Getting Started" section of the [GMC Install](https://github.com/opea-project/GenAIInfra/tree/main/microservices-connector) guide. We will soon publish images to Docker Hub, at which point no builds will be required, simplifying installation.

The ChatQnA application is defined as a Custom Resource (CR) file that the GMC operator acts upon. GMC first checks whether the microservices listed in the CR yaml file are running; if not, it starts them and then connects them. When the ChatQnA RAG pipeline is ready, the service endpoint details are returned, letting you use the application. If you run `kubectl get pods`, you will see all the component microservices, in particular `embedding`, `retriever`, `rerank`, and `llm`.
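For example, once one of the pipelines below has been deployed into the `chatqa` namespace, you can list those components directly:

```sh
# List the component microservices GMC has started for the pipeline
kubectl get pods -n chatqa
```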
## Using prebuilt images

For a Xeon deployment, ChatQnA uses the following prebuilt images:

- embedding: opea/embedding-tei:latest
- retriever: opea/retriever-redis:latest
- reranking: opea/reranking-tei:latest
- llm: opea/llm-tgi:latest
- dataprep-redis: opea/dataprep-redis:latest
- tei_xeon_service: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
- tei_embedding_service: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
- tgi-service: ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu
- redis-vector-db: redis/redis-stack:7.2.0-v9
If you want to use the Gaudi accelerator, two alternate images are used for the embedding and llm services.

For Gaudi:

- tei-embedding-service: opea/tei-gaudi:latest
- tgi-service: ghcr.io/huggingface/tgi-gaudi:1.2.1

> [NOTE]
> Please refer to the [Xeon README](https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker/xeon/README.md) or [Gaudi README](https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker/gaudi/README.md) to build the OPEA images. These too will be available on Docker Hub soon to simplify use.
## Deploy ChatQnA pipeline

There are three use cases for the ChatQnA example:

- General ChatQnA with preset RAG data
- ChatQnA with data preparation, which lets the user upload RAG data online via the dataprep microservice
- ChatQnA supports multiple LLM models, which can be switched at runtime

### General ChatQnA with preset RAG data

This use case involves deploying the ChatQnA custom resource. You can use `chatQnA_xeon.yaml`, or `chatQnA_gaudi.yaml` if you have a Gaudi cluster.
1. Create the namespace and deploy the application

   ```sh
   kubectl create ns chatqa
   kubectl apply -f $(pwd)/chatQnA_xeon.yaml
   ```

2. GMC will reconcile the ChatQnA custom resource and get all related components/services ready. Check whether the services are up:

   ```sh
   kubectl get service -n chatqa
   ```

3. Retrieve the application access URL

   ```sh
   kubectl get gmconnectors.gmc.opea.io -n chatqa
   NAME     URL                                                   READY   AGE
   chatqa   http://router-service.chatqa.svc.cluster.local:8080   9/0/9   3m
   ```

4. Deploy a client pod to test the application

   ```sh
   kubectl create deployment client-test -n chatqa --image=python:3.8.13 -- sleep infinity
   ```

5. Access the application from the client pod using the URL retrieved above

   ```sh
   export CLIENT_POD=$(kubectl get pod -n chatqa -l app=client-test -o jsonpath={.items..metadata.name})
   export accessUrl=$(kubectl get gmc -n chatqa -o jsonpath="{.items[?(@.metadata.name=='chatqa')].status.accessUrl}")
   kubectl exec "$CLIENT_POD" -n chatqa -- curl -s --no-buffer $accessUrl -X POST -d '{"text":"What is the revenue of Nike in 2023?","parameters":{"max_new_tokens":17, "do_sample": true}}' -H 'Content-Type: application/json'
   ```
6. Perhaps you want to try another LLM model? Just modify the application custom resource to use a different LLM model.

   For example, to change the LLM model used in the ChatQnA pipeline to `Llama-2-7b-chat-hf`, edit the custom resource file as follows:

   ```yaml
   - name: Tgi
     internalService:
       serviceName: tgi-service-m
       config:
         LLM_MODEL_ID: Llama-2-7b-chat-hf
   ```

7. Apply the change

   ```sh
   kubectl apply -f $(pwd)/chatQnA_xeon.yaml
   ```

8. Check that the tgi-service-m-deployment has been changed to use the new LLM model

   ```sh
   kubectl get deployment tgi-service-m-deployment -n chatqa -o jsonpath="{.spec.template.spec.containers[*].env[?(@.name=='LLM_MODEL_ID')].value}"
   ```

9. Access the updated pipeline from the client pod using the same URL as above

   ```sh
   kubectl exec "$CLIENT_POD" -n chatqa -- curl -s --no-buffer $accessUrl -X POST -d '{"text":"What are the key features of Intel Gaudi?","parameters":{"max_new_tokens":17, "do_sample": true}}' -H 'Content-Type: application/json'
   ```

> [NOTE]
> You can remove your ChatQnA pipeline by executing standard Kubernetes `kubectl` commands to remove the custom resource. Verify it was removed by executing `kubectl get pods` in the `chatqa` namespace.
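A minimal removal sequence, assuming the pipeline was created from `chatQnA_xeon.yaml` as above, might look like this:

```sh
# Delete the ChatQnA custom resource; GMC tears down the pipeline it created
kubectl delete -f $(pwd)/chatQnA_xeon.yaml
# Verify the component pods are gone
kubectl get pods -n chatqa
```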
### ChatQnA with data preparation

This use case involves deploying the ChatQnA custom resource. You can use `chatQnA_dataprep_xeon.yaml`, or `chatQnA_dataprep_gaudi.yaml` if you have a Gaudi cluster.

1. Create the namespace and deploy the application

   ```sh
   kubectl create ns chatqa
   kubectl apply -f $(pwd)/chatQnA_dataprep_xeon.yaml
   ```

2. GMC will reconcile the ChatQnA custom resource and get all related components/services ready. Check whether the services are up:

   ```sh
   kubectl get service -n chatqa
   ```

3. Retrieve the application access URL

   ```sh
   kubectl get gmconnectors.gmc.opea.io -n chatqa
   NAME     URL                                                   READY     AGE
   chatqa   http://router-service.chatqa.svc.cluster.local:8080   10/0/10   3m
   ```

> [NOTE]
> Compared with `General ChatQnA with preset RAG data`, there should be `10` microservices; the extra one is the `dataprep` microservice.
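For example, you can list the pods and confirm the dataprep component is among them (exact pod names may differ):

```sh
# The pipeline pods, including the extra dataprep microservice
kubectl get pods -n chatqa
kubectl get pods -n chatqa | grep -i dataprep
```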
4. Deploy a client pod to test the application

   ```sh
   kubectl create deployment client-test -n chatqa --image=python:3.8.13 -- sleep infinity
   ```

5. Upload RAG data from the internet via the `dataprep` microservice

   ```sh
   export CLIENT_POD=$(kubectl get pod -n chatqa -l app=client-test -o jsonpath={.items..metadata.name})
   export accessUrl=$(kubectl get gmc -n chatqa -o jsonpath="{.items[?(@.metadata.name=='chatqa')].status.accessUrl}")
   kubectl exec "$CLIENT_POD" -n chatqa -- curl -s --no-buffer "$accessUrl/dataprep" -F 'link_list=["https://raw.githubusercontent.com/opea-project/GenAIInfra/main/microservices-connector/test/data/gaudi.txt"]' -H "Content-Type: multipart/form-data"
   ```

6. Access the application from the client pod using the URL retrieved above

   ```sh
   kubectl exec "$CLIENT_POD" -n chatqa -- curl -s --no-buffer $accessUrl -X POST -d '{"text":"What are the key features of Intel Gaudi?","parameters":{"max_new_tokens":100, "do_sample": true}}' -H 'Content-Type: application/json'
   ```

> [NOTE]
> You can remove your ChatQnA pipeline by executing standard Kubernetes `kubectl` commands to remove the custom resource. Verify it was removed by executing `kubectl get pods` in the `chatqa` namespace.
### ChatQnA supports multiple LLM models

This use case involves deploying the ChatQnA custom resource. You can use `chatQnA_switch_xeon.yaml`, or `chatQnA_switch_gaudi.yaml` if you have a Gaudi cluster. This use case contains two LLM models: `Intel/neural-chat-7b-v3-3` and `meta-llama/CodeLlama-7b-hf`.

1. Create the namespace and deploy the application

   ```sh
   kubectl create ns switch
   kubectl apply -f $(pwd)/chatQnA_switch_xeon.yaml
   ```

2. GMC will reconcile the ChatQnA custom resource and get all related components/services ready. Check whether the services are up:

   ```sh
   kubectl get service -n switch
   ```

3. Retrieve the application access URL

   ```sh
   kubectl get gmconnectors.gmc.opea.io -n switch
   NAME     URL                                                   READY     AGE
   switch   http://router-service.switch.svc.cluster.local:8080   15/0/15   83s
   ```

> [NOTE]
> Compared with `General ChatQnA with preset RAG data`, there should be `15` microservices; the extra ones are the microservices for the additional embedding models and LLM models.
4. Deploy a client pod to test the application

   ```sh
   kubectl create deployment client-test -n switch --image=python:3.8.13 -- sleep infinity
   ```

5. Access the application from the client pod using the URL retrieved above, selecting the LLM model `Intel/neural-chat-7b-v3-3`

   ```sh
   export CLIENT_POD=$(kubectl get pod -n switch -l app=client-test -o jsonpath={.items..metadata.name})
   export accessUrl=$(kubectl get gmc -n switch -o jsonpath="{.items[?(@.metadata.name=='switch')].status.accessUrl}")
   kubectl exec "$CLIENT_POD" -n switch -- curl -s --no-buffer $accessUrl -X POST -d '{"text":"What are the key features of Intel Gaudi?", "model-id":"intel", "embedding-model-id":"small", "parameters":{"max_new_tokens":50, "do_sample": true}}' -H 'Content-Type: application/json'
   ```

6. Access the application from the client pod using the same URL, selecting the LLM model `meta-llama/CodeLlama-7b-hf`

   ```sh
   export CLIENT_POD=$(kubectl get pod -n switch -l app=client-test -o jsonpath={.items..metadata.name})
   export accessUrl=$(kubectl get gmc -n switch -o jsonpath="{.items[?(@.metadata.name=='switch')].status.accessUrl}")
   kubectl exec "$CLIENT_POD" -n switch -- curl -s --no-buffer $accessUrl -X POST -d '{"text":"What are the key features of Intel Gaudi?", "model-id":"llama", "embedding-model-id":"small", "parameters":{"max_new_tokens":50, "do_sample": true}}' -H 'Content-Type: application/json'
   ```

> [NOTE]
> As shown above, the user can switch LLM models at runtime by changing the request body: add `"model-id":"llama"` to the request body to use the Llama model, or change it to `"model-id":"intel"` to use the Intel model.
