
Commit c37d9c8

Updated READMEs for kubernetes example pipelines (#353)
* Updated READMEs for kubernetes. Signed-off-by: mkbhanda <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Kubernetes related Readme. Signed-off-by: mkbhanda <[email protected]>
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 89ddec9 commit c37d9c8

File tree

7 files changed (+132, -22 lines)


ChatQnA/kubernetes/manifests/README.md

Lines changed: 9 additions & 8 deletions
@@ -35,40 +35,41 @@ For Gaudi:
## Deploy ChatQnA pipeline
This involves deploying the ChatQnA custom resource. You can use chatQnA_xeon.yaml or, if you have a Gaudi cluster, you could use chatQnA_gaudi.yaml.

+ 1. Create namespace and deploy application
```sh
kubectl create ns chatqa
kubectl apply -f $(pwd)/chatQnA_xeon.yaml
```

- **GMC will reconcile the ChatQnA custom resource and get all related components/services ready**
+ 2. GMC will reconcile the ChatQnA custom resource and get all related components/services ready. Check if the services are up.

```sh
kubectl get service -n chatqa
```

- **Obtain the ChatQnA custom resource/pipeline access URL**
+ 3. Retrieve the application access URL

```sh
kubectl get gmconnectors.gmc.opea.io -n chatqa
NAME     URL                                                   READY   AGE
chatqa   http://router-service.chatqa.svc.cluster.local:8080   8/0/8   3m
```

- **Deploy a client pod to test the ChatQnA application**
+ 4. Deploy a client pod to test the application

```sh
kubectl create deployment client-test -n chatqa --image=python:3.8.13 -- sleep infinity
```

- **Access the pipeline using the above URL from the client pod**
+ 5. Access the application using the above URL from the client pod

```sh
export CLIENT_POD=$(kubectl get pod -n chatqa -l app=client-test -o jsonpath={.items..metadata.name})
export accessUrl=$(kubectl get gmc -n chatqa -o jsonpath="{.items[?(@.metadata.name=='chatqa')].status.accessUrl}")
kubectl exec "$CLIENT_POD" -n chatqa -- curl $accessUrl -X POST -d '{"text":"What is the revenue of Nike in 2023?","parameters":{"max_new_tokens":17, "do_sample": true}}' -H 'Content-Type: application/json'
```

- **Modify ChatQnA custom resource to use another LLM model**
+ 6. To try another LLM model, modify the application custom resource accordingly

Should you, for instance, want to change the LLM model you are using in the ChatQnA pipeline, just edit the custom resource file.
For example, to use Llama-2-7b-chat-hf make the following edit:
@@ -83,18 +84,18 @@ For example, to use Llama-2-7b-chat-hf make the following edit:
LLM_MODEL_ID: Llama-2-7b-chat-hf
```

- Apply the change using
+ 7. Apply the change
```
kubectl apply -f $(pwd)/chatQnA_xeon.yaml
```

- **Check that the tgi-svc-deployment has been changed to use the new LLM Model**
+ 8. Check that the tgi-svc-deployment has been changed to use the new LLM model

```sh
kubectl get deployment tgi-svc-deployment -n chatqa -o jsonpath="{.spec.template.spec.containers[*].env[?(@.name=='LLM_MODEL_ID')].value}"
```

- **Access the updated pipeline using the same URL frm above from within the client pod**
+ 9. Access the updated pipeline from the client pod using the same URL as above

```sh
kubectl exec "$CLIENT_POD" -n chatqa -- curl $accessUrl -X POST -d '{"text":"What is the revenue of Nike in 2023?","parameters":{"max_new_tokens":17, "do_sample": true}}' -H 'Content-Type: application/json'
```
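A hypothetical shortcut for steps 6 and 7 above: instead of opening the custom resource in an editor, the model can be swapped with sed before re-applying it. This sketch assumes LLM_MODEL_ID appears exactly once in chatQnA_xeon.yaml (under the TGI step); if your copy lists it for several services, edit the file by hand instead.

```sh
# Hypothetical one-liner: swap the served model in the ChatQnA CR, then re-apply it.
# Assumes LLM_MODEL_ID occurs only once in chatQnA_xeon.yaml; verify before running.
sed -i 's|LLM_MODEL_ID: .*|LLM_MODEL_ID: Llama-2-7b-chat-hf|' $(pwd)/chatQnA_xeon.yaml
kubectl apply -f $(pwd)/chatQnA_xeon.yaml
```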

CodeGen/README.md

Lines changed: 4 additions & 6 deletions
@@ -22,14 +22,12 @@ The workflow falls into the following architecture:

The CodeGen service can be effortlessly deployed on either Intel Gaudi2 or Intel Xeon Scalable Processor.

- ## Deploy CodeGen on Gaudi
+ ## Deploy CodeGen using Docker

- Refer to the [Gaudi Guide](./docker/gaudi/README.md) for instructions on deploying CodeGen on Gaudi.
+ - Refer to the [Gaudi Guide](./docker/gaudi/README.md) for instructions on deploying CodeGen on Gaudi.

- ## Deploy CodeGen on Xeon
+ - Refer to the [Xeon Guide](./docker/xeon/README.md) for instructions on deploying CodeGen on Xeon.

- Refer to the [Xeon Guide](./docker/xeon/README.md) for instructions on deploying CodeGen on Xeon.
-
- ## Deploy CodeGen into Kubernetes on Xeon & Gaudi
+ ## Deploy CodeGen using Kubernetes

Refer to the [Kubernetes Guide](./kubernetes/manifests/README.md) for instructions on deploying CodeGen into Kubernetes on Xeon & Gaudi.

CodeGen/kubernetes/README.md

Whitespace-only changes.

CodeTrans/README.md

Lines changed: 6 additions & 4 deletions
@@ -12,10 +12,12 @@ This Code Translation use case uses Text Generation Inference on Intel Gaudi2 or

The Code Translation service can be effortlessly deployed on either Intel Gaudi2 or Intel Xeon Scalable Processor.

- ## Deploy Code Translation on Gaudi
+ ## Deploy with Docker

- Refer to the [Gaudi Guide](./docker/gaudi/README.md) for instructions on deploying Code Translation on Gaudi.
+ - To deploy Code Translation on Gaudi, please refer to the [Gaudi Guide](./docker/gaudi/README.md).

- ## Deploy Code Translation on Xeon
+ - To deploy Code Translation on Xeon, please refer to the [Xeon Guide](./docker/xeon/README.md).

- Refer to the [Xeon Guide](./docker/xeon/README.md) for instructions on deploying Code Translation on Xeon.
+ ## Deploy with Kubernetes
+
+ Please refer to the [Code Translation Kubernetes Guide](./kubernetes/README.md).

CodeTrans/kubernetes/README.md

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
<h1 align="center" id="title">Deploy CodeTrans in a Kubernetes Cluster</h1>

This document outlines the deployment process for a Code Translation (CodeTrans) application that utilizes the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice components on Intel Xeon servers and Gaudi machines.

Please install GMC in your Kubernetes cluster, if you have not already done so, by following the steps in the "Getting Started" section of the [GMC Install](https://github.com/opea-project/GenAIInfra/tree/main/microservices-connector#readme) README. We will soon publish images to Docker Hub, at which point no builds will be required, further simplifying installation.
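A quick way to confirm that GMC is present before proceeding is to look for its custom resource definition. This is a sketch only, assuming the CRD is named gmconnectors.gmc.opea.io, as used by the kubectl commands later in these examples:

```bash
# Sanity check: the GMC controller registers the GMConnector CRD.
# An error here most likely means GMC is not installed yet.
kubectl get crd gmconnectors.gmc.opea.io
```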
If you have only Intel Xeon machines you could use the codetrans_xeon.yaml file, or if you have a Gaudi cluster you could use codetrans_gaudi.yaml.
In the below example we illustrate on Xeon.

## Deploy the CodeTrans application

1. Create the desired namespace if it does not already exist and deploy the application
```bash
export APP_NAMESPACE=ct
kubectl create ns $APP_NAMESPACE
sed -i "s|namespace: codetrans|namespace: $APP_NAMESPACE|g" ./codetrans_xeon.yaml
kubectl apply -f ./codetrans_xeon.yaml
```

2. Check if the application is up and ready
```bash
kubectl get pods -n $APP_NAMESPACE
```

3. Deploy a client pod for testing
```bash
kubectl create deployment client-test -n $APP_NAMESPACE --image=python:3.8.13 -- sleep infinity
```

4. Check that the client pod is ready
```bash
kubectl get pods -n $APP_NAMESPACE
```

5. Send a request to the application
```bash
export CLIENT_POD=$(kubectl get pod -n $APP_NAMESPACE -l app=client-test -o jsonpath={.items..metadata.name})
export accessUrl=$(kubectl get gmc -n $APP_NAMESPACE -o jsonpath="{.items[?(@.metadata.name=='codetrans')].status.accessUrl}")
kubectl exec "$CLIENT_POD" -n $APP_NAMESPACE -- curl $accessUrl -X POST -d '{"language_from": "Golang","language_to": "Python","source_code": "package main\n\nimport \"fmt\"\nfunc main() {\n fmt.Println(\"Hello, World!\");\n}"}' -H 'Content-Type: application/json'
```

DocSum/README.md

Lines changed: 7 additions & 4 deletions
@@ -13,11 +13,14 @@ The architecture for document summarization will be illustrated/described below:

# Deploy Document Summarization Service

The Document Summarization service can be effortlessly deployed on either Intel Gaudi2 or Intel Xeon Scalable Processors.
+ Based on whether you want to use Docker or Kubernetes, please follow the instructions below.

- ## Deploy Document Summarization on Gaudi
+ ## Deploy using Docker

- Refer to the [Gaudi Guide](./docker/gaudi/README.md) for instructions on deploying Document Summarization on Gaudi.
+ - Refer to the [Gaudi Guide](./docker/gaudi/README.md) for instructions on deploying Document Summarization on Gaudi.

- ## Deploy Document Summarization on Xeon
+ - Refer to the [Xeon Guide](./docker/xeon/README.md) for instructions on deploying Document Summarization on Xeon.

- Refer to the [Xeon Guide](./docker/xeon/README.md) for instructions on deploying Document Summarization on Xeon.
+ ## Deploy using Kubernetes
+
+ Please refer to the [Kubernetes deployment guide](./kubernetes/README.md).

DocSum/kubernetes/README.md

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
<h1 align="center" id="title">Deploy DocSum in a Kubernetes Cluster</h1>

This document outlines the deployment process for a Document Summary (DocSum) application that utilizes the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice components on Intel Xeon servers and Gaudi machines.
The DocSum service leverages a Kubernetes operator called genai-microservices-connector (GMC). GMC supports connecting microservices to create pipelines based on the specification in the pipeline yaml file; in addition, it allows the user to dynamically control which model is used in a service such as an LLM or embedder. The underlying pipeline language also supports using external services that may be running in public or private clouds elsewhere.

Please install GMC in your Kubernetes cluster, if you have not already done so, by following the steps in the "Getting Started" section of the [GMC Install](https://github.com/opea-project/GenAIInfra/tree/main/microservices-connector#readme) README. We will soon publish images to Docker Hub, at which point no builds will be required, further simplifying installation.

The DocSum application is defined as a Custom Resource (CR) file that the above GMC operator acts upon. It first checks if the microservices listed in the CR yaml file are running; if not, it starts them and then proceeds to connect them. When the DocSum pipeline is ready, the service endpoint details are returned, letting you use the application. Should you run "kubectl get pods" you will see all the component microservices, in particular embedding, retriever, rerank, and llm.

The DocSum pipeline uses prebuilt images. The Xeon version uses the prebuilt image llm-docsum-tgi:latest, which internally leverages the image ghcr.io/huggingface/text-generation-inference:1.4. The service is called tgi-svc. Meanwhile, the Gaudi version launches the service tgi-gaudi-svc, which uses the image ghcr.io/huggingface/tgi-gaudi:1.2.1. Both TGI model services serve the model specified in the LLM_MODEL_ID variable that is exported by you. In the below example we use Intel/neural-chat-7b-v3-3.
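If you later want to confirm which model the Xeon TGI service is actually serving, one option is to read the value off its deployment once the pipeline below is running. This is a sketch only: the deployment name tgi-svc-deployment is assumed by analogy with the ChatQnA example, and ${ns} is the namespace you create in step 2; adjust both to match your cluster.

```bash
# Hypothetical check of the model served by TGI; the deployment name may differ in your cluster.
kubectl get deployment tgi-svc-deployment -n ${ns} \
  -o jsonpath="{.spec.template.spec.containers[*].env[?(@.name=='LLM_MODEL_ID')].value}"
```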
[NOTE]
Please refer to the [Docker Xeon README](https://github.com/opea-project/GenAIExamples/blob/main/DocSum/docker/xeon/README.md) or
the [Docker Gaudi README](https://github.com/opea-project/GenAIExamples/blob/main/DocSum/docker/gaudi/README.md) to build the OPEA images.
These will be available on Docker Hub soon, simplifying installation.

## Deploy the DocSum pipeline
This involves deploying the application pipeline custom resource. You can use docsum_xeon.yaml if you have just a Xeon cluster, or docsum_gaudi.yaml if you have a Gaudi cluster.

1. Set up environment variables. These are specific to the user. Skip the proxy settings if you are not operating behind one.

```bash
export no_proxy=${your_no_proxy}
export http_proxy=${your_http_proxy}
export https_proxy=${your_http_proxy}
export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
export ns=docsum
```

2. Create the namespace for the application and deploy it
```bash
kubectl create ns ${ns}
kubectl apply -f $(pwd)/docsum_xeon.yaml
```

3. GMC will reconcile the custom resource and get all related components/services ready. Confirm the service status using the command below
```bash
kubectl get service -n ${ns}
```

4. Obtain the custom resource/pipeline access URL

```bash
kubectl get gmconnectors.gmc.opea.io -n ${ns}
NAME     URL                                                   READY   AGE
docsum   http://router-service.docsum.svc.cluster.local:8080   8/0/8   3m
```

5. Deploy a client pod to test the application

```bash
kubectl create deployment client-test -n ${ns} --image=python:3.8.13 -- sleep infinity
```

6. Access the pipeline using the above URL from the client pod and execute a request

```bash
export CLIENT_POD=$(kubectl get pod -n ${ns} -l app=client-test -o jsonpath={.items..metadata.name})
export accessUrl=$(kubectl get gmc -n $ns -o jsonpath="{.items[?(@.metadata.name=='docsum')].status.accessUrl}")
kubectl exec "$CLIENT_POD" -n $ns -- curl $accessUrl -X POST -d '{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}' -H 'Content-Type: application/json'
```

7. Clean up. Use standard Kubernetes custom resource remove commands. Confirm cleanup by listing the pods in the application namespace.
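For step 7, a minimal clean-up sketch is shown below. It assumes you deployed with docsum_xeon.yaml into the namespace ${ns} exactly as in the steps above, and that GMC garbage-collects the services it created once the custom resource is deleted.

```bash
# Remove the DocSum pipeline custom resource and the test client.
kubectl delete -f $(pwd)/docsum_xeon.yaml
kubectl delete deployment client-test -n ${ns}
# Confirm the component pods are gone before removing the namespace.
kubectl get pods -n ${ns}
kubectl delete ns ${ns}
```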
