
Commit 5997431

Merge branch 'main' into stats
2 parents d3a1542 + c70f868 commit 5997431

13 files changed (+262 additions, -38 deletions)

comps/asr/src/integrations/dependency/whisper/Dockerfile.intel_hpu

Lines changed: 3 additions & 1 deletion
@@ -22,7 +22,9 @@ COPY --chown=user:user comps /home/user/comps
 # Install requirements and optimum habana
 RUN pip install --no-cache-dir --upgrade pip && \
     pip install --no-cache-dir -r /home/user/comps/asr/src/requirements.txt && \
-    pip install --no-cache-dir optimum[habana]
+    pip install --no-cache-dir optimum[habana] && \
+    pip install git+https://github.com/huggingface/optimum-habana.git@transformers_future && \
+    pip install --no-cache-dir --upgrade Jinja2

 ENV PYTHONPATH=$PYTHONPATH:/home/users

comps/guardrails/deployment/docker_compose/compose.yaml

Lines changed: 14 additions & 0 deletions
@@ -20,6 +20,19 @@ services:
       HUGGINGFACEHUB_API_TOKEN: ${HF_TOKEN}
     restart: unless-stopped

+  # toxicity detection service
+  guardrails-toxicity-detection-server:
+    image: ${REGISTRY:-opea}/guardrails-toxicity-detection:${TAG:-latest}
+    container_name: guardrails-toxicity-detection-server
+    ports:
+      - "${TOXICITY_DETECTION_PORT:-9090}:9090"
+    ipc: host
+    environment:
+      no_proxy: ${no_proxy}
+      http_proxy: ${http_proxy}
+      https_proxy: ${https_proxy}
+    restart: unless-stopped
+
   # factuality alignment service
   guardrails-factuality-predictionguard-server:
     image: ${REGISTRY:-opea}/guardrails-factuality-predictionguard:${TAG:-latest}
@@ -130,6 +143,7 @@ services:
       http_proxy: ${http_proxy}
       https_proxy: ${https_proxy}
       PREDICTIONGUARD_API_KEY: ${PREDICTIONGUARD_API_KEY}
+      TOXICITY_DETECTION_COMPONENT_NAME: "PREDICTIONGUARD_TOXICITY_DETECTION"
     restart: unless-stopped

 networks:

comps/guardrails/src/toxicity_detection/README.md

Lines changed: 67 additions & 15 deletions
@@ -2,17 +2,52 @@

 ## Introduction

-Toxicity Detection Microservice allows AI Application developers to safeguard user input and LLM output from harmful language in a RAG environment. By leveraging a smaller fine-tuned Transformer model for toxicity classification (e.g. DistilledBERT, RoBERTa, etc.), we maintain a lightweight guardrails microservice without significantly sacrificing performance making it readily deployable on both Intel Gaudi and Xeon.
+Toxicity Detection Microservice allows AI Application developers to safeguard user input and LLM output from harmful language in a RAG environment. By leveraging a smaller fine-tuned Transformer model for toxicity classification (e.g. DistilBERT, RoBERTa, etc.), we maintain a lightweight guardrails microservice without significantly sacrificing performance. This [article](https://huggingface.co/blog/daniel-de-leon/toxic-prompt-roberta) shows how the small language model (SLM) used in this microservice performs as well as, if not better than, some of the most popular decoder LLM guardrails. This microservice uses [`Intel/toxic-prompt-roberta`](https://huggingface.co/Intel/toxic-prompt-roberta), which was fine-tuned on Gaudi2 with the ToxicChat and Jigsaw Unintended Bias datasets.

-This microservice uses [`Intel/toxic-prompt-roberta`](https://huggingface.co/Intel/toxic-prompt-roberta) that was fine-tuned on Gaudi2 with ToxicChat and Jigsaw Unintended Bias datasets.
+In addition to showing promising toxicity detection performance, the table below compares a [locust](https://github.com/locustio/locust) stress test of this microservice against the [LlamaGuard microservice](https://github.com/opea-project/GenAIComps/blob/main/comps/guardrails/src/guardrails/README.md#LlamaGuard). The input included varying lengths of toxic and non-toxic text over 200 seconds. A total of 50 users were added during the first 100 seconds, and the user count stayed constant for the last 100 seconds. Note that the LlamaGuard microservice was deployed on a Gaudi2 card while the toxicity detection microservice was deployed on a 4th generation Xeon.

-Toxicity is defined as rude, disrespectful, or unreasonable language likely to make someone leave a conversation. This can include instances of aggression, bullying, targeted hate speech, or offensive language. For more information on labels see [Jigsaw Toxic Comment Classification Challenge](http://kaggle.com/c/jigsaw-toxic-comment-classification-challenge).
+| Microservice       | Request Count | Median Response Time (ms) | Average Response Time (ms) | Min Response Time (ms) | Max Response Time (ms) | Requests/s |  50% |  95% |
+| :----------------- | ------------: | ------------------------: | -------------------------: | ---------------------: | ---------------------: | ---------: | ---: | ---: |
+| LlamaGuard         |          2099 |                      3300 |                       2718 |                     81 |                   4612 |       10.5 | 3300 | 4600 |
+| Toxicity Detection |          4547 |                       450 |                        796 |                     19 |                  10045 |       22.7 |  450 | 2500 |
+
+This microservice is designed to detect toxicity, which is defined as rude, disrespectful, or unreasonable language likely to make someone leave a conversation. This can include instances of aggression, bullying, targeted hate speech, or offensive language. For more information on labels see the [Jigsaw Toxic Comment Classification Challenge](http://kaggle.com/c/jigsaw-toxic-comment-classification-challenge).
+
+## Environment Setup
+
+### Clone OPEA GenAIComps and Setup Environment
+
+Clone this repository at your desired location and set an environment variable for easy setup and usage throughout the instructions.
+
+```bash
+git clone https://github.com/opea-project/GenAIComps.git
+
+export OPEA_GENAICOMPS_ROOT=$(pwd)/GenAIComps
+```
+
+Set the port that this service will use and the component name:
+
+```bash
+export TOXICITY_DETECTION_PORT=9090
+export TOXICITY_DETECTION_COMPONENT_NAME="OPEA_NATIVE_TOXICITY"
+```
+
+By default, this microservice uses `OPEA_NATIVE_TOXICITY`, which invokes [`Intel/toxic-prompt-roberta`](https://huggingface.co/Intel/toxic-prompt-roberta) locally.
+
+Alternatively, if you are using Prediction Guard, reset the component name environment variable:
+
+```bash
+export TOXICITY_DETECTION_COMPONENT_NAME="PREDICTIONGUARD_TOXICITY_DETECTION"
+```
+
+### Set environment variables

 ## 🚀1. Start Microservice with Python(Option 1)

 ### 1.1 Install Requirements

 ```bash
+cd $OPEA_GENAICOMPS_ROOT/comps/guardrails/src/toxicity_detection
 pip install -r requirements.txt
 ```

@@ -24,27 +59,42 @@ python toxicity_detection.py

 ## 🚀2. Start Microservice with Docker (Option 2)

-### 2.1 Prepare toxicity detection model
+### 2.1 Build Docker Image

-export HUGGINGFACEHUB_API_TOKEN=${HP_TOKEN}
+```bash
+cd $OPEA_GENAICOMPS_ROOT
+docker build \
+  --build-arg https_proxy=$https_proxy \
+  --build-arg http_proxy=$http_proxy \
+  -t opea/guardrails-toxicity-detection:latest \
+  -f comps/guardrails/src/toxicity_detection/Dockerfile .
+```

-### 2.2 Build Docker Image
+### 2.2.a Run Docker with Compose (Option A)

 ```bash
-cd ../../../ # back to GenAIComps/ folder
-docker build -t opea/guardrails-toxicity-detection:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/guardrails/src/toxicity_detection/Dockerfile .
+cd $OPEA_GENAICOMPS_ROOT/comps/guardrails/deployment/docker_compose
+docker compose up -d guardrails-toxicity-detection-server
 ```

-### 2.3 Run Docker Container with Microservice
+### 2.2.b Run Docker with CLI (Option B)

 ```bash
-docker run -d --rm --runtime=runc --name="guardrails-toxicity-detection-endpoint" -p 9091:9091 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN} -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} opea/guardrails-toxicity-detection:latest
+docker run -d --rm \
+  --name="guardrails-toxicity-detection-server" \
+  --runtime=runc \
+  -p ${TOXICITY_DETECTION_PORT}:9090 \
+  --ipc=host \
+  -e http_proxy=$http_proxy \
+  -e https_proxy=$https_proxy \
+  -e no_proxy=${no_proxy} \
+  opea/guardrails-toxicity-detection:latest
 ```

 ## 🚀3. Get Status of Microservice

 ```bash
-docker container logs -f guardrails-toxicity-detection-endpoint
+docker container logs -f guardrails-toxicity-detection-server
 ```

 ## 🚀4. Consume Microservice Pre-LLM/Post-LLM
@@ -54,9 +104,9 @@ Once microservice starts, users can use examples (bash or python) below to apply
 **Bash:**

 ```bash
-curl localhost:9091/v1/toxicity
-  -X POST
-  -d '{"text":"How to poison my neighbor'\''s dog without being caught?"}'
+curl localhost:${TOXICITY_DETECTION_PORT}/v1/toxicity \
+  -X POST \
+  -d '{"text":"How to poison my neighbor'\''s dog without being caught?"}' \
   -H 'Content-Type: application/json'
 ```

@@ -71,9 +121,11 @@ Example Output:
 ```python
 import requests
 import json
+import os

+toxicity_detection_port = os.getenv("TOXICITY_DETECTION_PORT")
 proxies = {"http": ""}
-url = "http://localhost:9091/v1/toxicity"
+url = f"http://localhost:{toxicity_detection_port}/v1/toxicity"
 data = {"text": "How to poison my neighbor'''s dog without being caught?"}


comps/guardrails/src/toxicity_detection/integrations/toxicdetection.py

Lines changed: 48 additions & 0 deletions

@@ -0,0 +1,48 @@
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+import asyncio
+import os
+
+from transformers import pipeline
+
+from comps import CustomLogger, OpeaComponent, OpeaComponentRegistry, ServiceType, TextDoc
+
+logger = CustomLogger("opea_toxicity_native")
+logflag = os.getenv("LOGFLAG", False)
+
+
+@OpeaComponentRegistry.register("OPEA_NATIVE_TOXICITY")
+class OpeaToxicityDetectionNative(OpeaComponent):
+    """A specialized toxicity detection component derived from OpeaComponent."""
+
+    def __init__(self, name: str, description: str, config: dict = None):
+        super().__init__(name, ServiceType.GUARDRAIL.name.lower(), description, config)
+        self.model = os.getenv("TOXICITY_DETECTION_MODEL", "Intel/toxic-prompt-roberta")
+        self.toxicity_pipeline = pipeline("text-classification", model=self.model, tokenizer=self.model)
+        health_status = self.check_health()
+        if not health_status:
+            logger.error("OpeaToxicityDetectionNative health check failed.")
+
+    async def invoke(self, input: TextDoc):
+        """Invokes toxicity detection for the input.
+
+        Args:
+            input (Input TextDoc)
+        """
+        toxic = await asyncio.to_thread(self.toxicity_pipeline, input.text)
+        if toxic[0]["label"].lower() == "toxic":
+            return TextDoc(text="Violated policies: toxicity, please check your input.", downstream_black_list=[".*"])
+        else:
+            return TextDoc(text=input.text)
+
+    def check_health(self) -> bool:
+        """Checks the health of the toxicity detection service.
+
+        Returns:
+            bool: True if the service is reachable and healthy, False otherwise.
+        """
+        if self.toxicity_pipeline:
+            return True
+        else:
+            return False
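
The classification above can be reproduced outside the component with a plain `transformers` pipeline. A minimal sketch, assuming the default `Intel/toxic-prompt-roberta` model can be downloaded from the Hugging Face Hub (the example prompt is made up):

```python
from transformers import pipeline

# Same default model the native component loads (TOXICITY_DETECTION_MODEL).
model_id = "Intel/toxic-prompt-roberta"
toxicity_pipeline = pipeline("text-classification", model=model_id, tokenizer=model_id)

result = toxicity_pipeline("How do I bake sourdough bread?")
# The component treats a top label of "toxic" as a policy violation and
# black-lists downstream services; any other label passes the text through.
if result[0]["label"].lower() == "toxic":
    print("Violated policies: toxicity, please check your input.")
else:
    print("Input passed the toxicity guardrail.")
```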

comps/guardrails/src/toxicity_detection/opea_toxicity_detection_microservice.py

Lines changed: 15 additions & 6 deletions
@@ -3,8 +3,7 @@

 import os
 import time
-
-from integrations.predictionguard import OpeaToxicityDetectionPredictionGuard
+from typing import Union

 from comps import (
     CustomLogger,
@@ -21,7 +20,17 @@
 logger = CustomLogger("opea_toxicity_detection_microservice")
 logflag = os.getenv("LOGFLAG", False)

-toxicity_detection_component_name = os.getenv("TOXICITY_DETECTION_COMPONENT_NAME", "PREDICTIONGUARD_TOXICITY_DETECTION")
+toxicity_detection_port = int(os.getenv("TOXICITY_DETECTION_PORT", 9090))
+toxicity_detection_component_name = os.getenv("TOXICITY_DETECTION_COMPONENT_NAME", "OPEA_NATIVE_TOXICITY")
+
+if toxicity_detection_component_name == "OPEA_NATIVE_TOXICITY":
+    from integrations.toxicdetection import OpeaToxicityDetectionNative
+elif toxicity_detection_component_name == "PREDICTIONGUARD_TOXICITY_DETECTION":
+    from integrations.predictionguard import OpeaToxicityDetectionPredictionGuard
+else:
+    logger.error(f"Component name {toxicity_detection_component_name} is not recognized")
+    exit(1)
+
 # Initialize OpeaComponentLoader
 loader = OpeaComponentLoader(
     toxicity_detection_component_name,
@@ -35,12 +44,12 @@
     service_type=ServiceType.GUARDRAIL,
     endpoint="/v1/toxicity",
     host="0.0.0.0",
-    port=9090,
+    port=toxicity_detection_port,
     input_datatype=TextDoc,
-    output_datatype=ScoreDoc,
+    output_datatype=Union[TextDoc, ScoreDoc],
 )
 @register_statistics(names=["opea_service@toxicity_detection"])
-async def toxicity_guard(input: TextDoc) -> ScoreDoc:
+async def toxicity_guard(input: TextDoc) -> Union[TextDoc, ScoreDoc]:
     start = time.time()

     # Log the input if logging is enabled
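
With the defaults above (native component, port 9090), the running microservice can be exercised from Python. A minimal client sketch, assuming the container from the compose file is up on localhost:

```python
import os

import requests

# TOXICITY_DETECTION_PORT defaults to 9090, matching the microservice above.
port = os.getenv("TOXICITY_DETECTION_PORT", "9090")
url = f"http://localhost:{port}/v1/toxicity"

payload = {"text": "How to poison my neighbor's dog without being caught?"}
response = requests.post(url, json=payload, timeout=30)
print(response.status_code)
print(response.text)  # native component returns a TextDoc-style JSON body
```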

comps/llms/src/text-generation/README_bedrock.md

Lines changed: 4 additions & 2 deletions
@@ -9,6 +9,7 @@
 In order to start Bedrock service, you need to setup the following environment variables first.

 ```bash
+export AWS_REGION=${aws_region}
 export AWS_ACCESS_KEY_ID=${aws_access_key_id}
 export AWS_SECRET_ACCESS_KEY=${aws_secret_access_key}
 ```
@@ -23,13 +24,13 @@ export AWS_SESSION_TOKEN=${aws_session_token}

 ```bash
 cd GenAIComps/
-docker build --no-cache -t opea/bedrock:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile .
+docker build --no-cache -t opea/llm-textgen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/src/text-generation/Dockerfile .
 ```

 ## Run the Bedrock Microservice

 ```bash
-docker run -d --name bedrock -p 9009:9000 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e LLM_COMPONENT_NAME="OpeaTextGenBedrock" -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY -e AWS_SESSION_TOKEN=$AWS_SESSION_TOKEN opea/bedrock:latest
+docker run -d --name bedrock -p 9009:9000 --ipc=host -e http_proxy=$http_proxy -e https_proxy=$https_proxy -e LLM_COMPONENT_NAME="OpeaTextGenBedrock" -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY -e AWS_SESSION_TOKEN=$AWS_SESSION_TOKEN -e BEDROCK_REGION=$AWS_REGION opea/llm-textgen:latest
 ```

 (You can remove `-e AWS_SESSION_TOKEN=$AWS_SESSION_TOKEN` if you are not using an IAM Role)
@@ -42,6 +43,7 @@ curl http://${host_ip}:9009/v1/chat/completions \
   -d '{"model": "us.anthropic.claude-3-5-haiku-20241022-v1:0", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17}' \
   -H 'Content-Type: application/json'

+# stream mode
 curl http://${host_ip}:9009/v1/chat/completions \
   -X POST \
   -d '{"model": "us.anthropic.claude-3-5-haiku-20241022-v1:0", "messages": [{"role": "user", "content": "What is Deep Learning?"}], "max_tokens":17, "stream": "true"}' \

comps/llms/src/text-generation/integrations/bedrock.py

Lines changed: 14 additions & 6 deletions
@@ -87,13 +87,21 @@ async def invoke(self, input: ChatCompletionRequest):
         if logflag and len(inference_config) > 0:
             logger.info(f"[llm - chat] inference_config: {inference_config}")

-        # Parse messages from HuggingFace TGI format to bedrock messages format
-        # tgi: [{role: "system" | "user", content: "text"}]
+        # Parse messages to Bedrock format
+        # tgi: "prompt" or [{role: "system" | "user", content: "text"}]
         # bedrock: [role: "assistant" | "user", content: {text: "content"}]
-        messages = [
-            {"role": "assistant" if i.get("role") == "system" else "user", "content": [{"text": i.get("content", "")}]}
-            for i in input.messages
-        ]
+        messages = None
+        if isinstance(input.messages, str):
+            messages = [{"role": "user", "content": [{"text": input.messages}]}]
+        else:
+            # Convert from list of HuggingFace TGI message objects
+            messages = [
+                {
+                    "role": "assistant" if i.get("role") == "system" else "user",
+                    "content": [{"text": i.get("content", "")}],
+                }
+                for i in input.messages
+            ]

         # Bedrock requires that conversations start with a user prompt
         # TGI allows the first message to be an assistant prompt, defining assistant behavior
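
The conversion rule introduced above can be illustrated in isolation. A standalone sketch (the helper name `to_bedrock_messages` is hypothetical; the mapping follows the diff: a bare prompt string becomes one user turn, and `system` roles are remapped to `assistant`):

```python
from typing import List, Union


def to_bedrock_messages(messages: Union[str, List[dict]]) -> List[dict]:
    """Map a TGI-style prompt or message list to Bedrock's message format."""
    if isinstance(messages, str):
        # A bare prompt string becomes a single user turn.
        return [{"role": "user", "content": [{"text": messages}]}]
    # Bedrock only accepts "user"/"assistant" roles, so "system" is remapped.
    return [
        {
            "role": "assistant" if m.get("role") == "system" else "user",
            "content": [{"text": m.get("content", "")}],
        }
        for m in messages
    ]


print(to_bedrock_messages("What is Deep Learning?"))
print(to_bedrock_messages([
    {"role": "system", "content": "Answer concisely."},
    {"role": "user", "content": "What is Deep Learning?"},
]))
```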

comps/retrievers/README.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@

 This retriever microservice is a highly efficient search service designed for handling and retrieving embedding vectors. It operates by receiving an embedding vector as input and conducting a similarity search against vectors stored in a VectorDB database. Users must specify the VectorDB's URL and the index name, and the service searches within that index to find documents with the highest similarity to the input vector.

-The service primarily utilizes similarity measures in vector space to rapidly retrieve contentually similar documents. The vector-based retrieval approach is particularly suited for handling large datasets, offering fast and accurate search results that significantly enhance the efficiency and quality of information retrieval.
+The service primarily utilizes similarity measures in vector space to rapidly retrieve contextually similar documents. The vector-based retrieval approach is particularly suited for handling large datasets, offering fast and accurate search results that significantly enhance the efficiency and quality of information retrieval.

 Overall, this microservice provides robust backend support for applications requiring efficient similarity searches, playing a vital role in scenarios such as recommendation systems, information retrieval, or any other context where precise measurement of document similarity is crucial.

comps/third_parties/tgi/README.md

Lines changed: 2 additions & 2 deletions
@@ -18,13 +18,13 @@ export MAX_TOTAL_TOKENS=2048
 Run tgi on xeon.

 ```bash
-cd deplopyment/docker_compose
+cd deployment/docker_compose
 docker compose -f compose.yaml up -d tgi-server
 ```

 Run tgi on gaudi.

 ```bash
-cd deplopyment/docker_compose
+cd deployment/docker_compose
 docker compose -f compose.yaml up -d tgi-gaudi-server
 ```

comps/third_parties/vllm/README.md

Lines changed: 3 additions & 3 deletions
@@ -43,7 +43,7 @@ bash ./launch_vllm_service.sh ${port_number} ${model_name}
 #### Launch vLLM service with docker compose

 ```bash
-cd deplopyment/docker_compose
+cd deployment/docker_compose
 docker compose -f compose.yaml up vllm-server -d
 ```

@@ -64,8 +64,8 @@ Set `hw_mode` to `hpu`.
 1. Option 1: Use docker compose for quick deploy

 ```bash
-cd deplopyment/docker_compose
-docker compose -f compose.yaml vllm-gaudi-server up -d
+cd deployment/docker_compose
+docker compose -f compose.yaml up vllm-gaudi-server -d
 ```

 2. Option 2: Use scripts to set parameters.
