Commit c5657c3

Authored by daniel-de-leon-user293, pre-commit-ci[bot], and ashahba
Add Toxicity Evaluation (#241)
* add toxicity_eval
* remove poetry.lock
* add unit tests (WIP)
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* add unittests and rm poetry
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fix deprecated HF API
* fix args typo for gaudi config
* clean up README
* auprc probabilities fix

Signed-off-by: Daniel Deleon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abolfazl Shahbazi <[email protected]>
1 parent 8986653 commit c5657c3

File tree

5 files changed: +547, -0 lines changed
Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
# Toxicity Detection Accuracy

Toxicity detection plays a critical role in guarding the inputs and outputs of large language models (LLMs) to ensure safe, respectful, and responsible content. Given the widespread use of LLMs in applications like customer service, education, and social media, there's a significant risk that they could inadvertently produce or amplify harmful language if toxicity is not detected effectively.

To evaluate a target toxicity detection LLM, we use multiple datasets: BeaverTails, Jigsaw Unintended Bias, OpenAI Moderation, SurgeAI Toxicity, ToxicChat, ToxiGen, and XSTest. We also employ the metrics most commonly used in toxicity classification to provide a comprehensive assessment. Currently, the benchmark script supports benchmarking only one dataset at a time; future work includes enabling benchmarking on multiple datasets simultaneously. The Gaudi 2 accelerator is used in the benchmark to meet the high compute demands of the AI workload while balancing power efficiency.

- Supported Datasets
8+
- [BeaverTails](https://huggingface.co/datasets/PKU-Alignment/BeaverTails)
9+
- [Jigsaw Unintended Bias](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification)
10+
- [OpenAI Moderation](https://github.com/openai/moderation-api-release/tree/main)
11+
- [SurgeAI Toxicity](https://github.com/surge-ai/toxicity)
12+
- [ToxicChat](https://huggingface.co/datasets/lmsys/toxic-chat)
13+
- [ToxiGen](https://huggingface.co/datasets/toxigen/toxigen-data)
14+
- [XSTest](https://huggingface.co/datasets/walledai/XSTest)
15+
- More datasets to come...
16+
17+
- Supported Metrics
  - accuracy
  - auprc (area under the precision-recall curve)
  - auroc (area under the receiver operating characteristic curve)
  - f1
  - fpr (false positive rate)
  - precision
  - recall

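The Hugging Face-hosted datasets in the list above (BeaverTails, ToxicChat, ToxiGen, and XSTest) are normally downloaded on first use. If you prefer to pre-fetch them into the local Hugging Face cache (for example, on a machine behind a restrictive proxy), the following is a minimal, optional sketch using the `huggingface-cli` tool from `huggingface_hub`; some of these datasets may be gated and require `huggingface-cli login` first.
```bash
# Optional: pre-fetch the Hugging Face-hosted datasets into the local cache.
# Assumes huggingface_hub (which provides the huggingface-cli entry point) is installed.
pip install -U huggingface_hub

huggingface-cli download PKU-Alignment/BeaverTails --repo-type dataset
huggingface-cli download lmsys/toxic-chat --repo-type dataset
huggingface-cli download toxigen/toxigen-data --repo-type dataset
huggingface-cli download walledai/XSTest --repo-type dataset
```
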
## Get Started on Gaudi 2 Accelerator

### Requirements
If you are using an `hpu` device, clone the `optimum-habana` and `GenAIEval` repositories.
```bash
git clone https://github.com/huggingface/optimum-habana.git --depth=1
git clone https://github.com/opea-project/GenAIEval --depth=1
```

### Setup
If you are running behind a corporate proxy, run the Gaudi Docker container with the additional proxy environment variables and volume mount shown below.
```bash
DOCKER_RUN_ENVS="--env http_proxy=${http_proxy} --env HTTP_PROXY=${HTTP_PROXY} --env https_proxy=${https_proxy} --env HTTPS_PROXY=${HTTPS_PROXY} --env no_proxy=${no_proxy} --env NO_PROXY=${NO_PROXY}"

docker run --disable-content-trust ${DOCKER_RUN_ENVS} \
    -d --rm -it --name toxicity-detection-benchmark \
    -v ${PWD}:/workdir \
    --runtime=habana \
    -e HABANA_VISIBLE_DEVICES=all \
    -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
    --cap-add=sys_nice \
    --net=host \
    --ipc=host \
    vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.5.1:latest
```

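Before exec'ing into the container in the next step, you can optionally confirm it is running; the name matches the `--name` flag used above.
```bash
# Optional sanity check that the benchmark container is up.
docker ps --filter "name=toxicity-detection-benchmark"
```
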
### Evaluation
52+
#### Execute interactive container
53+
```bash
54+
docker exec -it toxicity-detection-benchmark bash
55+
```
56+
#### Navigate to `workdir` and install required packages
57+
```bash
58+
cd /workdir
59+
cd optimum-habana && pip install . && cd ../GenAIEval
60+
pip install -r requirements.txt
61+
pip install -e .
62+
```
63+
64+
In case of [Jigsaw Unintended Bias](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification), [OpenAI Moderation](https://github.com/openai/moderation-api-release), and [Surge AI Toxicity](https://github.com/surge-ai/toxicity) datasets, make sure the datasets are downloaded and stored in current working directory.
65+
66+
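As one way to obtain these three datasets, here is a hedged sketch: it assumes the Kaggle CLI is installed and authenticated with an API token and that you have accepted the competition rules, and the archive and folder names may differ from what is shown.
```bash
# Jigsaw Unintended Bias: download the competition data via the Kaggle CLI.
kaggle competitions download -c jigsaw-unintended-bias-in-toxicity-classification
unzip jigsaw-unintended-bias-in-toxicity-classification.zip -d jigsaw

# OpenAI Moderation and SurgeAI Toxicity: clone the public GitHub repositories.
git clone https://github.com/openai/moderation-api-release.git --depth=1
git clone https://github.com/surge-ai/toxicity.git --depth=1
```
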
#### Test the model and confirm the results are saved correctly
Navigate to the toxicity evaluation directory:
```bash
cd evals/evaluation/toxicity_eval
```

Replace `MODEL_PATH` and `DATASET` with the path to the model and the name of the dataset, respectively.
```bash
MODEL_PATH=Intel/toxic-prompt-roberta
DATASET=tc
python benchmark_classification_metrics.py -m ${MODEL_PATH} -d ${DATASET}
cat results/${MODEL_PATH##*/}_${DATASET}_accuracy/metrics.json
```

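If you prefer a formatted view of the saved metrics, the file can optionally be pretty-printed with Python's built-in `json.tool` module:
```bash
# Optional: pretty-print the saved metrics file.
python -m json.tool results/${MODEL_PATH##*/}_${DATASET}_accuracy/metrics.json
```
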
If you are using an `hpu` device, you can instantiate the Gaudi configuration by passing the `GAUDI_CONFIG_NAME` variable with the appropriate configuration name. The default value for the device name (`device`) is `hpu`.
```bash
MODEL_PATH=Intel/toxic-prompt-roberta
DATASET=tc
GAUDI_CONFIG_NAME=Habana/roberta-base
DEVICE_NAME=hpu
python benchmark_classification_metrics.py -m ${MODEL_PATH} -d ${DATASET} -g_config ${GAUDI_CONFIG_NAME} --device ${DEVICE_NAME}
cat results/${MODEL_PATH##*/}_${DATASET}_accuracy/metrics.json
```

For the Jigsaw Unintended Bias, OpenAI Moderation, and SurgeAI Toxicity datasets, pass the path to the stored dataset in place of `DATASET_PATH`:
```bash
MODEL_PATH=Intel/toxic-prompt-roberta
DATASET=jigsaw
DATASET_PATH=/path/to/dataset
python ./classification_metrics/scripts/benchmark_classification_metrics.py -m ${MODEL_PATH} -d ${DATASET} -p ${DATASET_PATH}
cat results/${MODEL_PATH##*/}_${DATASET}_accuracy/metrics.json
```

## Get Started on CPU

### Requirements
* Linux system or WSL2 on Windows (validated on Ubuntu* 22.04/24.04 LTS)
* Python >=3.10

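As an optional, hedged example of meeting the Python requirement in an isolated environment (any environment manager providing Python >=3.10 works; the environment name below is arbitrary):
```bash
# Create and activate a virtual environment with the system Python 3.
# Assumes `python3` resolves to version 3.10 or newer.
python3 -m venv toxicity-eval-env
source toxicity-eval-env/bin/activate
python --version
```
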
### Installation
Follow the GenAIEval installation steps provided in the repository's main [README](https://github.com/daniel-de-leon-user293/GenAIEval/tree/daniel/toxicity-eval?tab=readme-ov-file#installation).

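For convenience, the sketch below mirrors the install commands already used in the Gaudi section above (clone URL and pip steps taken from that section); treat the linked README as the authoritative instructions.
```bash
# Clone GenAIEval and install it along with its requirements.
git clone https://github.com/opea-project/GenAIEval --depth=1
cd GenAIEval
pip install -r requirements.txt
pip install -e .
```
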
### Evaluation
Navigate to the toxicity evaluation directory:
```bash
cd evals/evaluation/toxicity_eval
```

For the [Jigsaw Unintended Bias](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification), [OpenAI Moderation](https://github.com/openai/moderation-api-release), and [SurgeAI Toxicity](https://github.com/surge-ai/toxicity) datasets, make sure they are downloaded and stored in the current working directory.

Replace `MODEL_PATH` and `DATASET` with the path to the model and the name of the dataset. To run the script on a CPU device, set the `DEVICE_NAME` variable to `cpu`.
```bash
MODEL_PATH=Intel/toxic-prompt-roberta
DATASET=tc
DEVICE_NAME=cpu
python benchmark_classification_metrics.py -m ${MODEL_PATH} -d ${DATASET} --device ${DEVICE_NAME}
```
You can find the evaluation results in the results folder:
```bash
cat results/${MODEL_PATH##*/}_${DATASET}_accuracy/metrics.json
```
For the Jigsaw Unintended Bias, OpenAI Moderation, and SurgeAI Toxicity datasets, pass the path to the stored dataset in place of `DATASET_PATH`:

```bash
MODEL_PATH=Intel/toxic-prompt-roberta
DATASET=jigsaw
DATASET_PATH=/path/to/dataset
DEVICE_NAME=cpu
python benchmark_classification_metrics.py -m ${MODEL_PATH} -d ${DATASET} -p ${DATASET_PATH} --device ${DEVICE_NAME}
```
You can find the evaluation results in the results folder:
```bash
cat results/${MODEL_PATH##*/}_${DATASET}_accuracy/metrics.json
```
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
# Copyright (C) 2025 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

0 commit comments