Commit 425b423: update rag_eval readme (#126)
Signed-off-by: Yingchun Guo <[email protected]>
1 parent 1d3a502

File changed: evals/evaluation/rag_eval/README.md (+70 additions, -3 deletions)
For evaluating the accuracy of a RAG pipeline, we use two recently published datasets.

### Environment
```bash
git clone https://github.com/opea-project/GenAIEval
cd GenAIEval
pip install -r requirements.txt
pip install -e .
```
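
To confirm the editable install succeeded, you can try importing the package. A minimal sketch, assuming the package is importable as `evals` (an assumption based on the repository layout; adjust the name if your checkout differs):

```bash
# Sanity check: the editable install should make the repo's package importable.
# "evals" is an assumed package name, not confirmed by this README.
python -c "import evals; print('GenAIEval installed OK')"
```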

## MultiHop (English dataset)

### Prepare Dataset
We use the evaluation dataset from the [MultiHop-RAG](https://github.com/yixuantt/MultiHop-RAG) repo. Use the command below to prepare the dataset:
```bash
cd evals/evaluation/rag_eval/examples
git clone https://github.com/yixuantt/MultiHop-RAG.git
```
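
To confirm the clone brought in the files that the evaluation commands below expect, a quick check (both paths are taken from the commands in the next section):

```bash
# Both files are passed to eval_multihop.py below.
ls MultiHop-RAG/dataset/corpus.json MultiHop-RAG/dataset/MultiHopRAG.json
```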

### Evaluation

Use the command below to run the evaluation. Note that on the first run you must add the `--ingest_docs` argument to ingest the documents into the vector database; on subsequent runs, omit it. Set `--retrieval_metrics` to get retrieval-related metrics (MRR@10/MAP@10/Hits@10/Hits@4). Set `--ragas_metrics` and `--llm_endpoint` to get end-to-end RAG pipeline metrics (faithfulness/answer_relevancy/...), which are judged by LLMs. `--limits` defaults to 100, which means only 100 examples are evaluated by LLM-as-a-judge, since judging is very time consuming.

If you deployed the RAG system with Docker Compose, you can simply run the evaluation as follows:
```bash
python eval_multihop.py --docs_path MultiHop-RAG/dataset/corpus.json --dataset_path MultiHop-RAG/dataset/MultiHopRAG.json --ingest_docs --retrieval_metrics --ragas_metrics --llm_endpoint http://{llm_as_judge_ip}:{llm_as_judge_port}/generate
```

If you deployed the RAG system with Kubernetes manifests or Helm, you must specify additional endpoint arguments as follows:
```bash
python eval_multihop.py --docs_path MultiHop-RAG/dataset/corpus.json --dataset_path MultiHop-RAG/dataset/MultiHopRAG.json --ingest_docs --retrieval_metrics --ragas_metrics --llm_endpoint http://{llm_as_judge_ip}:{llm_as_judge_port}/generate --database_endpoint http://{your_dataprep_ip}:{your_dataprep_port}/v1/dataprep --embedding_endpoint http://{your_embedding_ip}:{your_embedding_port}/v1/embeddings --tei_embedding_endpoint http://{your_tei_embedding_ip}:{your_tei_embedding_port} --retrieval_endpoint http://{your_retrieval_ip}:{your_retrieval_port}/v1/retrieval --service_url http://{your_chatqna_ip}:{your_chatqna_port}/v1/chatqna
```
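
Before starting a long evaluation run, it can save time to verify that the judge endpoint answers. A minimal smoke test, assuming the judge LLM is served by a TGI-compatible server exposing the `/generate` route (which the `--llm_endpoint` URL above implies):

```bash
# Smoke-test the LLM-as-a-judge endpoint; substitute your actual host and port.
# The request body follows the TGI /generate schema, an assumption about your serving stack.
curl http://{llm_as_judge_ip}:{llm_as_judge_port}/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "What is retrieval-augmented generation?", "parameters": {"max_new_tokens": 32}}'
```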

The default values for the arguments are:

|Argument|Default value|
|--------|-------------|
|service_url|http://localhost:8888/v1/chatqna|
|database_endpoint|http://localhost:6007/v1/dataprep|
|embedding_endpoint|http://localhost:6000/v1/embeddings|
|tei_embedding_endpoint|http://localhost:8090|
|retrieval_endpoint|http://localhost:7000/v1/retrieval|
|reranking_endpoint|http://localhost:8000/v1/reranking|
|output_dir|./output|
|temperature|0.1|
|max_new_tokens|1280|
|chunk_size|256|
|chunk_overlap|100|
|search_type|similarity|
|retrival_k|10|
|fetch_k|20|
|lambda_mult|0.5|
|dataset_path|../data/split_merged.json|
|docs_path|../data/80000_docs|
|tasks|["question_answering"]|
|limits|100|

You can check the details of all arguments with the command below:
```bash
python eval_multihop.py --help
```
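
As a concrete example, a later run that reuses the already-ingested corpus might override a few of the defaults above. A sketch, assuming each table entry is exposed as a same-named command-line flag (as `--limits` and the endpoint flags above suggest); the values are illustrative, not recommendations:

```bash
# Subsequent run: documents were ingested earlier, so --ingest_docs is omitted.
# --retrival_k and --limits override the defaults listed in the table above.
python eval_multihop.py \
  --docs_path MultiHop-RAG/dataset/corpus.json \
  --dataset_path MultiHop-RAG/dataset/MultiHopRAG.json \
  --retrieval_metrics \
  --retrival_k 10 \
  --limits 50
```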

## CRUD (Chinese dataset)
[CRUD-RAG](https://arxiv.org/abs/2401.17043) is a Chinese benchmark for RAG (Retrieval-Augmented Generation) systems. This example uses CRUD-RAG to evaluate the RAG system.

### Prepare Dataset
We use the evaluation dataset from the [CRUD-RAG](https://github.com/IAAR-Shanghai/CRUD_RAG) repo. Use the command below to prepare the dataset:
```bash
cd evals/evaluation/rag_eval
git clone https://github.com/IAAR-Shanghai/CRUD_RAG
mkdir data/
cp CRUD_RAG/data/crud_split/split_merged.json data/
```
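
The evaluation commands below also expect the source documents under `data/80000_docs`; you can confirm both inputs are in place before ingesting:

```bash
# Both paths are referenced by the eval_crud.py commands below.
ls data/split_merged.json data/80000_docs
```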

### Evaluation
Use the command below to run the evaluation. Note that on the first run you must add the `--ingest_docs` argument to ingest the documents into the vector database; on subsequent runs, omit it.

If you deployed the RAG system with Docker Compose, you can simply run the evaluation as follows:
```bash
cd examples
python eval_crud.py --dataset_path ../data/split_merged.json --docs_path ../data/80000_docs --ingest_docs
```

If you deployed the RAG system with Kubernetes manifests or Helm, you must specify additional endpoint arguments as follows:
```bash
cd examples
python eval_crud.py --dataset_path ../data/split_merged.json --docs_path ../data/80000_docs --ingest_docs --database_endpoint http://{your_dataprep_ip}:{your_dataprep_port}/v1/dataprep --embedding_endpoint http://{your_embedding_ip}:{your_embedding_port}/v1/embeddings --retrieval_endpoint http://{your_retrieval_ip}:{your_retrieval_port}/v1/retrieval --service_url http://{your_chatqna_ip}:{your_chatqna_port}/v1/chatqna
```
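
When targeting a Kubernetes deployment, it can help to confirm the ChatQnA megaservice is reachable before a full ingestion run. A minimal sketch, assuming the request shape used in common OPEA ChatQnA examples (a JSON body with a `messages` field; adjust to your deployment's actual schema):

```bash
# Smoke-test the ChatQnA service URL that --service_url will point at.
curl http://{your_chatqna_ip}:{your_chatqna_port}/v1/chatqna \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"messages": "What is retrieval-augmented generation?"}'
```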

The default values for the arguments are:

|Argument|Default value|
|--------|-------------|
|service_url|http://localhost:8888/v1/chatqna|
|database_endpoint|http://localhost:6007/v1/dataprep|
|embedding_endpoint|http://localhost:6000/v1/embeddings|
|retrieval_endpoint|http://localhost:7000/v1/retrieval|
|reranking_endpoint|http://localhost:8000/v1/reranking|
|output_dir|./output|
|temperature|0.1|
|max_new_tokens|1280|
|chunk_size|256|
|chunk_overlap|100|
|dataset_path|../data/split_merged.json|
|docs_path|../data/80000_docs|
|tasks|["question_answering"]|

You can check the details of all arguments with the command below:
```bash
python eval_crud.py --help
```

## Acknowledgements
This example is mostly adapted from the [MultiHop-RAG](https://github.com/yixuantt/MultiHop-RAG) and [CRUD-RAG](https://github.com/IAAR-Shanghai/CRUD_RAG) repos. We thank the authors for their great work!
