We use the evaluation dataset from the [MultiHop-RAG](https://github.com/yixuantt/MultiHop-RAG) repo; prepare it with the command below.
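A minimal sketch of one way to fetch the dataset; the `dataset/` directory name is an assumption based on the upstream repo layout:

```bash
# Clone the upstream benchmark repo (the corpus and QA pairs ship inside it)
git clone https://github.com/yixuantt/MultiHop-RAG.git
# Inspect the dataset files (directory name assumed from the repo layout)
ls MultiHop-RAG/dataset
```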
### Evaluation

Use the command below to run the evaluation. Note that on the first run, the argument `--ingest_docs` must be added to ingest the documents into the vector database; on subsequent runs it should be omitted. Set `--retrieval_metrics` to get retrieval-related metrics (MRR@10/MAP@10/Hits@10/Hits@4). Set `--ragas_metrics` and `--llm_endpoint` to get end-to-end RAG pipeline metrics (faithfulness/answer_relevancy/...), which are judged by LLMs. `--limits` defaults to 100, meaning only 100 examples are evaluated by LLM-as-judge, since judging is very time-consuming.
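For example (a sketch; the endpoint URL is a placeholder, and the flags are the ones described above):

```bash
# First run: ingest documents into the vector database and compute retrieval metrics
python eval_multihop.py --ingest_docs --retrieval_metrics

# Subsequent runs: omit --ingest_docs; add the LLM-judged end-to-end metrics
# (the --llm_endpoint value below is a placeholder for your LLM serving endpoint)
python eval_multihop.py --ragas_metrics --llm_endpoint http://localhost:8008 --limits 100
```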
If you are using Docker Compose to deploy the RAG system, you can run the evaluation from the deployed environment as shown below.
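A sketch of one way to do this; `rag-eval` is a hypothetical service name, so substitute the one from your compose file:

```bash
# "rag-eval" is a hypothetical service name; substitute your own from the compose file
docker compose exec rag-eval python eval_multihop.py --retrieval_metrics
```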
You can check the argument details with the command below:
```bash
python eval_multihop.py --help
```
## CRUD (Chinese dataset)
[CRUD-RAG](https://arxiv.org/abs/2401.17043) is a Chinese benchmark for RAG (Retrieval-Augmented Generation) systems. This example utilizes CRUD-RAG to evaluate the RAG system.
### Prepare Dataset
We use the evaluation dataset from the [CRUD-RAG](https://github.com/IAAR-Shanghai/CRUD_RAG) repo; prepare it with the command below.
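A minimal sketch of one way to fetch the dataset; the `data/` directory name is an assumption based on the upstream repo layout:

```bash
# Clone the upstream benchmark repo (the Chinese corpus and QA data ship inside it)
git clone https://github.com/IAAR-Shanghai/CRUD_RAG.git
# Inspect the dataset files (directory name assumed from the repo layout)
ls CRUD_RAG/data
```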
Please refer to this [guide](https://github.com/opea-project/GenAIExamples/blob/…) to deploy the RAG system.
### Evaluation
Use the command below to run the evaluation. Note that on the first run, the argument `--ingest_docs` must be added to ingest the documents into the vector database; on subsequent runs it should be omitted.
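For example (a sketch using only the flag described above):

```bash
# First run: ingest the documents into the vector database
python eval_crud.py --ingest_docs

# Subsequent runs: omit --ingest_docs
python eval_crud.py
```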
If you are using Docker Compose to deploy the RAG system, you can run the evaluation from the deployed environment as shown below.
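Again a sketch, assuming the same hypothetical `rag-eval` service name:

```bash
# "rag-eval" is a hypothetical service name; substitute your own from the compose file
docker compose exec rag-eval python eval_crud.py
```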
You can check the argument details with the command below:
```bash
python eval_crud.py --help
```
## Acknowledgements
This example is mostly adapted from the [MultiHop-RAG](https://github.com/yixuantt/MultiHop-RAG) and [CRUD-RAG](https://github.com/IAAR-Shanghai/CRUD_RAG) repos; we thank the authors for their great work!