
Commit 102fcdd

update llm-as-judge doc. (#114)

Authored by lkk12014402

* update llm-as-judge doc.
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci

Co-authored-by: root <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

1 parent 6abbe40 commit 102fcdd

File tree

2 files changed (+13, -3 lines)

evals/evaluation/rag_eval/README.md (2 additions, 2 deletions)

@@ -59,7 +59,7 @@ To setup a LLM model, we can use [tgi-gaudi](https://github.com/huggingface/tgi-
 ```
 # please set your llm_port and hf_token
-docker run -p {your_llm_port}:80 --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HF_TOKEN={your_hf_token} --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:2.0.1 --model-id mistralai/Mixtral-8x7B-Instruct-v0.1 --max-input-tokens 1024 --max-total-tokens 2048 --sharded true --num-shard 2
+docker run -p {your_llm_port}:80 --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true -e OMPI_MCA_btl_vader_single_copy_mechanism=none -e HF_TOKEN={your_hf_token} --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:2.0.1 --model-id mistralai/Mixtral-8x7B-Instruct-v0.1 --max-input-tokens 2048 --max-total-tokens 4096 --sharded true --num-shard 2
 ```

 ### Prepare Dataset
@@ -71,7 +71,7 @@ git clone https://github.com/yixuantt/MultiHop-RAG.git

 ### Evaluation

-Use below command to run the evaluation, please note that for the first run, argument `--ingest_docs` should be added in the command to ingest the documents into the vector database, while for the subsequent run, this argument should be omitted. Set `--retrieval_metrics` to get retrieval related metrics (MRR@10/MAP@10/Hits@10/Hits@4). Set `--ragas_metrics` and `--llm_endpoint` to get end-to-end rag pipeline metrics (faithfulness/answer_relevancy/...), which are judged by LLMs.
+Use below command to run the evaluation, please note that for the first run, argument `--ingest_docs` should be added in the command to ingest the documents into the vector database, while for the subsequent run, this argument should be omitted. Set `--retrieval_metrics` to get retrieval related metrics (MRR@10/MAP@10/Hits@10/Hits@4). Set `--ragas_metrics` and `--llm_endpoint` to get end-to-end rag pipeline metrics (faithfulness/answer_relevancy/...), which are judged by LLMs. We set `--limits` is 100 as default, which means only 100 examples are evaluated by llm-as-judge as it is very time consuming.

 ```bash
 python eval_multihop.py --docs_path MultiHop-RAG/dataset/corpus.json --dataset_path MultiHop-RAG/dataset/MultiHopRAG.json --ingest_docs --retrieval_metrics --ragas_metrics --llm_endpoint http://{your_ip}:{your_llm_port}/generate
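The README text above names the retrieval metrics (MRR@10/MAP@10/Hits@10/Hits@4) without showing how they are computed. The following is a minimal sketch of the standard definitions only, not the repository's actual implementation; `ranked_ids` and `gold_id` are hypothetical names for a per-query ranked list of document ids and the single relevant id.

```python
# Sketch of MRR@k and Hits@k under the standard definitions.
# Illustrative only; not the code used by eval_multihop.py.

def mrr_at_k(ranked_ids, gold_id, k=10):
    """Reciprocal rank of the relevant document within the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == gold_id:
            return 1.0 / rank
    return 0.0

def hits_at_k(ranked_ids, gold_id, k=10):
    """1.0 if the relevant document appears anywhere in the top k."""
    return 1.0 if gold_id in ranked_ids[:k] else 0.0

# Two toy queries: gold doc at rank 2 in the first, missing in the second.
queries = [(["d3", "d1", "d7"], "d1"), (["d9", "d2"], "d5")]
mrr = sum(mrr_at_k(r, g, k=10) for r, g in queries) / len(queries)
hits = sum(hits_at_k(r, g, k=4) for r, g in queries) / len(queries)
print(mrr, hits)  # 0.25 0.5
```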

evals/evaluation/rag_eval/examples/eval_multihop.py (11 additions, 1 deletion)

@@ -139,7 +139,7 @@ def evaluate(self, all_queries, arguments):
     def get_ragas_metrics(self, all_queries, arguments):
         from langchain_huggingface import HuggingFaceEndpointEmbeddings

-        embeddings = HuggingFaceEndpointEmbeddings(model=arguments.embedding_endpoint)
+        embeddings = HuggingFaceEndpointEmbeddings(model=arguments.tei_embedding_endpoint)

         metric = RagasMetric(threshold=0.5, model=arguments.llm_endpoint, embeddings=embeddings)
         all_answer_relevancy = 0
@@ -163,6 +163,9 @@ def get_ragas_metrics(self, all_queries, arguments):
             ragas_inputs["ground_truth"].append(data["answer"])
             ragas_inputs["contexts"].append(retrieved_documents[:3])

+            if len(ragas_inputs["question"]) >= arguments.limits:
+                break
+
         ragas_metrics = metric.measure(ragas_inputs)
         return ragas_metrics

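The hunk above caps llm-as-judge evaluation at `arguments.limits` collected examples. A self-contained sketch of the same guard, with hypothetical field names (`query`, `generated`, `retrieved`) standing in for the repository's dataset schema:

```python
# Sketch of the capping behaviour this commit adds: stop collecting
# llm-as-judge inputs once `limits` examples are gathered. The dataset
# field names here are illustrative, not the repository's exact schema.

def build_ragas_inputs(dataset, limits=100):
    inputs = {"question": [], "answer": [], "ground_truth": [], "contexts": []}
    for data in dataset:
        inputs["question"].append(data["query"])
        inputs["answer"].append(data["generated"])
        inputs["ground_truth"].append(data["answer"])
        inputs["contexts"].append(data["retrieved"][:3])  # top-3 contexts, as in the diff
        if len(inputs["question"]) >= limits:  # same guard as the hunk above
            break
    return inputs

# 250 toy examples; only the first 100 reach the (expensive) judge.
dataset = [
    {"query": f"q{i}", "generated": "a", "answer": "a",
     "retrieved": ["c1", "c2", "c3", "c4"]}
    for i in range(250)
]
capped = build_ragas_inputs(dataset, limits=100)
print(len(capped["question"]))  # 100
```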
@@ -208,12 +211,19 @@ def args_parser():
     parser.add_argument("--ingest_docs", action="store_true", help="Whether to ingest documents to vector database")
     parser.add_argument("--retrieval_metrics", action="store_true", help="Whether to compute retrieval metrics.")
     parser.add_argument("--ragas_metrics", action="store_true", help="Whether to compute ragas metrics.")
+    parser.add_argument("--limits", type=int, default=100, help="Number of examples to be evaluated by llm-as-judge")
     parser.add_argument(
         "--database_endpoint", type=str, default="http://localhost:6007/v1/dataprep", help="Service URL address."
     )
     parser.add_argument(
         "--embedding_endpoint", type=str, default="http://localhost:6000/v1/embeddings", help="Service URL address."
     )
+    parser.add_argument(
+        "--tei_embedding_endpoint",
+        type=str,
+        default="http://localhost:8090",
+        help="Service URL address of tei embedding.",
+    )
     parser.add_argument(
         "--retrieval_endpoint", type=str, default="http://localhost:7000/v1/retrieval", help="Service URL address."
     )
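The two arguments this commit adds can be checked in isolation. This standalone argparse sketch reproduces just the new `--limits` and `--tei_embedding_endpoint` definitions from the hunk above, outside the full `args_parser()` of eval_multihop.py:

```python
# Standalone reproduction of the two argument definitions added by this
# commit, so their defaults can be verified without the rest of the script.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--limits", type=int, default=100, help="Number of examples to be evaluated by llm-as-judge")
parser.add_argument(
    "--tei_embedding_endpoint",
    type=str,
    default="http://localhost:8090",
    help="Service URL address of tei embedding.",
)

args = parser.parse_args([])  # no CLI flags: defaults apply
print(args.limits, args.tei_embedding_endpoint)  # 100 http://localhost:8090
```

Note that `--tei_embedding_endpoint` (the raw TEI service) is distinct from the pre-existing `--embedding_endpoint` microservice URL, which is why the commit also switches `HuggingFaceEndpointEmbeddings` to the new argument.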

0 commit comments