- Annotation Files
  - Contains the human annotations for micro and macro evaluation.
- LLM Responses
  - This folder contains the responses generated by three LLMs (i.e., PaLM2, GPT-3.5, and Llama2) for four prompt levels. Here, LLM1, LLM2, and LLM3 represent PaLM2, GPT-3.5, and Llama2, respectively. The responses are in the columns `res_prompt1`, `res_prompt2`, `res_prompt3`, and `res_prompt4`.
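The response files above can be loaded with pandas; a minimal sketch (the filename `LLM1_responses.csv` is hypothetical, but the column names follow the description above):

```python
import pandas as pd

# Hypothetical filename; adjust to the actual file in the LLM Responses folder:
# df = pd.read_csv("LLM1_responses.csv")  # LLM1 = PaLM2

# Toy frame mirroring the documented column layout, for illustration only.
df = pd.DataFrame({
    "res_prompt1": ["response for prompt level 1"],
    "res_prompt2": ["response for prompt level 2"],
    "res_prompt3": ["response for prompt level 3"],
    "res_prompt4": ["response for prompt level 4"],
})

# One column per prompt level.
prompt_cols = ["res_prompt1", "res_prompt2", "res_prompt3", "res_prompt4"]
responses = df[prompt_cols]
```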
- GPT Ratings
  - Contains the ratings given by GPT-4o on the LLM-generated responses; the human and GPT ratings are merged together in these files. `GPT_rating_LLMx` and `Macro_GPT_score` contain the micro and macro evaluation ratings of both humans and GPT-4o, respectively. The `corr_average_likert_ceil` file contains the correlations between the human and GPT-4o ratings in the micro-evaluation for each LLM.
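A hedged sketch of computing a human vs. GPT-4o correlation like those stored in `corr_average_likert_ceil` (the column names `human_avg` and `gpt4o_score` and the Likert values are hypothetical; the merged rating files use their own headers):

```python
import pandas as pd

# Hypothetical column names and toy Likert ratings, for illustration only.
ratings = pd.DataFrame({
    "human_avg":   [3.0, 4.0, 2.0, 5.0, 3.0],
    "gpt4o_score": [3.0, 4.0, 3.0, 5.0, 2.0],
})

# Pearson correlation between the human and GPT-4o micro-evaluation ratings.
corr = ratings["human_avg"].corr(ratings["gpt4o_score"])
```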
- Meta-Review Data
  - The curated dataset used for meta-review generation. The `Review1`, `Review2`, and `Review3` columns contain the peer reviews, and the `Meta_Review` column contains the meta-reviews written by humans.
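A minimal sketch of preparing the meta-review data for generation, using the documented column names (the toy rows and the derived `input_reviews` column are illustrative assumptions; the actual filename is not specified here):

```python
import pandas as pd

# Toy rows mirroring the documented schema; real data comes from the
# Meta-Review Data folder.
df = pd.DataFrame({
    "Review1": ["First peer review."],
    "Review2": ["Second peer review."],
    "Review3": ["Third peer review."],
    "Meta_Review": ["Human-written meta-review."],
})

# Concatenate the three peer reviews as model input; the human-written
# Meta_Review column serves as the reference output.
df["input_reviews"] = df[["Review1", "Review2", "Review3"]].agg(" ".join, axis=1)
```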
If you find our work useful for your research, please cite it using this BibTeX:
```bibtex
@article{hossain2024llms,
  title={LLMs as Meta-Reviewers' Assistants: A Case Study},
  author={Hossain, Eftekhar and Sinha, Sanjeev Kumar and Bansal, Naman and Knipper, Alex and Sarkar, Souvika and Salvador, John and Mahajan, Yash and Guttikonda, Sri and Akter, Mousumi and Mahadi Hassan, Md and others},
  journal={arXiv e-prints},
  pages={arXiv--2402},
  year={2024}
}
```