JobResQA: A Benchmark for LLM Machine Reading Comprehension on Multilingual Résumés and Job Descriptions

License: CC BY-SA 2.0 · Python 3.10+

📊 Overview

JobResQA is a multilingual Question Answering benchmark for evaluating LLM capabilities on HR-specific tasks. The dataset contains 581 QA pairs across 105 synthetic résumé-job description pairs in 5 languages (en, es, it, de, zh), with three complexity levels from basic extraction to cross-document reasoning.

Key Features:

  • Multilingual: Parallel data in 5 languages (data/)
  • Privacy-Preserving: Synthetic data with anonymization (resources/placeholders/)
  • Three Complexity Levels: Basic (26.5%), Intermediate (36.7%), Complex (36.8%)
  • Fairness-Aware: Controlled demographic attributes for bias analysis

📁 Dataset

📄 Files

The benchmark consists of five language-specific TSV files in data/, one per supported language.

🔧 Format

Each TSV file contains: example_id, resume_id, resume, jd_id, jd, question, short_answer, explanation, notes, complexity_level, language

Anonymization: All personal information uses placeholders like [NAME], [EMAIL], [PHONE], [COMPANY], etc. See resources/placeholders/ for the complete list.
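Given the bracketed-placeholder convention above, anonymized spans can be located with a simple pattern. This is a minimal sketch: it assumes placeholders always take the `[UPPERCASE]` shape shown in the examples; the authoritative inventory lives in resources/placeholders/.

```python
import re

# Assumption: placeholders follow the [TOKEN] shape shown above
# ([NAME], [EMAIL], ...); see resources/placeholders/ for the full list.
PLACEHOLDER_RE = re.compile(r"\[[A-Z_]+\]")

def find_placeholders(text: str) -> list[str]:
    """Return all anonymization placeholders found in a document."""
    return PLACEHOLDER_RE.findall(text)

print(find_placeholders("Contact [NAME] at [EMAIL] or [PHONE]."))
# → ['[NAME]', '[EMAIL]', '[PHONE]']
```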

💻 Load JobResQA Dataset

import pandas as pd
df = pd.read_csv('data/jobresqa.en.tsv', sep='\t')
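Once loaded, the columns listed in the Format section can be used to slice the benchmark, e.g. by complexity level or language. A sketch, assuming the per-language file naming follows the English example above (jobresqa.{lang}.tsv is an inference for es/it/de/zh):

```python
import pandas as pd

def complexity_distribution(df: pd.DataFrame) -> pd.Series:
    """Share of examples per complexity level.

    complexity_level is one of the columns listed in the Format section.
    """
    return df["complexity_level"].value_counts(normalize=True)

# In practice, load and concatenate all five language files, e.g.:
#   frames = [pd.read_csv(f"data/jobresqa.{lang}.tsv", sep="\t")
#             for lang in ("en", "es", "it", "de", "zh")]
#   df = pd.concat(frames, ignore_index=True)

# Tiny synthetic frame just to demonstrate the call:
demo = pd.DataFrame({"complexity_level": ["basic", "complex", "basic"]})
print(complexity_distribution(demo))
```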

📂 Repository Structure

  • data/ - Benchmark dataset (5 language TSV files)
  • resources/ - Prompts and resources
  • scripts/ - Example scripts for QA, evaluation, generation, and translation
  • src/ - Source code

🚀 Quick Start

⚙️ Installation

git clone https://github.com/Avature/jobresqa-benchmark.git
cd jobresqa-benchmark
bash install.sh
cp .env.example .env  # Add your API keys

Required environment variables:

  • OPENAI_API_KEY - OpenAI API key
  • REPO_DIR - Path to this repository
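Scripts that depend on these variables benefit from failing fast when one is missing. A minimal sketch (require_env is an illustrative helper, not part of the repository's code):

```python
import os

def require_env(name: str) -> str:
    """Fetch a required environment variable, failing fast with a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# OPENAI_API_KEY and REPO_DIR are the variables listed above:
# api_key = require_env("OPENAI_API_KEY")
# repo_dir = require_env("REPO_DIR")
```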

📖 Usage

The scripts/ directory contains example scripts:

python scripts/run_qa.py

🔗 Resources

💬 Prompts

resources/prompts/ contains the LLM prompts used by the example scripts.

🏷️ Placeholders

resources/placeholders/ contains anonymization placeholders:

  • placeholders.{lang}.txt - Language-specific lists
  • placeholders_translations_dictionary.json - Cross-language translations
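The cross-language translations file can be loaded with the standard library. A sketch: the file name comes from the list above, but its internal schema is not documented here, so this just returns the parsed JSON.

```python
import json
from pathlib import Path

def load_placeholder_translations(
    path: str = "resources/placeholders/placeholders_translations_dictionary.json",
) -> dict:
    """Load the cross-language placeholder translation dictionary (raw JSON)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```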

✅ MQM Annotation

resources/mqm_annotation/ contains translation quality metrics:

  • mqm_error_categories.txt - Error taxonomy
  • mqm_human_translations.{lang_pair}.txt - Human translation examples
  • mqm_human_errors.{lang_pair}.txt - Annotated errors
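The error taxonomy is a plain-text file; a minimal reader might look like the following. This assumes one category per line, which is an inference about the file's layout, not a documented format.

```python
from pathlib import Path

def load_mqm_categories(
    path: str = "resources/mqm_annotation/mqm_error_categories.txt",
) -> list[str]:
    """Read the MQM error taxonomy, skipping blank lines.

    Assumption: one category per line (the file's exact layout is not
    documented in this README).
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [ln.strip() for ln in lines if ln.strip()]
```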

Citation

This work is available as a preprint on arXiv under the title JobResQA: A Benchmark for LLM Machine Reading Comprehension on Multilingual Résumés and Job Descriptions.

If you use this benchmark, please cite the following paper:

@misc{carrino2026jobresqabenchmarkllmmachine,
      title={JobResQA: A Benchmark for LLM Machine Reading Comprehension on Multilingual R\'esum\'es and JDs}, 
      author={Casimiro Pio Carrino and Paula Estrella and Rabih Zbib and Carlos Escolano and José A. R. Fonollosa},
      year={2026},
      eprint={2601.23183},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2601.23183}, 
}

📄 License

Licensed under CC BY-SA 2.0. Copyright © 2025 Avature.

📧 Contact

For questions, please open an issue on GitHub.
