# JobResQA: A Benchmark for LLM Machine Reading Comprehension on Multilingual Résumés and Job Descriptions
JobResQA is a multilingual Question Answering benchmark for evaluating LLM capabilities on HR-specific tasks. The dataset contains 581 QA pairs across 105 synthetic résumé-job description pairs in 5 languages (en, es, it, de, zh), with three complexity levels from basic extraction to cross-document reasoning.
Key Features:
- Multilingual: Parallel data in 5 languages (`data/`)
- Privacy-Preserving: Synthetic data with anonymization (`resources/placeholders/`)
- Three Complexity Levels: Basic (26.5%), Intermediate (36.7%), Complex (36.8%)
- Fairness-Aware: Controlled demographic attributes for bias analysis
The benchmark consists of 5 language-specific TSV files in `data/`:

- `jobresqa.en.tsv` - English
- `jobresqa.de.tsv` - German
- `jobresqa.es.tsv` - Spanish
- `jobresqa.it.tsv` - Italian
- `jobresqa.zh.tsv` - Chinese
Each TSV file contains the columns: `example_id`, `resume_id`, `resume`, `jd_id`, `jd`, `question`, `short_answer`, `explanation`, `notes`, `complexity_level`, `language`.
Anonymization: All personal information uses placeholders such as `[NAME]`, `[EMAIL]`, `[PHONE]`, and `[COMPANY]`. See `resources/placeholders/` for the complete list.
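To check which placeholders appear in a given document, a regex over the bracketed tokens is enough. This is a minimal sketch; the tokens in the sample string are a subset of the full lists in `resources/placeholders/`:

```python
import re
from collections import Counter

def count_placeholders(text: str) -> Counter:
    """Count occurrences of [UPPERCASE] anonymization placeholders."""
    # Placeholders follow the [NAME]/[EMAIL]/[PHONE] pattern used in the data.
    return Counter(re.findall(r"\[[A-Z_]+\]", text))

resume = "[NAME] worked at [COMPANY]. Contact: [EMAIL], [PHONE]. [NAME] led a team."
print(count_placeholders(resume))  # [NAME] appears twice, the rest once
```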
To load a split:

```python
import pandas as pd

df = pd.read_csv('data/jobresqa.en.tsv', sep='\t')
```

Repository layout:

- `data/` - Benchmark dataset (5 language TSV files)
- `resources/` - Prompts and resources
  - `prompts/` - LLM prompts for QA, generation, and translation
  - `placeholders/` - Anonymization placeholders
  - `mqm_annotation/` - Translation quality metrics
- `scripts/` - Example scripts for QA, evaluation, generation, and translation
- `src/` - Source code
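Once a split is loaded, per-complexity or per-language counts come from a simple `groupby`. A sketch over an inline frame with the benchmark's column names (the real files live in `data/`):

```python
import pandas as pd

# Tiny stand-in for one TSV split, using the benchmark's column names.
df = pd.DataFrame({
    "example_id": [1, 2, 3],
    "question": ["Q1", "Q2", "Q3"],
    "complexity_level": ["basic", "complex", "complex"],
    "language": ["en", "en", "en"],
})

# Distribution of questions per complexity level.
counts = df.groupby("complexity_level").size()
print(counts.to_dict())  # {'basic': 1, 'complex': 2}
```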
```shell
git clone https://github.com/yourusername/jobresqa-benchmark.git
cd jobresqa-benchmark
bash install.sh
cp .env.example .env  # Add your API keys
```

Required environment variables:
- `OPENAI_API_KEY` - OpenAI API key
- `REPO_DIR` - Path to this repository
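A script can fail fast when these are unset. A minimal sketch using only the standard library (the `require_env` helper is illustrative, not part of this repository):

```python
import os

def require_env(*names: str) -> dict:
    """Return the named environment variables, raising if any is unset."""
    missing = [n for n in names if n not in os.environ]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {n: os.environ[n] for n in names}

# Example (uncomment after filling in .env):
# config = require_env("OPENAI_API_KEY", "REPO_DIR")
```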
The `scripts/` directory contains example scripts:

- `run_qa.py` - Question answering
- `run_eval_qa.py` - Evaluate answers using G-Eval
- `run_resume_synthetic_generation.py` - Generate synthetic résumés
- `run_JD_synthetic_generation.py` - Generate job descriptions
- `run_translation.py` - TEaR translation framework
Run the QA script with:

```shell
python scripts/run_qa.py
```

`resources/prompts/` contains LLM prompts:

- `qa/` - Question answering and evaluation
- `resume_jd_generation/` - Synthetic data generation
- `tear_human_in_the_loop/` - Translation framework (TEaR)
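The shipped prompt templates live in `resources/prompts/qa/`; the general shape of assembling a cross-document QA prompt from a résumé/JD pair can be sketched as follows (the wording here is illustrative, not the benchmark's actual template):

```python
def build_qa_prompt(resume: str, jd: str, question: str) -> str:
    """Assemble a cross-document QA prompt from a résumé/JD pair.

    The template text below is a placeholder; the benchmark's real
    prompts are in resources/prompts/qa/.
    """
    return (
        "Answer the question using only the documents below.\n\n"
        f"### Résumé\n{resume}\n\n"
        f"### Job Description\n{jd}\n\n"
        f"### Question\n{question}\nAnswer:"
    )

prompt = build_qa_prompt(
    "[NAME], Python developer at [COMPANY].",
    "Seeking a backend engineer with Python experience.",
    "Does the candidate's experience match the role?",
)
```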
`resources/placeholders/` contains anonymization placeholders:

- `placeholders.{lang}.txt` - Language-specific lists
- `placeholders_translations_dictionary.json` - Cross-language translations
`resources/mqm_annotation/` contains translation quality metrics:

- `mqm_error_categories.txt` - Error taxonomy
- `mqm_human_translations.{lang_pair}.txt` - Human translation examples
- `mqm_human_errors.{lang_pair}.txt` - Annotated errors
This work is available as a preprint on arXiv under the title "JobResQA: A Benchmark for LLM Machine Reading Comprehension on Multilingual Résumés and Job Descriptions".
If you use this benchmark, please cite the following paper:
```bibtex
@misc{carrino2026jobresqabenchmarkllmmachine,
  title={JobResQA: A Benchmark for LLM Machine Reading Comprehension on Multilingual R\'esum\'es and JDs},
  author={Casimiro Pio Carrino and Paula Estrella and Rabih Zbib and Carlos Escolano and José A. R. Fonollosa},
  year={2026},
  eprint={2601.23183},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2601.23183},
}
```

Licensed under CC BY-SA 2.0. Copyright © 2025 Avature.
For questions, please open an issue on GitHub.