Add msmarco v2.1 trec rag by mam10eks · Pull Request #269 · allenai/ir_datasets

mam10eks · 2024-08-05T17:13:56Z

Dear all,

With a focus on the current TREC RAG task, this is an initial implementation for accessing documents via the corpus_iter and the document store, for both the document and the segmented-document datasets of MS MARCO v2.1.

This is not completely finished (mainly, the documentation TODOs in the corresponding tickets are still open), but as the deadline is soon, it might be useful to others even before the documentation is complete.
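The two access paths mentioned above (iterating over the corpus and looking up a document by ID in the document store) can be sketched roughly as follows. In ir_datasets these are exposed as `docs_iter()` and `docs_store()`. Everything else here is an assumption for illustration: `Doc`, `InMemoryStore`, `InMemoryDataset`, `first_doc_ids`, and `lookup` are hypothetical stand-ins so the sketch runs without downloading the corpus, and the dataset ID in the closing comment is taken from a commit message in this PR and may differ from the ID that is finally registered.

```python
from itertools import islice
from typing import NamedTuple


class Doc(NamedTuple):
    # Hypothetical minimal record; the real v2.1 documents carry more fields.
    doc_id: str
    text: str


class InMemoryStore:
    """Stand-in for the object returned by docs_store()."""

    def __init__(self, docs):
        self._by_id = {d.doc_id: d for d in docs}

    def get(self, doc_id):
        # Random access by document ID.
        return self._by_id[doc_id]


class InMemoryDataset:
    """Stand-in exposing the two access paths used below."""

    def __init__(self, docs):
        self._docs = list(docs)

    def docs_iter(self):
        # Sequential access over the whole corpus.
        return iter(self._docs)

    def docs_store(self):
        return InMemoryStore(self._docs)


def first_doc_ids(dataset, n):
    """Return the IDs of the first n documents without reading further."""
    return [doc.doc_id for doc in islice(dataset.docs_iter(), n)]


def lookup(dataset, doc_id):
    """Fetch a single document by ID through the document store."""
    return dataset.docs_store().get(doc_id)


demo = InMemoryDataset(
    [Doc("d1", "first"), Doc("d2", "second"), Doc("d3", "third")]
)
print(first_doc_ids(demo, 2))
print(lookup(demo, "d3").text)

# With ir_datasets installed and the corpus available, the same helpers should
# accept a real dataset. The ID below is an assumption based on a commit
# message in this PR:
#   import ir_datasets
#   dataset = ir_datasets.load("msmarco-document-v2.1")
#   print(first_doc_ids(dataset, 3))
```

Using `islice` keeps the iteration lazy, which matters for a corpus of this size: only the first `n` records are read.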

seanmacavaney · 2024-08-11T20:42:38Z

Hey, can you try out the revision? I was getting errors running the tests before, so I refactored a bit. Now it uses the classes from v2 where possible.

mam10eks · 2024-08-12T15:27:38Z

Perfect, nice that we could re-use more code from v2 classes, thanks!

The changes look good. I started the tests; I think they will run for a while, and I will report back on whether they still work (I expect that everything works).

mam10eks · 2024-08-13T06:56:23Z

Hi, an update on the tests: they still work on my machine, and the other scripts that I had also still work, so everything seems to be fine from my side.

seanmacavaney · 2025-05-09T14:53:51Z

Digging into this...

MattiWe and others added 3 commits July 10, 2024 16:05

add msmarco v2.1 documents

4ac934f

Add unit tests for msmarco document 2.1

9044fe4

prepare integration of msmarco v2.1 segmented documents

ad48faa

mam10eks mentioned this pull request Aug 5, 2024

MS MARCO v2.1 and v2.1 segmented for TREC 2024 RAG #267

Open

8 tasks

mam10eks and others added 8 commits August 5, 2024 22:44

add doc counts

973751c

add extraction of msmarco v2.1 docs

ae3e778

add trec 2024 rag queries

8d1fbb8

fix error in configuration

05eb55b

wip by making use of v2 classes

f548ead

more wip

1dbf781

metadata

966c1ba

msmarco-document-v2.1 metadata

21790f2

seanmacavaney added 2 commits August 12, 2024 08:37

missing metadata from trec-cast

96d3c9f

more metadata

24a983d

Merge branch 'master' into add-msmarco-v2.1-trec-rag

bd018b7

add rag 2024 qrels

1bde305

mam10eks wants to merge 15 commits into allenai:master from MattiWe:add-msmarco-v2.1-trec-rag


3 participants
