Add VectorRAG #10229

Merged
merged 6 commits into from
Apr 29, 2025

Conversation

Kh4L
Contributor

@Kh4L Kh4L commented Apr 25, 2025

This PR decouples VectorRAG into a separate class, offering an interface for non-graph-based RAG, and adds a DocumentRetriever VectorRAG.
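
For readers skimming this thread, the decoupling described above might look roughly like the sketch below. This is not the code merged in this PR: the `query` method, its signature, and the pure-Python cosine similarity are assumptions made for illustration. Only the class names (`VectorRAG`, `DocumentRetriever`) and the argument names `raw_docs`, `embedded_docs`, and `k_for_docs` come from the PR description and diff.

```python
import math
from abc import ABC, abstractmethod
from typing import List, Sequence


class VectorRAG(ABC):
    """Hypothetical interface for non-graph retrieval (method name assumed)."""

    @abstractmethod
    def query(self, query_embedding: Sequence[float]) -> List[str]:
        """Return the documents most relevant to the query embedding."""


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero norm.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


class DocumentRetriever(VectorRAG):
    """Illustrative top-k retriever over pre-embedded documents."""

    def __init__(self, raw_docs: Sequence[str],
                 embedded_docs: Sequence[Sequence[float]],
                 k_for_docs: int = 2) -> None:
        if len(raw_docs) != len(embedded_docs):
            raise ValueError("raw_docs and embedded_docs must align")
        self.raw_docs = list(raw_docs)
        self.embedded_docs = [list(e) for e in embedded_docs]
        self.k_for_docs = k_for_docs

    def query(self, query_embedding: Sequence[float]) -> List[str]:
        # Score every document, then keep the k highest-scoring ones.
        scores = [_cosine(query_embedding, e) for e in self.embedded_docs]
        ranked = sorted(range(len(scores)), key=lambda i: scores[i],
                        reverse=True)
        return [self.raw_docs[i] for i in ranked[:self.k_for_docs]]


# Toy data, made up for this example.
retriever = DocumentRetriever(
    raw_docs=["doc about cats", "doc about dogs", "doc about cars"],
    embedded_docs=[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]],
    k_for_docs=2,
)
top = retriever.query([1.0, 0.0])
print(top)
```

With an object like this, a loader only needs to hold one retriever instead of separate `raw_docs`, `embedded_docs`, and `k_for_docs` arguments, which is the shape of the call-site change reviewed further down.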

codecov bot commented Apr 25, 2025

Codecov Report

Attention: Patch coverage is 34.04255% with 31 lines in your changes missing coverage. Please review.

Please upload report for BASE (txt2kg-v2@270c89e). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| torch_geometric/utils/rag/vectorrag.py | 43.33% | 17 Missing ⚠️ |
| torch_geometric/loader/rag_loader.py | 18.75% | 13 Missing ⚠️ |
| torch_geometric/nn/nlp/txt2kg.py | 0.00% | 1 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (34.04%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff              @@
##             txt2kg-v2   #10229   +/-   ##
============================================
  Coverage             ?   84.40%           
============================================
  Files                ?      500           
  Lines                ?    34502           
  Branches             ?        0           
============================================
  Hits                 ?    29121           
  Misses               ?     5381           
  Partials             ?        0           


Contributor

@puririshi98 puririshi98 left a comment

LGTM other than this minor thing

@@ -285,8 +290,7 @@ def make_dataset(args):
 data=(fs, gs), seed_nodes_kwargs={"k_nodes": knn_neighsample_bs},
 sampler_kwargs={"num_neighbors": [fanout] * num_hops},
 local_filter=make_pcst_filter(triples, model),
-local_filter_kwargs=local_filter_kwargs, raw_docs=context_docs,
-embedded_docs=embedded_docs, k_for_docs=args.k_for_docs)
+local_filter_kwargs=local_filter_kwargs, vector_rag=vector_rag)
Contributor

Can we call this document_retriever instead of vector_rag? Technically, vector_rag would mean vector retrieval-augmented generation, but this is just the retriever part; the generator is separate in our pipeline.

Contributor Author

good point, i'd go for vector_retriever and i'd replace local_filter with graph_retriever

wdyt?

Contributor

@puririshi98 puririshi98 Apr 28, 2025

I like vector_retriever, but with local_filter as is, graph_retriever may be confusing since it's only the second part of the graph retriever. Maybe we rename local_filter to subgraph_filter, and merge seed_nodes_kwargs and sampler_kwargs into a single dict called "graph_sampler_kwargs" (making the necessary changes under the hood to allow this, or just leave them as graph_* if that adds unnecessary complication to your effort). Maybe then we would also want to change data -> graph_data. Thoughts? This is all very open to discussion, just spitballing.

Contributor Author

sure, i like subgraph_filter, I don't have strong opinions when it comes to naming those

yeah, graph_data sounds good as well. tbh PyG calling graphs Data/data is so confusing
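
One way the single "graph_sampler_kwargs" dict floated in this thread could be handled under the hood is sketched below. Nothing here is from the PR: the helper name `split_graph_sampler_kwargs` and the nested `"seed_nodes"` / `"sampler"` keys are hypothetical, and the numeric values are placeholders for `knn_neighsample_bs`, `fanout`, and `num_hops`.

```python
from typing import Any, Dict, Optional, Tuple


def split_graph_sampler_kwargs(
    graph_sampler_kwargs: Optional[Dict[str, Any]] = None,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split one combined dict into (seed_nodes_kwargs, sampler_kwargs).

    Hypothetical helper: the nested key names are assumptions.
    """
    kwargs = dict(graph_sampler_kwargs or {})  # copy, do not mutate input
    seed_nodes_kwargs = dict(kwargs.pop("seed_nodes", {}))
    sampler_kwargs = dict(kwargs.pop("sampler", {}))
    if kwargs:
        # Fail loudly on typos instead of silently ignoring them.
        raise TypeError(f"Unexpected keys: {sorted(kwargs)}")
    return seed_nodes_kwargs, sampler_kwargs


# Placeholder values standing in for the script's real variables.
seed_kw, sampler_kw = split_graph_sampler_kwargs({
    "seed_nodes": {"k_nodes": 16},
    "sampler": {"num_neighbors": [5] * 2},
})
```

Keeping the two groups nested avoids guessing which option belongs to seed-node selection and which to neighbor sampling, at the cost of a slightly more verbose call site.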

@Kh4L Kh4L marked this pull request as ready for review April 29, 2025 00:23
@Kh4L Kh4L requested review from wsad1, mananshah99, a team and EdisonLeeeee as code owners April 29, 2025 00:24
@Kh4L Kh4L requested a review from puririshi98 April 29, 2025 00:24
@puririshi98 puririshi98 changed the base branch from latest-txt2kg to txt2kg-v2 April 29, 2025 00:40
@Kh4L Kh4L force-pushed the txt2kg_decouple_vectorrag branch from 72c5a8a to d6e991b Compare April 29, 2025 01:00
@puririshi98 puririshi98 merged commit d36c799 into pyg-team:txt2kg-v2 Apr 29, 2025
14 of 19 checks passed
2 participants