-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Add VectorRAG #10229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add VectorRAG #10229
Conversation
Codecov ReportAttention: Patch coverage is
❌ Your patch status has failed because the patch coverage (34.04%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## txt2kg-v2 #10229 +/- ##
============================================
Coverage ? 84.40%
============================================
Files ? 500
Lines ? 34502
Branches ? 0
============================================
Hits ? 29121
Misses ? 5381
Partials ? 0 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM other than this minor thing
examples/llm/tech_qa.py
Outdated
@@ -285,8 +290,7 @@ def make_dataset(args): | |||
data=(fs, gs), seed_nodes_kwargs={"k_nodes": knn_neighsample_bs}, | |||
sampler_kwargs={"num_neighbors": [fanout] * num_hops}, | |||
local_filter=make_pcst_filter(triples, model), | |||
local_filter_kwargs=local_filter_kwargs, raw_docs=context_docs, | |||
embedded_docs=embedded_docs, k_for_docs=args.k_for_docs) | |||
local_filter_kwargs=local_filter_kwargs, vector_rag=vector_rag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we call this document_retriever instead of vector_rag since technically vector_rag would be vector retrieval augmented generation but this is just the retriever part, the generator is seperated in our pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, i'd go for vector_retriever
and i'd replace local_filter
with graph_retriever
wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i like vector_retriever but with local filter as is graph_retriever may be confusing since its only the 2nd part of the graph_retriever. maybe we rename local_filter to subgraph_filter, and rename seed_nodes_kwargs and sampler_kwargs to a total a single dict called "graph_sampler_kwargs" (and make the necesary changes under the hood to allow this, or just leave the as graph_* if that adds unnecesary complication to your effort). maybe then we would also wana change data -> graph_data. thoughts? this is all very open to discussion, just spitballing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, i like subgraph_filter
, I don't have strong opinions when it comes to naming those
yeah graph_data
sg as well.. tbh PyG calling graphs Data
/data
is so confusing
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
72c5a8a
to
d6e991b
Compare
This PR decouples VectorRAG in a different class, offering with interface for non-graph based RAG + add DocumentRetriever VectorRAG.