G-retriever API updates (NVTX, Remote Backend, Large Graph Indexer, Examples) #9666

zaristei · 2024-09-16T22:27:07Z

Follow up to PR 9597. Includes multiple changes related to LLM+GNN experiments and scaling up to a remote backend, including:

- LargeGraphIndexer for building a large knowledge graph locally from multiple samples in an arbitrary dataset
- Remote Backend Loader and examples for deploying a retrieval algorithm to a third-party backend FeatureStore or GraphStore
- NVTX profiling tools for nsys users
- Quality-of-life improvements and benchmarking scripts for G-Retriever
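
To make the first bullet concrete, here is a minimal sketch of the indexing idea: merge triplets from several samples into one graph, deduplicating nodes and edges and assigning each a stable integer id. `TinyGraphIndexer` and all of its methods are hypothetical names made up for illustration; this is not the `LargeGraphIndexer` API from this PR.

```python
from typing import Dict, Iterable, List, Tuple

Triplet = Tuple[str, str, str]  # (head, relation, tail)


class TinyGraphIndexer:
    """Hypothetical stand-in, NOT the LargeGraphIndexer API from the PR."""

    def __init__(self) -> None:
        self.node_to_idx: Dict[str, int] = {}
        self.edge_to_idx: Dict[Triplet, int] = {}

    def add_sample(self, triplets: Iterable[Triplet]) -> None:
        """Merge one sample's triplets, skipping nodes/edges already seen."""
        for head, rel, tail in triplets:
            for node in (head, tail):
                # setdefault only assigns a new id the first time a node appears.
                self.node_to_idx.setdefault(node, len(self.node_to_idx))
            self.edge_to_idx.setdefault(
                (head, rel, tail), len(self.edge_to_idx))

    def edge_index(self) -> List[Tuple[int, int]]:
        """Return (source id, target id) pairs in edge insertion order."""
        return [(self.node_to_idx[h], self.node_to_idx[t])
                for h, _, t in self.edge_to_idx]


# Two samples that share the node "bob" and repeat one edge.
samples = [
    [("alice", "knows", "bob")],
    [("bob", "knows", "carol"), ("alice", "knows", "bob")],
]
indexer = TinyGraphIndexer()
for sample in samples:
    indexer.add_sample(sample)
```

Keeping the id maps as plain dictionaries means a later lookup (for example, retrieving the features of every node in a subgraph) is a dictionary access rather than a scan over the samples.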

Updates using these for WebQSP will be moved to a separate PR.

UPDATE:
PR is being broken up into smaller PRs. These can be previewed here:


docs/source/advanced/rag.rst

test/data/test_large_graph_indexer.py

puririshi98

Just make a new PR with the WebQSP changes and move advanced/rag.rst to that. Also apply the fix for skipping pytest when the versioning is not correct.
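
For readers unfamiliar with version-gated tests, a minimal sketch of the pattern follows. It assumes the gate is on the Python interpreter version, which the thread does not state, and `MIN_PYTHON`, `meets_min_version`, and the test class are hypothetical. It uses the standard library's `unittest.skipIf` so it runs without extra dependencies; with pytest, the same condition and reason would go into `pytest.mark.skipif`.

```python
import sys
import unittest

# Hypothetical minimum version; the thread does not say which version
# (or which package) the real test is gated on.
MIN_PYTHON = (3, 8)


def meets_min_version(current, required):
    """Return True if the `current` version tuple is at least `required`."""
    return tuple(current[:len(required)]) >= tuple(required)


class TestIndexerSketch(unittest.TestCase):
    @unittest.skipIf(
        not meets_min_version(sys.version_info, MIN_PYTHON),
        "requires a newer Python version",
    )
    def test_deduplicates_nodes(self):
        # Placeholder body: the real test would exercise the indexer.
        self.assertEqual(len({"a", "b", "a"}), 2)
```

Skipping, rather than failing, keeps CI green on environments where the feature is known to be unsupported while still reporting that the test did not run.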

test/data/test_large_graph_indexer.py


Will be re-added with an updated version of WebQSP

zaristei · 2024-11-25T20:08:14Z

> Just make a new PR with the WebQSP changes and move advanced/rag.rst to that. Also apply the fix for skipping pytest when the versioning is not correct.

New PR to be found here for WebQSP and doc changes

puririshi98

LGTM. Has been reviewed 10+ times over the last few months. Code tested. @zaristei has addressed all reviews. Most of the reviews are much above

…xamples) (pyg-team#9666)

Follow up to [PR 9597](pyg-team#9597). Includes multiple changes related to LLM+GNN experiments and scaling up to a remote backend. Including:

- LargeGraphIndexer for building a large knowledge graph locally from multiple samples in an arbitrary dataset
- Remote Backend Loader and examples for deploying a Retrieval algorithm to a third party backend FeatureStore or GraphStore
- NVTX profiling tools for nsys users
- Quality of Life improvements and benchmarking scripts for G-Retriever.

Updates using these for WebQSP will be moved to a separate PR

UPDATE: PR is being broken up into smaller PRs. These can be previewed here:

- zaristei#6
- zaristei#7
- zaristei#8

---------

Co-authored-by: Zack Aristei <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Zachary Aristei <[email protected]>
Co-authored-by: Rishi Puri <[email protected]>

Successor to [9666](#9666), this:

- ~~updates the documentation to show how to utilize GNN RAG and~~ (now handled by separate branch)
- updates WebQSP to help serve as a toy example for LargeGraphIndexer.
- fixes issues with LargeGraphIndexer running out of memory by introducing a default batch size and multithreading ability

~~currently blocked by a bug that causes the g_retriever.py example to get 1% less accuracy.~~ Bug is due to a fp32 precision issue related to batch kernels in Huggingface's transformers. Performance difference is too inconsequential to require a fix. May also be the cause of low retrieval precision in #9846

---------

Co-authored-by: Zack Aristei <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Zachary Aristei <[email protected]>
Co-authored-by: Rishi Puri <[email protected]>
Co-authored-by: Rishi Puri <[email protected]>
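
The batch-size and multithreading fix described above can be illustrated with a minimal sketch. It assumes the memory pressure comes from processing every item in a single call; `DEFAULT_BATCH_SIZE`, `batched`, `map_in_batches`, and `fake_encode` are hypothetical names and values, not the code that was merged.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence

DEFAULT_BATCH_SIZE = 256  # hypothetical default, not the value used in the PR


def batched(items: Sequence, batch_size: int) -> Iterator[list]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def map_in_batches(
    items: Sequence,
    fn: Callable[[list], list],
    batch_size: int = DEFAULT_BATCH_SIZE,
    num_workers: int = 4,
) -> list:
    """Apply `fn` to each batch on a thread pool and flatten the results.

    Each call to `fn` sees at most `batch_size` items, which bounds the
    working memory of a single call. `pool.map` keeps input order.
    """
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        per_batch = pool.map(fn, batched(items, batch_size))
        return [out for batch_out in per_batch for out in batch_out]


def fake_encode(batch: List[str]) -> List[int]:
    """Stand-in for an expensive encoder: returns the length of each string."""
    return [len(text) for text in batch]


lengths = map_in_batches(
    ["a", "bb", "ccc", "dddd", "eeeee"], fake_encode,
    batch_size=2, num_workers=2)
```

Threads help most when `fn` releases the GIL (I/O, or native kernels in a tensor library); for pure-Python work a process pool would be the usual alternative.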

Zack Aristei and others added 30 commits August 28, 2024 15:03

finished dataloader

b912652

finished dataloader 2

ce5370b

add to data transform

dc3f9f4

init commit of dataset

4488d70

debug pcst part 1

eb9b761

push what i have for now

85da0e7

fixes after rebase

168a37a

more fixes and mocking

2cf727f

more mocking

9ab9b22

main mocking working

6a1ee0f

pr fixes

03f6f4e

migrate large graph indexer

0b4ee3d

fix save overrider

81a1b8a

formatting 1

dd2ec56

formatting 2

917fe54

start unittests

ae8d057

tests done for largegraphindexer

7fdae85

tests done for largegraphindexer 2

5f67a84

tests done for largegraphindexer 3

b0ab66f

tests done for largegraphindexer 4

bfd1a2f

migrate updated qsp dataset

bcee67a

formatting

2117605

add dataset

866866c

add option for new dataloader

70f0380

fix formatting

db94397

instantiate ds in train func

a03028a

Restore mapping attrs

0ec7aac

Test edited to cover restoration of data info

c1e5cc8

begin trying profiling

5a09881

speedup retrieval to avoid timeouts

9d1093c

zaristei and others added 12 commits November 23, 2024 09:46

col limits on prints 2

14ae2fb

[pre-commit.ci] auto fixes from pre-commit.com hooks

c1380ba

for more information, see https://pre-commit.ci

col limits on prints 3

14e0376

[pre-commit.ci] auto fixes from pre-commit.com hooks

204ddcb

for more information, see https://pre-commit.ci

col limits on prints 4

0fa20ad

[pre-commit.ci] auto fixes from pre-commit.com hooks

8f63e66

for more information, see https://pre-commit.ci

col limits on prints 5

51129d0

[pre-commit.ci] auto fixes from pre-commit.com hooks

79c43b1

for more information, see https://pre-commit.ci

col limits on prints 6

278cada

pre-commit fix

581e184

Merge branch 'master' into zaristei/g_retriever_experiments

257319a

[pre-commit.ci] auto fixes from pre-commit.com hooks

a7cfc61

for more information, see https://pre-commit.ci

zaristei commented Nov 25, 2024


puririshi98 changed the title ~~G Retriever Experiments and Improvements (full)~~ G-retriever API updates (NVTX, Remote Backend, Large Graph Indexer, Examples) Nov 25, 2024

puririshi98 requested changes Nov 25, 2024

test/data/test_large_graph_indexer.py (outdated, resolved)

zaristei and others added 8 commits November 25, 2024 19:14

large graph indexer test skip with wrong version

0e69cd0

[pre-commit.ci] auto fixes from pre-commit.com hooks

794e487

for more information, see https://pre-commit.ci

Remove outdated docs 1

f0ffc14

Will be re-added with an updated version of WebQSP

Remove outdated docs 2

147fc84

Will be re-added with an updated version of WebQSP

Remove outdated docs 3

2140250

Will be re-added with an updated version of WebQSP

Remove outdated docs 4

32ad70a

Will be re-added with an updated version of WebQSP

Remove outdated docs 5

13301f0

Will be re-added with an updated version of WebQSP

lint fix

b5069e2

zaristei mentioned this pull request Nov 25, 2024

Large Graph Indexer WebQSP Refactor #9806

Merged

puririshi98 approved these changes Nov 26, 2024


puririshi98 merged commit 742f790 into pyg-team:master Nov 26, 2024
16 checks passed
