Reorganize TXT2KG and introduce `torch_geometric.llm` #10436

puririshi98 · 2025-09-04T18:13:28Z

Replaces #9992.
Adds the new `torch_geometric.llm` subpackage, which NVIDIA will maintain.
Example of how to use this in the NVIDIA Docker container:
`git config --global credential.helper store; huggingface-cli login --token <insert_token>; cd /opt/pyg; pip uninstall -y torch-geometric; rm -rf pytorch_geometric; git clone -b latest-txt2kg https://github.com/pyg-team/pytorch_geometric.git; cd /opt/pyg/pytorch_geometric; pip install .; pip install openai`

Example to run:
`python3 examples/llm/txt2kg_rag.py`
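After reinstalling from the branch, a quick sanity check before running the example is to confirm that the new `torch_geometric.llm` subpackage can actually be found. The snippet below is a minimal sketch that uses only the standard library's `importlib.util.find_spec`; `has_module` is a hypothetical helper for illustration, not part of PyG.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the (possibly dotted) module `name` can be located.

    For dotted names such as "torch_geometric.llm", find_spec imports the
    parent package first, so a parent that is missing or fails to import
    raises ImportError (ModuleNotFoundError is a subclass of it).
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        return False


if __name__ == "__main__":
    # torch_geometric.llm is the subpackage introduced by this PR.
    if has_module("torch_geometric.llm"):
        print("torch_geometric.llm found; ready to run examples/llm/txt2kg_rag.py")
    else:
        print("torch_geometric.llm not found; check the reinstall step")
```

If the check fails, the most likely cause is that the old `torch-geometric` wheel is still the one on the path, so repeating the uninstall and `pip install .` steps is a reasonable first move.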

Thanks to @Kh4L, @zaristei, @rlratzel, and @rliu for their contributions.

This has been thoroughly tested by many internal and external users and should be good to merge. We will continue to improve this pipeline in future PRs, but this PR is ready. Planned future PRs:

- Add the RAG CI job back after figuring out why the tests pass but CI still shows a red X.
- Add a non-toy dataset as the default data (it needs to be released by NVIDIA on Hugging Face; waiting on that).
- Further improve the user-friendliness of the example.

I will work with @zaristei on refining `RAGQueryLoader` and the Feature/Graph Store workflow, as directed by @wsad1.
Note: NVIDIA CI will run all unit tests (including these RAG-related ones) as well as the full `txt2kg_rag.py` example every time we update our NVIDIA container. This runs on H100, B100, and A100 (a few different SKUs). NVIDIA QA then also runs on almost every hardware SKU before each bi-monthly release.

(Closed sub-PR: #10368.)
I had broken this PR down, but Matthias then said I could merge it if he had not reviewed it by the time I came back from vacation, since we have tested and reviewed it so thoroughly at NVIDIA and externally, on top of my involvement in most of the LLM features of PyG.

NVIDIA's backing of this project spans many orgs, all the way to the top, and many Fortune 500 companies are adopting it. Getting this merged is the first step toward cementing PyG as THE framework for anything "Graph" at NVIDIA, and I am leading this charge. We will continue to optimize over time and treat any issues that come up as P0s to fix ASAP, but merging this is the important first step.

akihironitta

LGTM as per offline discussion. I left some final comments before merging.

torch_geometric/typing.py

.github/workflows/linting.yml

CHANGELOG.md

akihironitta

As per our offline discussion amongst @rusty1s @wsad1 @puririshi98 and myself, we have decided to create a new subpackage torch_geometric.llm and assign @puririshi98 as its codeowner so that NVIDIA and the PyG community can iterate on the integration much more quickly even without needing to guarantee the same standard as the PyG core, e.g., thorough test coverage, documentation and code quality.

Thanks again @puririshi98 and the team for your patience and the exciting work! 🚀

puririshi98 and others added 30 commits February 4, 2025 21:06

fix

7ed58bd

Merge branch 'master' into latest-txt2kg

8f5e3af

Update hotpot_qa.py

b984886

Update hotpot_qa.py

4925f5c

save command (#10005)

7d064dc

Co-authored-by: riship <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

ba0acd2

for more information, see https://pre-commit.ci

cleaning

5146f5a

[pre-commit.ci] auto fixes from pre-commit.com hooks

0749bb3

Merge branch 'master' into latest-txt2kg

ba79f26

Merge branch 'master' into latest-txt2kg

5f736ff

Update tech_qa.py

edd2c04

[pre-commit.ci] auto fixes from pre-commit.com hooks

61aeb79

Update tech_qa.py w preproc instructions

4dfe2dc

Merge branch 'master' into latest-txt2kg

8438ed6

[pre-commit.ci] auto fixes from pre-commit.com hooks

8150e00

Update tech_qa.py

10493a9

Update g_retriever.py

a95663c

Merge branch 'master' into latest-txt2kg

d6e7ada

[pre-commit.ci] auto fixes from pre-commit.com hooks

2ff65bb

Update txt2kg.py

5f9388b

[pre-commit.ci] auto fixes from pre-commit.com hooks

70d6cb2

Update txt2kg.py

bb3992d

NIMs can be unreliable, more retries

03fc8a5

more retries for llmjudge

e70d88e

update data set up (#10104)

03be3de

Co-authored-by: riship <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Merge branch 'master' into latest-txt2kg

959d5f1

Update txt2kg.py

2ffd715

fix syntax

914aeea

[pre-commit.ci] auto fixes from pre-commit.com hooks

04b4faf

Merge branch 'master' into latest-txt2kg

068eb1e
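Commits 03fc8a5 ("NIMs can be unreliable, more retries") and e70d88e ("more retries for llmjudge") raise the retry count for remote LLM calls. The actual retry code is not shown on this page, so the following is only a generic, hypothetical sketch of retrying a flaky remote call with exponential backoff; `call_with_retries`, `max_retries`, `base_delay`, and `retry_on` are illustrative names, not PyG or NIM APIs.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call `fn`; on a `retry_on` error, back off and try again.

    Makes at most `max_retries + 1` attempts and re-raises the last error.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                raise  # Out of retries: surface the original error.
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")  # Keeps type checkers satisfied.


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_endpoint() -> str:
        # Hypothetical stand-in for a remote LLM call that fails transiently.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("transient failure")
        return "ok"

    print(call_with_retries(flaky_endpoint, base_delay=0.0))  # prints "ok"
```

Retrying on every `Exception` is a blunt default; in practice one would likely narrow `retry_on` to transient errors such as timeouts or connection failures so that genuine bugs surface immediately instead of being retried.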

puririshi98 and others added 13 commits September 4, 2025 11:40

one more fix

1c5158c

[pre-commit.ci] auto fixes from pre-commit.com hooks

9ede514

one more fix

c418a27

Merge branch 'reorg-txt2kg' of https://github.com/pyg-team/pytorch_geometric into reorg-txt2kg

ca03a5e

one more fix

7b9d750

one more fix

9ad0d67

one more fix

69d99f6

one more fix

676b46a

Update linting.yml. Lint passed before, made no real changes. I think since NVIDIA owns this i want to make the linting a little more lax

28e2452

Update linting.yml

9db6d4f

Update linting.yml

4440d92

fix

7809ecc

[pre-commit.ci] auto fixes from pre-commit.com hooks

c6a7ff2

akihironitta reviewed Sep 4, 2025

torch_geometric/typing.py (outdated, resolved)

.github/workflows/linting.yml (outdated, resolved)

CHANGELOG.md (outdated, resolved)

akihironitta added the 0 - Priority P0 label Sep 4, 2025

akihironitta added this to the 2.7.0 milestone Sep 4, 2025

puririshi98 and others added 3 commits September 4, 2025 15:03

Update CHANGELOG.md

176f766

Co-authored-by: Akihiro Nitta <[email protected]>

Update pyproject.toml

f3f3173

Merge branch 'master' into reorg-txt2kg

c8cb934

akihironitta changed the title from "Reorg txt2kg" to "Reorganize TXT2KG and introduce `torch_geometric.llm`" Sep 5, 2025

akihironitta self-assigned this Sep 5, 2025

akihironitta and others added 4 commits September 5, 2025 17:43

update

1836068

update

f1a67d4

[pre-commit.ci] auto fixes from pre-commit.com hooks

1264732

update

1be3318

github-actions bot removed the sampler label Sep 5, 2025

update

4a7fa78

akihironitta approved these changes Sep 5, 2025

akihironitta merged commit d4a442b into master Sep 5, 2025
16 checks passed

akihironitta deleted the reorg-txt2kg branch September 5, 2025 18:19

7 participants