
Conversation

@xnuohz (Contributor) commented on Oct 15, 2024

Issue

- pyg-team#9694
- pyg-team#9698

Feature Summary

- Add `MoleculeGPTDataset`
- Add `MoleculeGPT` as a GNN & LLM co-training model to PyG
- Add an example for training and testing
- Split the PR into 3 sub-PRs (pyg-team#9723, pyg-team#9724, pyg-team#9725)
- Due to limited hardware resources, `lmsys/vicuna-7b-v1.5` could not be loaded; `TinyLlama/TinyLlama-1.1B-Chat-v0.1` was used instead, and the full training pipeline was not tested
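To make the additions concrete, here is a minimal usage sketch of the new dataset; the `root` path is a placeholder, and any constructor arguments beyond `root` are assumptions rather than the dataset's confirmed API.

```python
# Minimal sketch: loading the MoleculeGPTDataset added by this PR.
# `root` is a placeholder path; arguments beyond it are assumptions.
from torch_geometric.datasets import MoleculeGPTDataset

dataset = MoleculeGPTDataset(root='data/MoleculeGPT')
print(len(dataset))  # number of molecule/instruction pairs
print(dataset[0])    # a graph `Data` object paired with instruction text
```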

@xnuohz changed the title from "[WIP] MoleculeGPT" to "Add MoleculeGPT" on Oct 21, 2024
@rusty1s (Member) commented on Oct 22, 2024

@xnuohz Looks great. Can you do us a favor and split the PR into multiple ones? I imagine we can merge the dataset, the model, and the example separately to ease reviewing.

@xnuohz (Contributor, Author) commented on Oct 23, 2024

> @xnuohz Looks great. Can you do us a favor and split the PR into multiple ones? I imagine we can merge the dataset, the model, and the example separately to ease reviewing.

@rusty1s Got it. I'll do this later.

@puririshi98 (Contributor) commented:
@xnuohz, notice that CI fails with:

    E ModuleNotFoundError: No module named 'transformers'

You need to add `@withPackage('transformers', 'sentencepiece', 'accelerate')`. Since the PyG GitHub CI does not have these packages installed by default, please run the unit test manually and share the results as a comment so we know everything works fine. At NVIDIA we do have CI that tests with these packages installed, so once your work is merged it will be maintained :)
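For context, this is what such a guard looks like in a PyG unit test; `withPackage` is the real helper from `torch_geometric.testing`, while the test body below is a hypothetical placeholder, not the PR's actual test:

```python
# Sketch of a package-guarded test. `withPackage` comes from
# torch_geometric.testing; the test body is a hypothetical placeholder.
from torch_geometric.testing import withPackage


@withPackage('transformers', 'sentencepiece', 'accelerate')
def test_molecule_gpt() -> None:
    # pytest skips this test when any listed package is missing,
    # instead of failing with "ModuleNotFoundError" at import time.
    from torch_geometric.nn.models import MoleculeGPT  # noqa: F401
```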

@xnuohz (Contributor, Author) commented on Oct 30, 2024

@puririshi98 Fixed CI and ran the unit test locally.

[screenshot: local unit-test results]

@puririshi98 (Contributor) commented:
LGTM

@puririshi98 (Contributor) commented:
Will wait for @rusty1s and @akihironitta to review/merge.

@puririshi98 (Contributor) commented:
```
root@keystone-dvt1d-023-114:/workspace/pytorch_geometric# python3 examples/llm/molecule_gpt.py
Setting up 'TinyLlama/TinyLlama-1.1B-Chat-v0.1' with configuration: {'revision': 'main', 'max_memory': {0: '93GiB'}, 'low_cpu_mem_usage': True, 'device_map': 'auto', 'torch_dtype': torch.bfloat16}
Some weights of RobertaModel were not initialized from the model checkpoint at DeepChem/ChemBERTa-77M-MTR and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Total Preparation Time: 1.987466s
Training beginning...
Epoch: 1|3:   0%|          | 0/1719 [00:00<?, ?it/s]
/usr/local/lib/python3.12/dist-packages/torch/autograd/graph.py:825: UserWarning: cuDNN SDPA backward got grad_output.strides() != output.strides(), attempting to materialize a grad_output with matching strides... (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/cudnn/MHA.cpp:674.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
Epoch: 1|3: 100%|██████████| 1719/1719 [03:22<00:00,  8.51it/s]
Epoch: 1|3, Train loss: 1.067092, Val loss: 1.081951
Epoch: 2|3: 100%|██████████| 1719/1719 [02:59<00:00,  9.58it/s]
Epoch: 2|3, Train loss: 0.844542, Val loss: 1.037265
Epoch: 3|3: 100%|██████████| 1719/1719 [03:01<00:00,  9.48it/s]
Epoch: 3|3, Train loss: 0.812881, Val loss: 1.026247
/usr/local/lib/python3.12/dist-packages/torch/cuda/memory.py:369: FutureWarning: torch.cuda.reset_max_memory_allocated now calls torch.cuda.reset_peak_memory_stats, which resets /all/ peak memory stats.
  warnings.warn(
Total Training Time: 591.698638s
Test loss: 1.042540
Total Time: 602.299925s
```
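For reference, the `Setting up ... with configuration` line at the top of the log corresponds to a standard Hugging Face model load; below is a hedged reconstruction using plain `transformers` calls, not the example's actual code (PyG wraps this in its own LLM helper, so treat the exact wiring as an assumption):

```python
# Hedged reconstruction of the logged LLM setup using standard
# Hugging Face transformers APIs; the exact wiring is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'TinyLlama/TinyLlama-1.1B-Chat-v0.1'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    revision='main',
    max_memory={0: '93GiB'},     # cap memory use on GPU 0
    low_cpu_mem_usage=True,      # avoid a full CPU-side weight copy
    device_map='auto',           # let accelerate place the weights
    torch_dtype=torch.bfloat16,  # matches the logged dtype
)
```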

Merging since @rusty1s is busy until the new year. cc @akihironitta

@puririshi98 self-requested a review on November 20, 2024 01:44

@puririshi98 (Contributor) left a review comment:
LGTM

@puririshi98 merged commit 529237c into pyg-team:master on Nov 20, 2024
16 checks passed
@xnuohz deleted the moleculegpt/dataset branch on November 20, 2024 06:21
mattjhayes3 pushed a commit to mattjhayes3/pytorch_geometric that referenced this pull request on Dec 14, 2024

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Giovanni Gatti <[email protected]>
Co-authored-by: Rishi Puri <[email protected]>
