Conversation

@xnuohz
Contributor

@xnuohz xnuohz commented Oct 24, 2024

Issue

Feature Summary

  • Add GitMolDataset
  • Add GITMol as a GNN & LLM co-training model to PyG
  • Add an example for pre-training
  • Due to limited hardware resources, the full training pipeline was not tested
  • Multi-modal cross-attention shares the same weights across modalities, which is not aligned with the original paper
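To make the last point concrete, here is a minimal sketch (hypothetical names, not the PyG or GIT-Mol implementation) of the difference between sharing one cross-attention projection across all modality branches, as in this PR, and giving each modality its own projection, as in the original paper:

```python
# Hypothetical sketch: shared vs. per-modality cross-attention
# projections. Names are illustrative, not the PyG implementation.

class CrossAttentionProjection:
    """Stands in for a learnable query/key/value projection module."""
    def __init__(self, name):
        self.name = name

def build_shared(modalities):
    # One projection object reused by every modality branch:
    # every branch reads and updates the same parameters.
    shared = CrossAttentionProjection("shared_qkv")
    return {m: shared for m in modalities}

def build_per_modality(modalities):
    # A separate projection per modality, as in the original paper.
    return {m: CrossAttentionProjection(f"{m}_qkv") for m in modalities}

modalities = ["graph", "image", "smiles"]
shared = build_shared(modalities)
separate = build_per_modality(modalities)

# Shared setup: all branches point at the identical parameter object.
assert shared["graph"] is shared["image"]
# Per-modality setup: each branch owns its own parameters.
assert separate["graph"] is not separate["image"]
```

The trade-off is parameter count versus modality-specific capacity; the shared variant is smaller but cannot specialize the attention projections per modality.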

@xnuohz xnuohz changed the title [WIP] GIT-Mol Add GIT-Mol Nov 3, 2024
@xnuohz
Contributor Author

xnuohz commented Nov 15, 2024

@puririshi98 Local CI passed.

@puririshi98
Contributor

@xnuohz I am confused: how is the data meant to be downloaded if the `download` function is a `pass` and the `process` function does not include any download step?

root@keystone-dvt1d-023-114:/workspace/pytorch_geometric# python3 examples/llm/git_mol.py 
Processing...
Traceback (most recent call last):
  File "/workspace/pytorch_geometric/examples/llm/git_mol.py", line 127, in <module>
    train(
  File "/workspace/pytorch_geometric/examples/llm/git_mol.py", line 41, in train
    train_dataset = GitMolDataset(path, split=0)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch_geometric/datasets/git_mol_dataset.py", line 71, in __init__
    super().__init__(root, transform, pre_transform, pre_filter,
  File "/usr/local/lib/python3.12/dist-packages/torch_geometric/data/in_memory_dataset.py", line 81, in __init__
    super().__init__(root, transform, pre_transform, pre_filter, log,
  File "/usr/local/lib/python3.12/dist-packages/torch_geometric/data/dataset.py", line 115, in __init__
    self._process()
  File "/usr/local/lib/python3.12/dist-packages/torch_geometric/data/dataset.py", line 262, in _process
    self.process()
  File "/usr/local/lib/python3.12/dist-packages/torch_geometric/datasets/git_mol_dataset.py", line 161, in process
    data = pd.read_pickle(
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pandas/io/pickle.py", line 185, in read_pickle
    with get_handle(
         ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pandas/io/common.py", line 882, in get_handle
    handle = open(handle, ioargs.mode)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/workspace/pytorch_geometric/data/GITMol/raw/igcdata_toy/train_3500.pkl'

I know you are low on compute, so feel free to ping me on Slack when you have a change and I can test it quickly to make sure we have something working.
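For context on the traceback: PyG's `Dataset` base class calls `download()` before `process()` whenever the raw files are missing, so if `download()` is a no-op, `process()` runs against an empty `raw_dir` and fails exactly as above. A minimal stand-in (illustrative only, not the real `torch_geometric` code) showing the hook order:

```python
import os
import tempfile

# Minimal stand-in for torch_geometric.data.Dataset's hook order:
# download() runs first when raw files are missing, then process().
# Illustrative only; not the real torch_geometric implementation.
class MiniDataset:
    def __init__(self, root):
        self.raw_dir = os.path.join(root, "raw")
        os.makedirs(self.raw_dir, exist_ok=True)
        if not os.listdir(self.raw_dir):  # no raw files yet
            self.download()               # expected to fetch them
        self.process()                    # parse raw -> processed

    def download(self):
        pass  # no-op: nothing is fetched, mirroring the reviewed code

    def process(self):
        # Fails with FileNotFoundError when download() left raw_dir empty.
        with open(os.path.join(self.raw_dir, "train_3500.pkl"), "rb"):
            pass

class FixedDataset(MiniDataset):
    def download(self):
        # A real PyG dataset would call download_url(...) / extract_zip(...)
        # here; a stub file stands in for the extracted archive.
        with open(os.path.join(self.raw_dir, "train_3500.pkl"), "wb") as f:
            f.write(b"stub")

root = tempfile.mkdtemp()
broken_failed = False
try:
    MiniDataset(os.path.join(root, "broken"))
except FileNotFoundError:
    broken_failed = True   # process() ran against an empty raw_dir

FixedDataset(os.path.join(root, "fixed"))  # download() fills raw_dir first
```

This is why implementing `download()` (as the merged version does, fetching and extracting `gitmol.zip`) resolves the `FileNotFoundError`.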

@puririshi98 puririshi98 self-requested a review November 22, 2024 17:37
Contributor

@puririshi98 puririshi98 left a comment

LGTM
in NVIDIA container:

/workspace/pytorch_geometric# python3 examples/llm/git_mol.py 
Downloading https://drive.usercontent.google.com/download?id=1loBXabD6ncAFY-vanRsVtRUSFkEtBweg&confirm=t
Extracting /workspace/pytorch_geometric/data/GITMol/raw/gitmol.zip
Processing...
  0%|                                                  | 0/3610 [00:00<?, ?it/s]
/usr/local/lib/python3.12/dist-packages/torchvision/transforms/_functional_pil.py:113: RuntimeWarning: invalid value encountered in cast
  np_h += np.array(hue_factor * 255).astype(np.uint8)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3610/3610 [00:08<00:00, 421.48it/s]
Done!
Using existing file gitmol.zip
Extracting /workspace/pytorch_geometric/data/GITMol/raw/gitmol.zip
Processing...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 451/451 [00:00<00:00, 721.40it/s]
Done!
Using existing file gitmol.zip
Extracting /workspace/pytorch_geometric/data/GITMol/raw/gitmol.zip
Processing...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 451/451 [00:00<00:00, 786.91it/s]
Done!
config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 385/385 [00:00<00:00, 3.86MB/s]
vocab.txt: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 228k/228k [00:00<00:00, 3.47MB/s]
pytorch_model.bin: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 442M/442M [00:06<00:00, 64.4MB/s]
config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 71.8k/71.8k [00:00<00:00, 65.6MB/s]
Some weights of BertModel were not initialized from the model checkpoint at allenai/scibert_scivocab_uncased and are newly initialized: ['bert.encoder.layer.{0-11}.crossattention.{self.query, self.key, self.value, output.dense, output.LayerNorm}.{weight, bias}'] (the cross-attention weights and biases for all 12 encoder layers)
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Training beginning...
Epoch: 1|3:   0%|                                       | 0/902 [00:00<?, ?it/s]
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Epoch: 1|3: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 902/902 [04:31<00:00,  3.32it/s]
Epoch: 1|3, Train loss: 0.886526, Val loss: 1.076906
Epoch: 2|3: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 902/902 [04:31<00:00,  3.32it/s]
Epoch: 2|3, Train loss: 0.797038, Val loss: 1.134945
Epoch: 3|3: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 902/902 [04:34<00:00,  3.29it/s]
Epoch: 3|3, Train loss: 0.776674, Val loss: 1.252366
/usr/local/lib/python3.12/dist-packages/torch/cuda/memory.py:369: FutureWarning: torch.cuda.reset_max_memory_allocated now calls torch.cuda.reset_peak_memory_stats, which resets /all/ peak memory stats.
  warnings.warn(
Test loss: 1.255684
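One observation on the log above: train loss falls each epoch while validation loss rises (1.077 → 1.135 → 1.252), a classic overfitting signature for a short toy run. A generic early-stopping sketch (not part of this PR) that would select the epoch-1 checkpoint from these numbers:

```python
# Generic early-stopping sketch over per-epoch validation losses;
# illustrative only, not code from the PR or the example script.
def best_epoch(val_losses, patience=2):
    """Return (1-based best epoch, its val loss); stop scanning after
    `patience` consecutive epochs without improvement."""
    best_i, best = 0, float("inf")
    bad = 0
    for i, v in enumerate(val_losses):
        if v < best:
            best_i, best = i, v
            bad = 0
        else:
            bad += 1
            if bad >= patience:
                break
    return best_i + 1, best

# Validation losses from the run above (epochs 1-3).
val_losses = [1.076906, 1.134945, 1.252366]
epoch, loss = best_epoch(val_losses)  # epoch 1 has the lowest val loss
```

In a real training loop one would save the model state at each new best epoch and restore it before the final test evaluation.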

@puririshi98 puririshi98 merged commit f732427 into pyg-team:master Nov 25, 2024
16 checks passed
@xnuohz xnuohz deleted the git-mol branch November 25, 2024 05:49
mattjhayes3 pushed a commit to mattjhayes3/pytorch_geometric that referenced this pull request Dec 14, 2024
### Issue
- pyg-team#9694
- pyg-team#9700

### Feature Summary

- Add `GitMolDataset`
- Add `GITMol` as a GNN & LLM co-training model to PyG
- Add an example for pre-training
- Due to limited hardware resources, the full training pipeline was not tested
- Multi-modal cross-attention shares the same weights across modalities, which is not aligned with the original paper

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Rishi Puri <[email protected]>