
Conversation


@xnuohz xnuohz commented May 25, 2025

Feature summary

  • Add a new dataset `ProteinMPNNDataset` with two versions (small and large)
  • Add a new model `ProteinMPNN`
  • Add unit tests
  • Add an example for loading the dataset and training the model

Issue

Usage

python examples/llm/protein_mpnn.py --gradient_norm 5 --mixed_precision True
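The `--gradient_norm 5` flag requests global gradient-norm clipping (the example script presumably delegates to `torch.nn.utils.clip_grad_norm_`). A minimal pure-Python sketch of what clipping a gradient vector to a norm of 5 means:

```python
import math

def clip_grad_norm(grads, max_norm, eps=1e-6):
    """Scale a flat list of gradient values so their global L2 norm
    does not exceed `max_norm` (what `--gradient_norm 5` requests)."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + eps)
        return [g * scale for g in grads]
    return list(grads)

# A gradient vector of norm 10 is rescaled down to (approximately) norm 5;
# a vector already within the limit is returned unchanged.
clipped = clip_grad_norm([6.0, 8.0], max_norm=5.0)
unchanged = clip_grad_norm([3.0, 4.0], max_norm=5.0)
```

This is a sketch of the semantics only; the real implementation operates on parameter tensors in place and handles the zero-norm case.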

Highlight

(figure: github_fig)

Small dataset training log

python examples/llm/protein_mpnn.py --gradient_norm 5 --mixed_precision True
Processing...
Processing split: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 473127/473127 [00:11<00:00, 41755.99it/s]
Processing:   0%|                                                                                                                | 0/438572 [00:00<?, ?it/s]/home/ubuntu/Projects/pytorch_geometric/torch_geometric/datasets/protein_mpnn_dataset.py:211: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  meta = torch.load(f'{prefix}.pt')
/home/ubuntu/Projects/pytorch_geometric/torch_geometric/datasets/protein_mpnn_dataset.py:243: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  c: torch.load(f'{prefix}_{c}.pt')
Processing: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 438572/438572 [00:01<00:00, 256820.89it/s]
Done!
Processing...
Load split
Processing: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 17957/17957 [00:00<00:00, 40089.40it/s]
Done!
Processing...
Load split
Processing: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 16598/16598 [00:00<00:00, 467106.31it/s]
Done!
epoch: 000, step: 0, train: 27.305, valid: 19.288, train_acc: 0.068, valid_acc: 0.111
epoch: 001, step: 0, train: 18.214, valid: 19.615, train_acc: 0.103, valid_acc: 0.078
epoch: 002, step: 0, train: 16.328, valid: 19.160, train_acc: 0.131, valid_acc: 0.097
epoch: 003, step: 0, train: 15.034, valid: 19.335, train_acc: 0.159, valid_acc: 0.118
epoch: 004, step: 0, train: 13.922, valid: 18.412, train_acc: 0.186, valid_acc: 0.120
epoch: 005, step: 0, train: 12.825, valid: 17.270, train_acc: 0.216, valid_acc: 0.143
epoch: 006, step: 0, train: 12.273, valid: 19.554, train_acc: 0.231, valid_acc: 0.132
epoch: 007, step: 0, train: 11.274, valid: 17.401, train_acc: 0.257, valid_acc: 0.151
epoch: 008, step: 0, train: 10.498, valid: 17.596, train_acc: 0.283, valid_acc: 0.145
epoch: 009, step: 0, train: 9.125, valid: 17.946, train_acc: 0.334, valid_acc: 0.145
epoch: 010, step: 0, train: 8.843, valid: 18.615, train_acc: 0.352, valid_acc: 0.152
epoch: 011, step: 0, train: 7.689, valid: 19.883, train_acc: 0.408, valid_acc: 0.144
epoch: 012, step: 0, train: 7.028, valid: 16.705, train_acc: 0.441, valid_acc: 0.166
epoch: 013, step: 0, train: 6.340, valid: 18.358, train_acc: 0.476, valid_acc: 0.159
epoch: 014, step: 0, train: 5.573, valid: 21.194, train_acc: 0.521, valid_acc: 0.148
epoch: 015, step: 0, train: 5.223, valid: 18.807, train_acc: 0.537, valid_acc: 0.154
epoch: 016, step: 0, train: 4.525, valid: 23.286, train_acc: 0.575, valid_acc: 0.160
epoch: 017, step: 0, train: 4.288, valid: 18.867, train_acc: 0.590, valid_acc: 0.173
epoch: 018, step: 0, train: 3.842, valid: 25.381, train_acc: 0.615, valid_acc: 0.159
epoch: 019, step: 0, train: 3.798, valid: 16.560, train_acc: 0.625, valid_acc: 0.196
epoch: 020, step: 0, train: 3.464, valid: 15.796, train_acc: 0.652, valid_acc: 0.222
epoch: 021, step: 0, train: 3.095, valid: 10.163, train_acc: 0.687, valid_acc: 0.328
epoch: 022, step: 0, train: 2.602, valid: 8.160, train_acc: 0.739, valid_acc: 0.396
epoch: 023, step: 0, train: 2.444, valid: 12.667, train_acc: 0.764, valid_acc: 0.321
epoch: 024, step: 0, train: 2.230, valid: 7.600, train_acc: 0.790, valid_acc: 0.443
epoch: 025, step: 0, train: 2.139, valid: 7.181, train_acc: 0.802, valid_acc: 0.469
epoch: 026, step: 0, train: 2.050, valid: 6.300, train_acc: 0.813, valid_acc: 0.495
epoch: 027, step: 0, train: 2.008, valid: 6.220, train_acc: 0.819, valid_acc: 0.508
epoch: 028, step: 0, train: 1.963, valid: 6.453, train_acc: 0.825, valid_acc: 0.509
epoch: 029, step: 0, train: 1.956, valid: 5.357, train_acc: 0.826, valid_acc: 0.546
epoch: 030, step: 0, train: 1.883, valid: 6.460, train_acc: 0.835, valid_acc: 0.522
epoch: 031, step: 0, train: 1.849, valid: 5.659, train_acc: 0.841, valid_acc: 0.553
epoch: 032, step: 0, train: 1.807, valid: 5.535, train_acc: 0.846, valid_acc: 0.561
epoch: 033, step: 0, train: 1.798, valid: 5.756, train_acc: 0.846, valid_acc: 0.541
epoch: 034, step: 0, train: 1.779, valid: 5.609, train_acc: 0.850, valid_acc: 0.552
epoch: 035, step: 0, train: 1.737, valid: 5.565, train_acc: 0.856, valid_acc: 0.555
epoch: 036, step: 0, train: 1.717, valid: 6.836, train_acc: 0.860, valid_acc: 0.534
epoch: 037, step: 0, train: 1.697, valid: 5.379, train_acc: 0.863, valid_acc: 0.557
epoch: 038, step: 0, train: 1.683, valid: 6.461, train_acc: 0.865, valid_acc: 0.543
epoch: 039, step: 0, train: 1.680, valid: 5.368, train_acc: 0.867, valid_acc: 0.556
epoch: 040, step: 0, train: 1.655, valid: 5.264, train_acc: 0.870, valid_acc: 0.560
epoch: 041, step: 0, train: 1.639, valid: 6.009, train_acc: 0.874, valid_acc: 0.543
epoch: 042, step: 0, train: 1.623, valid: 5.207, train_acc: 0.876, valid_acc: 0.557
epoch: 043, step: 0, train: 1.609, valid: 5.855, train_acc: 0.880, valid_acc: 0.549
epoch: 044, step: 0, train: 1.595, valid: 6.061, train_acc: 0.881, valid_acc: 0.554
epoch: 045, step: 0, train: 1.576, valid: 6.337, train_acc: 0.887, valid_acc: 0.547
epoch: 046, step: 0, train: 1.571, valid: 6.588, train_acc: 0.888, valid_acc: 0.540
epoch: 047, step: 0, train: 1.553, valid: 5.722, train_acc: 0.890, valid_acc: 0.551
epoch: 048, step: 0, train: 1.529, valid: 5.834, train_acc: 0.895, valid_acc: 0.552
epoch: 049, step: 0, train: 1.523, valid: 5.628, train_acc: 0.897, valid_acc: 0.567
Average Epoch Time: 4.2097s
Median Epoch Time: 4.2084s
Total Program Runtime: 225.3954s
test: 5.006, test_acc: 0.569
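The `torch.load` FutureWarnings in the log above can be silenced by opting into the safer default now, provided the saved objects are plain tensors and simple containers. A minimal sketch (the dataset's actual `meta` contents may require allow-listing extra types via `torch.serialization.add_safe_globals`):

```python
import os
import tempfile

import torch

# Saving plain tensors/containers and loading them back with
# `weights_only=True` avoids the pickle-based FutureWarning seen above.
path = os.path.join(tempfile.mkdtemp(), 'meta.pt')
meta = {'coords': torch.zeros(3, 4)}
torch.save(meta, path)

# `weights_only=True` restricts unpickling to tensors and simple containers,
# which is the behavior PyTorch will make the default.
loaded = torch.load(path, weights_only=True)
```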

@xnuohz xnuohz requested review from EdisonLeeeee and wsad1 as code owners May 25, 2025 17:31
@codecov

codecov bot commented May 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.64%. Comparing base (c211214) to head (9b81985).
⚠️ Report is 117 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10289      +/-   ##
==========================================
- Coverage   86.11%   85.64%   -0.47%     
==========================================
  Files         496      499       +3     
  Lines       33655    34297     +642     
==========================================
+ Hits        28981    29373     +392     
- Misses       4674     4924     +250     



xnuohz commented May 31, 2025

@jdhenaos please share your large dataset training log here, thanks :)

@jdhenaos

Large dataset training log

python pytorch_geometric/examples/llm/protein_mpnn.py --gradient_norm 5 --mixed_precision True --size large > pmpnn.log
/home/juan/anaconda3/envs/pygfix/lib/python3.10/site-packages/torch/amp/grad_scaler.py:132: UserWarning: torch.cuda.amp.GradScaler is enabled, but CUDA is not available.  Disabling.
  warnings.warn(
/home/juan/Documents/projects/pytorch_geometric/examples/llm/protein_mpnn.py:164: UserWarning: WARNING: Available system RAM (62.24 GB) is less than the dataset size (64.1 GB).
Consider freeing memory or using a machine with more available RAM.
  warnings.warn(
Downloading https://files.ipd.uw.edu/pub/training_sets/pdb_2021aug02.tar.gz
Extracting data/ProteinMPNN/raw/pdb_2021aug02.tar.gz
Processing...
Processing split: 100%|███████████████████████████████████████████████████████████████████| 473062/473062 [00:14<00:00, 32212.67it/s]
Processing: 100%|████████████████████████████████████████████████████████████████████████| 438507/438507 [00:00<00:00, 489343.72it/s]
Done!
Processing...
Processing: 100%|███████████████████████████████████████████████████████████████████████████| 17957/17957 [00:01<00:00, 15913.02it/s]
Done!
Processing...
Processing: 100%|███████████████████████████████████████████████████████████████████████████| 16598/16598 [00:01<00:00, 11169.99it/s]
Done!
Load split
Load split
epoch: 000, step: 0, train: 26.981, valid: 19.407, train_acc: 0.079, valid_acc: 0.090
epoch: 001, step: 0, train: 17.997, valid: 18.147, train_acc: 0.107, valid_acc: 0.121
epoch: 002, step: 0, train: 16.041, valid: 16.945, train_acc: 0.129, valid_acc: 0.134
epoch: 003, step: 0, train: 13.906, valid: 16.862, train_acc: 0.168, valid_acc: 0.146
epoch: 004, step: 0, train: 11.576, valid: 18.258, train_acc: 0.229, valid_acc: 0.139
epoch: 005, step: 0, train: 8.895, valid: 19.604, train_acc: 0.322, valid_acc: 0.138
epoch: 006, step: 0, train: 6.411, valid: 26.347, train_acc: 0.440, valid_acc: 0.124
epoch: 007, step: 0, train: 4.690, valid: 28.591, train_acc: 0.548, valid_acc: 0.136
epoch: 008, step: 0, train: 3.681, valid: 31.862, train_acc: 0.630, valid_acc: 0.126
epoch: 009, step: 0, train: 2.990, valid: 35.331, train_acc: 0.697, valid_acc: 0.124
epoch: 010, step: 0, train: 2.543, valid: 37.162, train_acc: 0.745, valid_acc: 0.118
epoch: 011, step: 0, train: 2.229, valid: 38.115, train_acc: 0.785, valid_acc: 0.126
epoch: 012, step: 0, train: 2.027, valid: 39.859, train_acc: 0.815, valid_acc: 0.113
epoch: 013, step: 0, train: 1.846, valid: 38.634, train_acc: 0.843, valid_acc: 0.118
epoch: 014, step: 0, train: 1.742, valid: 40.913, train_acc: 0.860, valid_acc: 0.131
epoch: 015, step: 0, train: 1.644, valid: 38.049, train_acc: 0.878, valid_acc: 0.118
epoch: 016, step: 0, train: 1.570, valid: 39.292, train_acc: 0.895, valid_acc: 0.123
epoch: 017, step: 0, train: 1.511, valid: 39.023, train_acc: 0.906, valid_acc: 0.119
epoch: 018, step: 0, train: 1.469, valid: 35.645, train_acc: 0.914, valid_acc: 0.121
epoch: 019, step: 0, train: 1.407, valid: 36.872, train_acc: 0.929, valid_acc: 0.109
epoch: 020, step: 0, train: 1.407, valid: 35.574, train_acc: 0.927, valid_acc: 0.124
epoch: 021, step: 0, train: 1.366, valid: 36.755, train_acc: 0.938, valid_acc: 0.123
epoch: 022, step: 0, train: 1.324, valid: 39.198, train_acc: 0.947, valid_acc: 0.111
epoch: 023, step: 0, train: 1.336, valid: 37.151, train_acc: 0.943, valid_acc: 0.132
epoch: 024, step: 0, train: 1.307, valid: 33.625, train_acc: 0.949, valid_acc: 0.124
epoch: 025, step: 0, train: 1.269, valid: 34.207, train_acc: 0.960, valid_acc: 0.122
epoch: 026, step: 0, train: 1.255, valid: 31.263, train_acc: 0.963, valid_acc: 0.141
epoch: 027, step: 0, train: 1.260, valid: 32.474, train_acc: 0.960, valid_acc: 0.139
epoch: 028, step: 0, train: 1.246, valid: 30.826, train_acc: 0.964, valid_acc: 0.154
epoch: 029, step: 0, train: 1.230, valid: 31.579, train_acc: 0.967, valid_acc: 0.132
epoch: 030, step: 0, train: 1.235, valid: 29.492, train_acc: 0.965, valid_acc: 0.156
epoch: 031, step: 0, train: 1.227, valid: 29.500, train_acc: 0.967, valid_acc: 0.141
epoch: 032, step: 0, train: 1.214, valid: 30.314, train_acc: 0.971, valid_acc: 0.156
epoch: 033, step: 0, train: 1.190, valid: 32.563, train_acc: 0.976, valid_acc: 0.168
epoch: 034, step: 0, train: 1.198, valid: 29.133, train_acc: 0.973, valid_acc: 0.157
epoch: 035, step: 0, train: 1.188, valid: 32.391, train_acc: 0.976, valid_acc: 0.145
epoch: 036, step: 0, train: 1.199, valid: 28.737, train_acc: 0.973, valid_acc: 0.173
epoch: 037, step: 0, train: 1.204, valid: 24.798, train_acc: 0.970, valid_acc: 0.202
epoch: 038, step: 0, train: 1.197, valid: 23.723, train_acc: 0.974, valid_acc: 0.216
epoch: 039, step: 0, train: 1.175, valid: 22.979, train_acc: 0.979, valid_acc: 0.216
epoch: 040, step: 0, train: 1.189, valid: 23.653, train_acc: 0.974, valid_acc: 0.228
epoch: 041, step: 0, train: 1.179, valid: 23.006, train_acc: 0.976, valid_acc: 0.231
epoch: 042, step: 0, train: 1.172, valid: 21.775, train_acc: 0.979, valid_acc: 0.238
epoch: 043, step: 0, train: 1.181, valid: 23.155, train_acc: 0.977, valid_acc: 0.239
epoch: 044, step: 0, train: 1.181, valid: 20.401, train_acc: 0.978, valid_acc: 0.255
epoch: 045, step: 0, train: 1.160, valid: 20.502, train_acc: 0.983, valid_acc: 0.254
epoch: 046, step: 0, train: 1.164, valid: 20.080, train_acc: 0.981, valid_acc: 0.273
epoch: 047, step: 0, train: 1.163, valid: 21.548, train_acc: 0.981, valid_acc: 0.262
epoch: 048, step: 0, train: 1.168, valid: 23.360, train_acc: 0.979, valid_acc: 0.263
epoch: 049, step: 0, train: 1.166, valid: 19.664, train_acc: 0.981, valid_acc: 0.282
Average Epoch Time: 104.2671s
Median Epoch Time: 105.9803s
Total Program Runtime: 9090.4755s
test: 17.504, test_acc: 0.304
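The RAM warning near the top of this log presumably comes from comparing available physical memory against the dataset size before processing. A hedged, Linux-only sketch of such a check (the helper name and the mechanism in the actual example script are assumptions):

```python
import os
import warnings

def check_ram(dataset_size_gb):
    """Warn if available physical RAM appears smaller than the dataset.
    Linux-only sketch using sysconf; the real script may differ."""
    page_size = os.sysconf('SC_PAGE_SIZE')
    avail_pages = os.sysconf('SC_AVPHYS_PAGES')
    avail_gb = page_size * avail_pages / 1024**3
    if avail_gb < dataset_size_gb:
        warnings.warn(
            f'WARNING: Available system RAM ({avail_gb:.2f} GB) is less '
            f'than the dataset size ({dataset_size_gb} GB).\n'
            'Consider freeing memory or using a machine with more '
            'available RAM.')
    return avail_gb
```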


@puririshi98 puririshi98 left a comment


see comments

@puririshi98

I've left comments on the existing threads; please address them. I'll be ready to merge once the comments are addressed and CI is passing.


xnuohz commented Jun 12, 2025

@puririshi98 the new CI tests work fine in the 25.05 PyG container.
@jdhenaos please check that your environment is installed correctly.

@jdhenaos

@xnuohz thanks for double checking!

I have reinstalled the Docker image and rerun the tests. However, the spmm failure persists, and I am not sure what I am doing wrong. On the other hand, protein_mpnn worked.

root@246f883b8ec4:/workspace# pytest test/datasets/test_protein_mpnn_dataset.py test/nn/models/test_protein_mpnn.py test/utils/test_spmm.py 
========================================= test session starts ==========================================
platform linux -- Python 3.12.3, pytest-8.1.1, pluggy-1.6.0 -- /usr/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/workspace/.hypothesis/examples'))
rootdir: /workspace
configfile: pyproject.toml
plugins: cov-6.1.1, anyio-4.9.0, hypothesis-6.130.8, flakefinder-1.1.0, xdist-3.6.1, shard-0.1.2, rerunfailures-15.1, xdoctest-1.0.2, typeguard-4.3.0
collected 44 items                                                                                     
Running 44 items in this shard: test/datasets/test_protein_mpnn_dataset.py::test_protein_mpnn_dataset, test/nn/models/test_protein_mpnn.py::test_protein_mpnn, test/utils/test_spmm.py::test_spmm_basic[sum-cpu], test/utils/test_spmm.py::test_spmm_basic[sum-cuda:0], test/utils/test_spmm.py::test_spmm_basic[mean-cpu], test/utils/test_spmm.py::test_spmm_basic[mean-cuda:0], test/utils/test_spmm.py::test_spmm_reduce[min-cpu], test/utils/test_spmm.py::test_spmm_reduce[min-cuda:0], test/utils/test_spmm.py::test_spmm_reduce[max-cpu], test/utils/test_spmm.py::test_spmm_reduce[max-cuda:0], test/utils/test_spmm.py::test_spmm_layout[sum-layout0-cpu], test/utils/test_spmm.py::test_spmm_layout[sum-layout0-cuda:0], test/utils/test_spmm.py::test_spmm_layout[sum-layout1-cpu], test/utils/test_spmm.py::test_spmm_layout[sum-layout1-cuda:0], test/utils/test_spmm.py::test_spmm_layout[sum-layout2-cpu], test/utils/test_spmm.py::test_spmm_layout[sum-layout2-cuda:0], test/utils/test_spmm.py::test_spmm_layout[mean-layout0-cpu], test/utils/test_spmm.py::test_spmm_layout[mean-layout0-cuda:0], test/utils/test_spmm.py::test_spmm_layout[mean-layout1-cpu], test/utils/test_spmm.py::test_spmm_layout[mean-layout1-cuda:0], test/utils/test_spmm.py::test_spmm_layout[mean-layout2-cpu], test/utils/test_spmm.py::test_spmm_layout[mean-layout2-cuda:0], test/utils/test_spmm.py::test_spmm_layout[min-layout0-cpu], test/utils/test_spmm.py::test_spmm_layout[min-layout0-cuda:0], test/utils/test_spmm.py::test_spmm_layout[min-layout1-cpu], test/utils/test_spmm.py::test_spmm_layout[min-layout1-cuda:0], test/utils/test_spmm.py::test_spmm_layout[min-layout2-cpu], test/utils/test_spmm.py::test_spmm_layout[min-layout2-cuda:0], test/utils/test_spmm.py::test_spmm_layout[max-layout0-cpu], test/utils/test_spmm.py::test_spmm_layout[max-layout0-cuda:0], test/utils/test_spmm.py::test_spmm_layout[max-layout1-cpu], test/utils/test_spmm.py::test_spmm_layout[max-layout1-cuda:0], 
test/utils/test_spmm.py::test_spmm_layout[max-layout2-cpu], test/utils/test_spmm.py::test_spmm_layout[max-layout2-cuda:0], test/utils/test_spmm.py::test_spmm_jit[sum], test/utils/test_spmm.py::test_spmm_jit[mean], test/utils/test_spmm.py::test_spmm_edge_index[sum-cpu], test/utils/test_spmm.py::test_spmm_edge_index[sum-cuda:0], test/utils/test_spmm.py::test_spmm_edge_index[mean-cpu], test/utils/test_spmm.py::test_spmm_edge_index[mean-cuda:0], test/utils/test_spmm.py::test_spmm_edge_index[min-cpu], test/utils/test_spmm.py::test_spmm_edge_index[min-cuda:0], test/utils/test_spmm.py::test_spmm_edge_index[max-cpu], test/utils/test_spmm.py::test_spmm_edge_index[max-cuda:0]

test/datasets/test_protein_mpnn_dataset.py::test_protein_mpnn_dataset Load split
Processing: 100%|███████████████████████████████████████████| 438572/438572 [00:02<00:00, 158569.97it/s]
PASSED
test/nn/models/test_protein_mpnn.py::test_protein_mpnn SKIPPED (Package torch_cluster not found)
test/utils/test_spmm.py::test_spmm_basic[sum-cpu] PASSED
test/utils/test_spmm.py::test_spmm_basic[sum-cuda:0] FAILED
test/utils/test_spmm.py::test_spmm_basic[mean-cpu] PASSED
test/utils/test_spmm.py::test_spmm_basic[mean-cuda:0] FAILED
test/utils/test_spmm.py::test_spmm_reduce[min-cpu] PASSED
test/utils/test_spmm.py::test_spmm_reduce[min-cuda:0] PASSED
test/utils/test_spmm.py::test_spmm_reduce[max-cpu] PASSED
test/utils/test_spmm.py::test_spmm_reduce[max-cuda:0] PASSED
test/utils/test_spmm.py::test_spmm_layout[sum-layout0-cpu] PASSED
test/utils/test_spmm.py::test_spmm_layout[sum-layout0-cuda:0] PASSED
test/utils/test_spmm.py::test_spmm_layout[sum-layout1-cpu] PASSED
test/utils/test_spmm.py::test_spmm_layout[sum-layout1-cuda:0] PASSED
test/utils/test_spmm.py::test_spmm_layout[sum-layout2-cpu] PASSED
test/utils/test_spmm.py::test_spmm_layout[sum-layout2-cuda:0] PASSED
test/utils/test_spmm.py::test_spmm_layout[mean-layout0-cpu] PASSED
test/utils/test_spmm.py::test_spmm_layout[mean-layout0-cuda:0] PASSED
test/utils/test_spmm.py::test_spmm_layout[mean-layout1-cpu] PASSED
test/utils/test_spmm.py::test_spmm_layout[mean-layout1-cuda:0] PASSED
test/utils/test_spmm.py::test_spmm_layout[mean-layout2-cpu] PASSED
test/utils/test_spmm.py::test_spmm_layout[mean-layout2-cuda:0] PASSED
test/utils/test_spmm.py::test_spmm_layout[min-layout0-cpu] PASSED
test/utils/test_spmm.py::test_spmm_layout[min-layout0-cuda:0] PASSED
test/utils/test_spmm.py::test_spmm_layout[min-layout1-cpu] PASSED
test/utils/test_spmm.py::test_spmm_layout[min-layout1-cuda:0] PASSED
test/utils/test_spmm.py::test_spmm_layout[min-layout2-cpu] PASSED
test/utils/test_spmm.py::test_spmm_layout[min-layout2-cuda:0] PASSED
test/utils/test_spmm.py::test_spmm_layout[max-layout0-cpu] PASSED
test/utils/test_spmm.py::test_spmm_layout[max-layout0-cuda:0] PASSED
test/utils/test_spmm.py::test_spmm_layout[max-layout1-cpu] PASSED
test/utils/test_spmm.py::test_spmm_layout[max-layout1-cuda:0] PASSED
test/utils/test_spmm.py::test_spmm_layout[max-layout2-cpu] PASSED
test/utils/test_spmm.py::test_spmm_layout[max-layout2-cuda:0] PASSED
test/utils/test_spmm.py::test_spmm_jit[sum] PASSED
test/utils/test_spmm.py::test_spmm_jit[mean] PASSED
test/utils/test_spmm.py::test_spmm_edge_index[sum-cpu] PASSED
test/utils/test_spmm.py::test_spmm_edge_index[sum-cuda:0] PASSED
test/utils/test_spmm.py::test_spmm_edge_index[mean-cpu] PASSED
test/utils/test_spmm.py::test_spmm_edge_index[mean-cuda:0] PASSED
test/utils/test_spmm.py::test_spmm_edge_index[min-cpu] PASSED
test/utils/test_spmm.py::test_spmm_edge_index[min-cuda:0] PASSED
test/utils/test_spmm.py::test_spmm_edge_index[max-cpu] PASSED
test/utils/test_spmm.py::test_spmm_edge_index[max-cuda:0] PASSED

=============================================== FAILURES ===============================================
_____________________________________ test_spmm_basic[sum-cuda:0] ______________________________________

device = device(type='cuda', index=0), reduce = 'sum'

    @withCUDA
    @pytest.mark.parametrize('reduce', ['sum', 'mean'])
    def test_spmm_basic(device, reduce):
        src = torch.randn(5, 4, device=device)
        other = torch.randn(4, 8, device=device)
    
        out1 = (src @ other) / (src.size(1) if reduce == 'mean' else 1)
        out2 = spmm(src.to_sparse_csr(), other, reduce=reduce)
        assert out1.size() == (5, 8)
>       assert torch.allclose(out1, out2, atol=1e-6)
E       AssertionError: assert False
E        +  where False = <built-in method allclose of type object at 0x72d3ee1842a0>(tensor([[ 0.2263,  0.5490,  0.4093,  0.6030,  0.7371, -0.4968, -0.4992,  0.1147],\n        [ 0.3482, -0.7796,  0.1877, -0.6512,  0.0834, -0.9297,  0.3539,  1.9510],\n        [ 0.7085,  0.5063,  1.1544,  0.1988,  1.3384, -1.5084, -0.8598,  0.9801],\n        [ 0.4680,  0.6143, -1.0471, -2.4450,  1.0119, -1.0227,  0.8287,  1.4525],\n        [-0.5595,  0.7819, -2.7147,  5.6426,  1.4819, -0.3322,  1.3007,  4.8081]],\n       device='cuda:0'), tensor([[ 0.2263,  0.5490,  0.4094,  0.6029,  0.7368, -0.4966, -0.4993,  0.1147],\n        [ 0.3482, -0.7798,  0.1878, -0.6511,  0.0837, -0.9296,  0.3540,  1.9513],\n        [ 0.7086,  0.5063,  1.1549,  0.1991,  1.3383, -1.5082, -0.8600,  0.9802],\n        [ 0.4679,  0.6143, -1.0472, -2.4445,  1.0120, -1.0224,  0.8288,  1.4523],\n        [-0.5594,  0.7830, -2.7147,  5.6439,  1.4836, -0.3322,  1.3001,  4.8078]],\n       device='cuda:0'), atol=1e-06)
E        +    where <built-in method allclose of type object at 0x72d3ee1842a0> = torch.allclose

test/utils/test_spmm.py:25: AssertionError
_____________________________________ test_spmm_basic[mean-cuda:0] _____________________________________

device = device(type='cuda', index=0), reduce = 'mean'

    @withCUDA
    @pytest.mark.parametrize('reduce', ['sum', 'mean'])
    def test_spmm_basic(device, reduce):
        src = torch.randn(5, 4, device=device)
        other = torch.randn(4, 8, device=device)
    
        out1 = (src @ other) / (src.size(1) if reduce == 'mean' else 1)
        out2 = spmm(src.to_sparse_csr(), other, reduce=reduce)
        assert out1.size() == (5, 8)
>       assert torch.allclose(out1, out2, atol=1e-6)
E       AssertionError: assert False
E        +  where False = <built-in method allclose of type object at 0x72d3ee1842a0>(tensor([[-0.1546, -0.0307, -0.3225, -0.0970,  0.2301,  0.5281,  0.0804,  0.1057],\n        [ 0.1258, -0.0055,  0.1565, -0.4203, -0.2013, -0.4148, -0.5455, -0.2861],\n        [-0.5532, -0.4884, -0.3228,  0.3831,  0.1442,  0.7102,  0.8012,  0.7611],\n        [-0.5898, -0.0525, -0.7840, -0.3497,  0.6824,  1.4893,  0.0843,  0.2738],\n        [ 0.1357, -0.2766, -0.1663,  0.0844, -0.1866, -0.0933,  0.2581,  0.1851]],\n       device='cuda:0'), tensor([[-0.1548, -0.0308, -0.3225, -0.0970,  0.2301,  0.5281,  0.0804,  0.1059],\n        [ 0.1259, -0.0053,  0.1566, -0.4202, -0.2012, -0.4148, -0.5457, -0.2861],\n        [-0.5532, -0.4883, -0.3228,  0.3830,  0.1441,  0.7099,  0.8012,  0.7609],\n        [-0.5900, -0.0525, -0.7839, -0.3498,  0.6823,  1.4891,  0.0842,  0.2739],\n        [ 0.1357, -0.2766, -0.1662,  0.0845, -0.1866, -0.0933,  0.2581,  0.1851]],\n       device='cuda:0'), atol=1e-06)
E        +    where <built-in method allclose of type object at 0x72d3ee1842a0> = torch.allclose

test/utils/test_spmm.py:25: AssertionError
======================================= short test summary info ========================================
FAILED test/utils/test_spmm.py::test_spmm_basic[sum-cuda:0] - AssertionError: assert False
FAILED test/utils/test_spmm.py::test_spmm_basic[mean-cuda:0] - AssertionError: assert False
=============================== 2 failed, 41 passed, 1 skipped in 3.78s ================================
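The two failures above show element-wise differences of roughly 1e-3 against `atol=1e-6`, which looks like a numerical-precision gap rather than a logic bug: GPU dense and sparse kernels accumulate partial sums in different orders (and recent NVIDIA containers may also enable TF32 matmuls, which alone explains ~1e-3 gaps). A pure-Python illustration of how accumulation order alone changes a floating-point result:

```python
# Floating-point addition is not associative: summing the same three values
# in a different order gives a different result. This is why comparing a
# dense matmul against a sparse matmul with atol=1e-6 can be too strict.
a = (1e16 + 1.0) + -1e16   # the 1.0 is absorbed into 1e16 and lost
b = (1e16 + -1e16) + 1.0   # the large terms cancel first, keeping the 1.0
```

Loosening the tolerance (e.g. `atol=1e-3` when TF32 is active) is the usual remedy in such tests, though whether that is appropriate here is for the reviewers to decide.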


puririshi98 commented Jun 13, 2025

My advice would be to run the full unit tests inside the latest NVIDIA container every time this branch needs an update from the master branch. When you get a clean pass, let me know; I'll merge once your local tests and the GitHub CI are both green.

@jdhenaos

@puririshi98 @xnuohz

I was running a new round of CI tests. However, I encountered an error in MoleculeGPT that appears to be related to the LLM model used rather than to the code itself. This is the only error I am getting. Am I missing something?

root@77489883e7c5:/workspace# pytest test/datasets/test_molecule_gpt_dataset.py 
================================================================================= test session starts ==================================================================================
platform linux -- Python 3.12.3, pytest-8.1.1, pluggy-1.6.0 -- /usr/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/workspace/.hypothesis/examples'))
rootdir: /workspace
configfile: pyproject.toml
plugins: cov-6.1.1, anyio-4.9.0, hypothesis-6.130.8, flakefinder-1.1.0, xdist-3.6.1, shard-0.1.2, rerunfailures-15.1, xdoctest-1.0.2, typeguard-4.3.0
collected 1 item                                                                                                                                                                       
Running 1 items in this shard: test/datasets/test_molecule_gpt_dataset.py::test_molecule_gpt_dataset

test/datasets/test_molecule_gpt_dataset.py::test_molecule_gpt_dataset The length of target_CID_list: 8329
  0%|                                                                                                                                                          | 0/4215 [00:00<?, ?it/s][16:57:48] WARNING: not removing hydrogen atom without neighbors
 56%|██████████████████████████████████████████████████████████████████████████████▍                                                              | 2346/4215 [00:00<00:00, 8049.19it/s][16:57:48] WARNING: not removing hydrogen atom without neighbors
[16:57:48] WARNING: not removing hydrogen atom without neighbors
 75%|█████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                   | 3151/4215 [00:00<00:00, 7962.51it/s][16:57:48] WARNING: not removing hydrogen atom without neighbors
[16:57:48] WARNING: not removing hydrogen atom without neighbors
[16:57:48] WARNING: not removing hydrogen atom without neighbors
 94%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████         | 3948/4215 [00:00<00:00, 7136.67it/s][16:57:48] WARNING: not removing hydrogen atom without neighbors
[16:57:48] WARNING: not removing hydrogen atom without neighbors
[16:57:48] WARNING: not removing hydrogen atom without neighbors
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4215/4215 [00:00<00:00, 7114.91it/s]
block id: 1 with 0 valid SDF file
In total: 4215 molecules
Setting up 'TinyLlama/TinyLlama-1.1B-Chat-v0.1' with configuration: {'revision': 'main'}
FAILED

======================================================================================= FAILURES =======================================================================================
______________________________________________________________________________ test_molecule_gpt_dataset _______________________________________________________________________________
DeprecationWarning: Type google._upb._message.MessageMapContainer uses PyType_Spec with a metaclass that has custom tp_new. This is deprecated and will no longer be allowed in Python 3.14.

The above exception was the direct cause of the following exception:

    @onlyOnline
    @withPackage('transformers', 'sentencepiece', 'accelerate', 'rdkit')
    def test_molecule_gpt_dataset():
>       dataset = MoleculeGPTDataset(root='./data/MoleculeGPT')

test/datasets/test_molecule_gpt_dataset.py:8: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/local/lib/python3.12/dist-packages/torch_geometric/datasets/molecule_gpt_dataset.py:220: in __init__
    super().__init__(root, transform, pre_transform, pre_filter,
/usr/local/lib/python3.12/dist-packages/torch_geometric/data/in_memory_dataset.py:81: in __init__
    super().__init__(root, transform, pre_transform, pre_filter, log,
/usr/local/lib/python3.12/dist-packages/torch_geometric/data/dataset.py:115: in __init__
    self._process()
/usr/local/lib/python3.12/dist-packages/torch_geometric/data/dataset.py:262: in _process
    self.process()
/usr/local/lib/python3.12/dist-packages/torch_geometric/datasets/molecule_gpt_dataset.py:430: in process
    llm = LLM(
/usr/local/lib/python3.12/dist-packages/torch_geometric/nn/nlp/llm.py:87: in __init__
    self.tokenizer = AutoTokenizer.from_pretrained(
/usr/local/lib/python3.12/dist-packages/transformers/models/auto/tokenization_auto.py:1013: in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
/usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_base.py:2025: in from_pretrained
    return cls._from_pretrained(
/usr/local/lib/python3.12/dist-packages/transformers/tokenization_utils_base.py:2278: in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
/usr/local/lib/python3.12/dist-packages/transformers/models/llama/tokenization_llama.py:171: in __init__
    self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))
/usr/local/lib/python3.12/dist-packages/transformers/models/llama/tokenization_llama.py:203: in get_spm_processor
    model_pb2 = import_protobuf(f"The new behaviour of {self.__class__.__name__} (with `self.legacy = False`)")
/usr/local/lib/python3.12/dist-packages/transformers/convert_slow_tokenizer.py:37: in import_protobuf
    from sentencepiece import sentencepiece_model_pb2
/usr/local/lib/python3.12/dist-packages/sentencepiece/sentencepiece_model_pb2.py:5: in <module>
    from google.protobuf.internal import builder as _builder
/usr/local/lib/python3.12/dist-packages/google/protobuf/internal/builder.py:41: in <module>
    from google.protobuf.internal import python_message
/usr/local/lib/python3.12/dist-packages/google/protobuf/internal/python_message.py:59: in <module>
    from google.protobuf.internal import api_implementation
/usr/local/lib/python3.12/dist-packages/google/protobuf/internal/api_implementation.py:74: in <module>
    if _CanImport('google._upb._message'):
/usr/local/lib/python3.12/dist-packages/google/protobuf/internal/api_implementation.py:64: in _CanImport
    mod = importlib.import_module(mod_name)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

name = 'google._upb._message', package = None

    def import_module(name, package=None):
        """Import a module.
    
        The 'package' argument is required when performing a relative import. It
        specifies the package to use as the anchor point from which to resolve the
        relative import to an absolute import.
    
        """
        level = 0
        if name.startswith('.'):
            if not package:
                raise TypeError("the 'package' argument is required to perform a "
                                f"relative import for {name!r}")
            for character in name:
                if character != '.':
                    break
                level += 1
>       return _bootstrap._gcd_import(name[level:], package, level)
E       SystemError: <class 'DeprecationWarning'> returned a result with an exception set

/usr/lib/python3.12/importlib/__init__.py:90: SystemError
=============================================================================== short test summary info ================================================================================
FAILED test/datasets/test_molecule_gpt_dataset.py::test_molecule_gpt_dataset - SystemError: <class 'DeprecationWarning'> returned a result with an exception set
================================================================================== 1 failed in 2.27s ===================================================================================

@puririshi98
Contributor

@puririshi98 @xnuohz

I was running a new round of CI tests. However, I encountered an error in MoleculeGPT that appears to be related to the LLM model used rather than to the code itself. This is the only error I am getting. Am I missing something?:

root@77489883e7c5:/workspace# pytest test/datasets/test_molecule_gpt_dataset.py 
================================================================================= test session starts ==================================================================================
platform linux -- Python 3.12.3, pytest-8.1.1, pluggy-1.6.0 -- /usr/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/workspace/.hypothesis/examples'))
rootdir: /workspace
configfile: pyproject.toml
plugins: cov-6.1.1, anyio-4.9.0, hypothesis-6.130.8, flakefinder-1.1.0, xdist-3.6.1, shard-0.1.2, rerunfailures-15.1, xdoctest-1.0.2, typeguard-4.3.0
collected 1 item                                                                                                                                                                       
Running 1 items in this shard: test/datasets/test_molecule_gpt_dataset.py::test_molecule_gpt_dataset

test/datasets/test_molecule_gpt_dataset.py::test_molecule_gpt_dataset The length of target_CID_list: 8329
  0%|                                                                                                                                                          | 0/4215 [00:00<?, ?it/s][16:57:48] WARNING: not removing hydrogen atom without neighbors
 56%|██████████████████████████████████████████████████████████████████████████████▍                                                              | 2346/4215 [00:00<00:00, 8049.19it/s][16:57:48] WARNING: not removing hydrogen atom without neighbors
[16:57:48] WARNING: not removing hydrogen atom without neighbors
 75%|█████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                   | 3151/4215 [00:00<00:00, 7962.51it/s][16:57:48] WARNING: not removing hydrogen atom without neighbors
[16:57:48] WARNING: not removing hydrogen atom without neighbors
[16:57:48] WARNING: not removing hydrogen atom without neighbors
 94%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████         | 3948/4215 [00:00<00:00, 7136.67it/s][16:57:48] WARNING: not removing hydrogen atom without neighbors
[16:57:48] WARNING: not removing hydrogen atom without neighbors
[16:57:48] WARNING: not removing hydrogen atom without neighbors
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4215/4215 [00:00<00:00, 7114.91it/s]
block id: 1 with 0 valid SDF file
In total: 4215 molecules
Setting up 'TinyLlama/TinyLlama-1.1B-Chat-v0.1' with configuration: {'revision': 'main'}
FAILED

======================================================================================= FAILURES =======================================================================================
______________________________________________________________________________ test_molecule_gpt_dataset _______________________________________________________________________________
DeprecationWarning: Type google._upb._message.MessageMapContainer uses PyType_Spec with a metaclass that has custom tp_new. This is deprecated and will no longer be allowed in Python 3.14.

The above exception was the direct cause of the following exception:

    @onlyOnline
    @withPackage('transformers', 'sentencepiece', 'accelerate', 'rdkit')
    def test_molecule_gpt_dataset():
=============================================================================== short test summary info ================================================================================
FAILED test/datasets/test_molecule_gpt_dataset.py::test_molecule_gpt_dataset - SystemError: <class 'DeprecationWarning'> returned a result with an exception set
================================================================================== 1 failed in 2.27s ===================================================================================

This seems unrelated to us. Hopefully it goes away with the new 25.07 container coming at the end of July; alternatively, you can file an issue with transformers and see if they fix it.
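As a stopgap while waiting for a container or upstream fix, one possible local workaround (a sketch, not verified against this exact container) is to force protobuf's pure-Python backend via its documented `PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION` environment variable, so the native `google._upb._message` extension that triggers the `DeprecationWarning` escalated by Python 3.12 is never imported:

```shell
# Sketch of a workaround: select protobuf's pure-Python backend so that
# importing the native extension google._upb._message is skipped entirely.
# Must be set before the first protobuf import in the process.
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
echo "$PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"

# Then re-run the failing test in the same shell, e.g.:
#   pytest test/datasets/test_molecule_gpt_dataset.py
```

The pure-Python backend is slower, so this is only sensible for CI debugging, not production serialization workloads.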

@xnuohz
Contributor Author

xnuohz commented Jul 2, 2025

Hi @puririshi98, CI is green now. Can you merge?

Contributor

@puririshi98 puririshi98 left a comment


lgtm now

@puririshi98 puririshi98 merged commit 237d077 into pyg-team:master Jul 2, 2025
19 checks passed
@xnuohz xnuohz deleted the models/proteinmpnn branch July 2, 2025 17:20


Development

Successfully merging this pull request may close these issues.

ProtienMPNN (inverse AlphaFold): Dataset+Model+Unit tests+Example [Community Sprint] GNNs<>LLMs 🚀

3 participants