Releases: pyg-team/pytorch_geometric

PyG 2.2.0: Accelerations and Scalability

01 Dec 07:31
ca4e5f8

We are excited to announce the release of PyG 2.2 🎉🎉🎉

PyG 2.2 is the culmination of work from 78 contributors who have worked on features and bug-fixes for a total of over 320 commits since torch-geometric==2.1.0.

Highlights

pyg-lib Integration

We are proud to release and integrate pyg-lib==0.1.0 into PyG, the first stable version of our new low-level Graph Neural Network library to drive all CPU and GPU acceleration needs of PyG (#5330, #5347, #5384, #5388).

You can install pyg-lib as described in our README.md:

pip install pyg-lib -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html

import pyg_lib
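
The `${TORCH}` and `${CUDA}` placeholders stand for the installed PyTorch version and CUDA tag. The following is a minimal sketch of filling them in and of checking whether `pyg_lib` is importable. The helper names `wheel_index_url` and `has_pyg_lib` are made up for illustration, and the version strings are placeholders rather than a statement about which builds exist:

```python
import importlib.util


def wheel_index_url(torch_version: str, cuda_tag: str) -> str:
    """Fill the ${TORCH} and ${CUDA} placeholders of the wheel index URL."""
    return f"https://data.pyg.org/whl/torch-{torch_version}+{cuda_tag}.html"


def has_pyg_lib() -> bool:
    """Return True if the optional `pyg_lib` package can be imported."""
    return importlib.util.find_spec("pyg_lib") is not None


# Placeholder values; substitute the versions of your own installation.
url = wheel_index_url("1.13.0", "cu117")
print(f"pip install pyg-lib -f {url}")
print("pyg-lib available:", has_pyg_lib())
```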

Once pyg-lib is installed, it will get automatically picked up by PyG, e.g., to accelerate neighborhood sampling routines or to accelerate heterogeneous GNN execution:

  • pyg-lib provides fast and optimized CPU routines to iteratively sample neighbors in homogeneous and heterogeneous graphs, and heavily improves upon the previously used neighborhood sampling techniques utilized in PyG.

  • pyg-lib provides efficient GPU-based routines to parallelize workloads in heterogeneous graphs across different node types and edge types. We achieve this by leveraging type-dependent transformations via NVIDIA CUTLASS integration, which is flexible enough to implement most heterogeneous GNNs and remains efficient even for sparse edge types or a large number of different node types.

GraphStore and FeatureStore Abstractions

PyG 2.2 includes numerous primitives for scalable graph machine learning, enabling users to train GNNs on graphs far larger than their machine's available memory. It does so by introducing two simple, extensible abstractions, a FeatureStore and a GraphStore, that plug directly into existing PyG interfaces (see here for the accompanying tutorial).

feature_store = CustomFeatureStore()
feature_store['paper', 'x', None] = ...  # Add paper features
feature_store['author', 'x', None] = ...  # Add author features

graph_store = CustomGraphStore()
graph_store['edge', 'coo'] = ...  # Add edges in "COO" format

# `CustomGraphSampler` knows how to sample on `CustomGraphStore`:
graph_sampler = CustomGraphSampler(
    graph_store=graph_store,
    num_neighbors=[10, 20],
    ...
)

from torch_geometric.loader import NodeLoader
loader = NodeLoader(
    data=(feature_store, graph_store),
    node_sampler=graph_sampler,
    batch_size=20,
    input_nodes='paper',
)

for batch in loader:
    pass
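
To make the `(group, attribute, index)` keying used above more concrete, here is a toy, pure-Python sketch of the idea behind a feature store. `ToyFeatureStore` is a hypothetical class written for this note. It is not PyG's `FeatureStore` base class and does not implement its interface. It only illustrates that features are addressed by key and that a loader can fetch rows for just the sampled nodes:

```python
from typing import Any, Dict, Optional, Tuple

Key = Tuple[str, str, Optional[Any]]  # (group name, attribute name, index)


class ToyFeatureStore:
    """Hypothetical in-memory store; a real backend could live on disk or remotely."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], list] = {}

    def __setitem__(self, key: Key, rows: list) -> None:
        group, attr, _ = key
        self._data[(group, attr)] = rows

    def __getitem__(self, key: Key) -> list:
        group, attr, index = key
        rows = self._data[(group, attr)]
        if index is None:  # in this toy, `None` means "all rows"
            return rows
        return [rows[i] for i in index]  # fetch only the requested rows


store = ToyFeatureStore()
store['paper', 'x', None] = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
batch_x = store['paper', 'x', [2, 0]]  # features of two sampled nodes only
print(batch_x)
```

Because the store is only ever asked for the rows of a sampled mini-batch, the full feature matrix never has to fit into memory at once, which is the point of the abstraction.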

Data loading and sampling routines are refactored and decomposed into torch_geometric.loader and torch_geometric.sampler modules, respectively (#5563, #5820, #5456, #5457, #5312, #5365, #5402, #5404, #5418).

Optimized and Fused Aggregations

PyG 2.2 further accelerates scatter aggregations based on CPU/GPU and with/without backward computation paths (requires torch>=1.12.0 and torch-scatter>=2.1.0) (#5232, #5241, #5353, #5386, #5399, #6051, #6052).

We also optimized the usage of nn.aggr.MultiAggregation by fusing the computation of multiple aggregations together (see here for more details) (#6036, #6040).
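
One way fusion can save work is by sharing intermediate results between aggregations. The sketch below is an illustration under that assumption and not a description of how PyG implements it. It makes a single pass over the values to accumulate counts, sums and sums of squares per group, and then derives `sum`, `mean`, `var` and `std` from those shared quantities. The function name `fused_aggregate` is hypothetical:

```python
import math
from typing import Dict, List


def fused_aggregate(values: List[float], index: List[int],
                    num_groups: int) -> Dict[str, List[float]]:
    """Compute sum, mean, var and std per group from one pass over the data."""
    count = [0] * num_groups
    total = [0.0] * num_groups
    total_sq = [0.0] * num_groups
    for v, i in zip(values, index):  # single scatter pass
        count[i] += 1
        total[i] += v
        total_sq[i] += v * v

    mean = [t / max(c, 1) for t, c in zip(total, count)]
    # Biased variance E[x^2] - E[x]^2, clamped at 0 to guard against round-off.
    var = [max(sq / max(c, 1) - m * m, 0.0)
           for sq, c, m in zip(total_sq, count, mean)]
    std = [math.sqrt(v) for v in var]
    return {'sum': total, 'mean': mean, 'var': var, 'std': std}


out = fused_aggregate([1.0, 2.0, 3.0, 4.0], index=[0, 0, 1, 1], num_groups=2)
print(out)
```

Computing the four aggregations separately would repeat the scatter pass for each of them.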

Here are some benchmarking results on PyTorch 1.12 (summed over 1000 runs):

| Aggregators           | Vanilla | Fusion  |
|-----------------------|---------|---------|
| [sum, mean]           | 0.3325s | 0.1996s |
| [sum, mean, min, max] | 0.7139s | 0.5037s |
| [sum, mean, var]      | 0.6849s | 0.3871s |
| [sum, mean, var, std] | 1.0955s | 0.3973s |
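
For a quick read of these numbers, the relative speedup of the fused path can be derived directly from the timings. The snippet only restates the figures listed above and adds no new measurements:

```python
# Timings copied from the benchmark above: (vanilla seconds, fused seconds).
timings = {
    "[sum, mean]": (0.3325, 0.1996),
    "[sum, mean, min, max]": (0.7139, 0.5037),
    "[sum, mean, var]": (0.6849, 0.3871),
    "[sum, mean, var, std]": (1.0955, 0.3973),
}

speedups = {name: vanilla / fused for name, (vanilla, fused) in timings.items()}
for name, speedup in speedups.items():
    print(f"{name}: {speedup:.2f}x")
```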

Lastly, we have incorporated "fused" GNN operators via the dgNN package, starting with a FusedGATConv implementation (#5140).

Community Sprint: Type Hints and TorchScript Support

We are running regular community sprints to get our community more involved in building PyG. Whether you are just beginning to use graph learning or have been leveraging GNNs in research or production, the community sprints welcome members of all levels with different types of projects.

We held our first community sprint on 10/12 to fully incorporate type hints and TorchScript support across the entire code base, with the goal of improving its usability and cleanliness. 20 contributors took part, contributing to 120 type hints within 2 weeks and adding around 2400 lines of code (#5842, #5603, #5659, #5664, #5665, #5666, #5667, #5668, #5669, #5673, #5675, #5673, #5678, #5682, #5683, #5684, #5685, #5687, #5688, #5695, #5699, #5701, #5702, #5703, #5706, #5707, #5710, #5714, #5715, #5716, #5722, #5724, #5725, #5726, #5729, #5730, #5731, #5732, [#5733](https://github.com/pyg-team/pyt...

PyG 2.1.0: Principled aggregations, link-level and temporal samplers, data pipe support, ...

17 Aug 10:32
07bf02f

We are excited to announce the release of PyG 2.1.0 🎉🎉🎉

PyG 2.1.0 is the culmination of work from over 60 contributors who have worked on features and bug-fixes for a total of over 320 commits since torch-geometric==2.0.4.

Highlights

Principled Aggregations

See here for the accompanying tutorial.

Aggregation functions play an important role in the message passing framework and the readout functions of Graph Neural Networks. Specifically, many works in the literature (Hamilton et al. (2017), Xu et al. (2018), Corso et al. (2020), Li et al. (2020), Tailor et al. (2021), Bartunov et al. (2022)) demonstrate that the choice of aggregation functions contributes significantly to the representational power and performance of the model.

To facilitate further experimentation and unify the concepts of aggregation within GNNs across both MessagePassing and global readouts, we have made the concept of Aggregation a first-class principle in PyG (#4379, #4522, #4687, #4721, #4731, #4762, #4749, #4779, #4863, #4864, #4865, #4866, #4872, #4927, #4934, #4935, #4957, #4973, #4973, #4986, #4995, #5000, #5021, #5034, #5036, #5039, #4522, #5033, #5085, #5097, #5099, #5104, #5113, #5130, #5098, #5191). As of now, PyG provides support for various aggregations — from simple ones (e.g., mean, max, sum), to advanced ones (e.g., median, var, std), learnable ones (e.g., SoftmaxAggregation, PowerMeanAggregation), and exotic ones (e.g., LSTMAggregation, SortAggregation, EquilibriumAggregation). Furthermore, multiple aggregations can be combined and stacked together:

from torch_geometric.nn import MessagePassing, SoftmaxAggregation

class MyConv(MessagePassing):
    def __init__(self, ...):
        # Combines a set of aggregations and concatenates their results.
        # The interface also supports automatic resolution.
        super().__init__(aggr=['mean', 'std', SoftmaxAggregation(learn=True)])
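
As an illustration of what a learnable aggregation such as `SoftmaxAggregation` computes, here is a pure-Python sketch for a single neighborhood of scalar messages. It assumes the usual softmax-weighted sum with a temperature `t`. With `learn=True`, that temperature would be a trainable parameter, whereas here it is a plain argument. The function name `softmax_aggregate` is hypothetical and the code does not use PyG:

```python
import math
from typing import List


def softmax_aggregate(values: List[float], t: float = 1.0) -> float:
    """Weight each value by softmax(t * value) and sum the weighted values."""
    shift = max(t * v for v in values)  # subtract the max for numerical stability
    weights = [math.exp(t * v - shift) for v in values]
    norm = sum(weights)
    return sum(w / norm * v for w, v in zip(weights, values))


messages = [1.0, 2.0, 3.0]
print(softmax_aggregate(messages, t=0.0))   # equal weights: behaves like a mean
print(softmax_aggregate(messages, t=50.0))  # weight concentrates on the maximum
```

The temperature therefore interpolates between mean-like and max-like behavior, which is why it is useful to let the model learn it.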

Link-level Neighbor Loader

We added a new LinkNeighborLoader class for training scalable GNNs that perform edge-level predictions on giant graphs (#4396, #4439, #4441, #4446, #4508, #4509, #4868). LinkNeighborLoader comes with automatic support for both homogeneous and heterogeneous data, and supports link prediction via automatic negative sampling as well as edge-level classification and regression models:

from torch_geometric.loader import LinkNeighborLoader

loader = LinkNeighborLoader(
    data,
    num_neighbors=[30] * 2,  # Sample 30 neighbors for each node for 2 iterations
    batch_size=128,  # Use a batch size of 128 for sampling training links
    edge_label_index=data.edge_index,  # Use the entire graph for supervision
    negative_sampling_ratio=1.0,  # Sample negative edges
)

sampled_data = next(iter(loader))
print(sampled_data)
>>> Data(x=[1368, 1433], edge_index=[2, 3103], edge_label_index=[2, 256], edge_label=[256])
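
The printed shapes follow from the loader settings: a batch of 128 positive links plus negatives drawn at a ratio of 1.0 gives 256 supervision edges, matching `edge_label_index=[2, 256]` and `edge_label=[256]`. A small sketch of that arithmetic follows. The helper name is hypothetical, and the rounding for non-integer ratios is an assumption rather than something stated above:

```python
import math


def num_supervision_edges(batch_size: int, negative_sampling_ratio: float) -> int:
    """Number of labeled edges per batch: positives plus sampled negatives."""
    num_pos = batch_size
    # Rounding up for fractional ratios is an assumption made for this sketch.
    num_neg = math.ceil(num_pos * negative_sampling_ratio)
    return num_pos + num_neg


total = num_supervision_edges(batch_size=128, negative_sampling_ratio=1.0)
print(total)
```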

Neighborhood Sampling based on Temporal Constraints

Both NeighborLoader and LinkNeighborLoader now support temporal sampling via the time_attr argument (#4025, #4877, #4908, #5137, #5173). If set, temporal sampling will be used such that neighbors are guaranteed to fulfill temporal constraints, i.e. neighbors have an earlier timestamp than the center node:

import torch
from torch_geometric.loader import NeighborLoader

data['paper'].time = torch.arange(data['paper'].num_nodes)

loader = NeighborLoader(
    data,
    input_nodes='paper',
    time_attr='time',  # Only sample papers that appeared before the seed paper
    num_neighbors=[30] * 2,
    batch_size=128,
)

Note that this feature requires torch-sparse>=0.6.14.
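
The temporal constraint can be pictured as a filter applied before sampling. The pure-Python sketch below follows the description above and keeps only neighbors with a strictly earlier timestamp than the seed node. Whether the library compares strictly or non-strictly is not specified here, so treat the comparison as an assumption. The function and variable names are hypothetical:

```python
import random
from typing import Dict, List


def sample_earlier_neighbors(adj: Dict[int, List[int]], time: List[int],
                             seed: int, k: int, rng: random.Random) -> List[int]:
    """Sample up to `k` neighbors of `seed` whose timestamp precedes the seed's."""
    candidates = [n for n in adj.get(seed, []) if time[n] < time[seed]]
    return rng.sample(candidates, min(k, len(candidates)))


adj = {3: [0, 1, 2, 4, 5]}   # node 3 is connected to five other nodes
time = [0, 1, 2, 3, 4, 5]    # one timestamp per node, as with `torch.arange`
picked = sample_earlier_neighbors(adj, time, seed=3, k=2, rng=random.Random(0))
print(picked)  # only nodes 0, 1 or 2 can appear
```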

Functional DataPipes

See here for the accompanying example.

PyG now fully supports data loading using the newly introduced concept of DataPipes in PyTorch for easily constructing flexible and performant data pipelines (#4302, #4345, #4349). PyG provides DataPipe support for batching multiple PyG data objects together and for applying any PyG transform:

# Example 1: build molecular graphs from a CSV file of SMILES strings.
datapipe = FileOpener(['SMILES_HIV.csv'])
datapipe = datapipe.parse_csv_as_dict()
datapipe = datapipe.parse_smiles(target_key='HIV_active')
datapipe = datapipe.in_memory_cache()  # Cache graph instances in-memory.
datapipe = datapipe.shuffle()
datapipe = datapipe.batch_graphs(batch_size=32)

# Example 2: build point-cloud graphs from mesh files found under `root_dir`.
datapipe = FileLister([root_dir], masks='*.off', recursive=True)
datapipe = datapipe.read_mesh()
datapipe = datapipe.in_memory_cache()  # Cache graph instances in-memory.
datapipe = datapipe.sample_points(1024)  # Use PyG transforms from here.
datapipe = datapipe.knn_graph(k=8)
datapipe = datapipe.shuffle()
datapipe = datapipe.batch_graphs(batch_size=32)
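
The chaining style above works because every step returns a new pipe that lazily wraps the previous one. The following toy sketch shows that pattern in plain Python. `ToyPipe` is a hypothetical class written for illustration. It is not the torchdata `DataPipe` API and omits everything beyond `map`, `shuffle` and `batch`:

```python
import random
from typing import Any, Callable, Iterable, Iterator, List, Optional


class ToyPipe:
    """Hypothetical lazy pipeline: each method wraps the previous stage."""

    def __init__(self, make_iter: Callable[[], Iterator[Any]]) -> None:
        self._make_iter = make_iter

    @classmethod
    def from_iterable(cls, items: Iterable[Any]) -> "ToyPipe":
        return cls(lambda: iter(items))

    def __iter__(self) -> Iterator[Any]:
        return self._make_iter()

    def map(self, fn: Callable[[Any], Any]) -> "ToyPipe":
        return ToyPipe(lambda: (fn(item) for item in self))

    def shuffle(self, seed: Optional[int] = None) -> "ToyPipe":
        def make() -> Iterator[Any]:
            items = list(self)  # materialize once per pass, then shuffle
            random.Random(seed).shuffle(items)
            return iter(items)
        return ToyPipe(make)

    def batch(self, batch_size: int) -> "ToyPipe":
        def make() -> Iterator[List[Any]]:
            chunk: List[Any] = []
            for item in self:
                chunk.append(item)
                if len(chunk) == batch_size:
                    yield chunk
                    chunk = []
            if chunk:  # emit the final, possibly smaller batch
                yield chunk
        return ToyPipe(make)


pipe = ToyPipe.from_iterable(range(5)).map(lambda x: x * 2).batch(batch_size=2)
batches = list(pipe)
print(batches)
```

Nothing is computed until the pipe is iterated, so stages such as parsing or transforms only run on the items that are actually consumed.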

Breaking Changes

2.0.4

12 Mar 16:43

PyG 2.0.4 🎉

A new minor PyG version release, bringing PyTorch 1.11 support to PyG. It further includes a variety of new features and bugfixes:

Features

Datasets

Minor Changes

Bugfixes

2.0.3

22 Dec 06:49
d47d9cd

PyG 2.0.3 🎉

A new minor PyG version release, including a variety of new features and bugfixes:

Features

Datasets

Minor Changes

2.0.2

26 Oct 12:41

A new minor version release, including further bugfixes, official PyTorch 1.10 support, as well as additional features and operators:

Features

Minor Changes

  • Data.to_homogeneous will now add node_type information to the homogeneous Data object
  • GINEConv can now transform edge features automatically in case their dimensionalities do not match (thanks to @CaypoH)
  • OGB_MAG will now add node_year information to paper nodes
  • Entities datasets now allow the processing of HeteroData objects via the hetero=True option
  • Batch objects can now be batched together to form super batches
  • Added heterogeneous graph support for Center, Constant and LinearTransformation transformations
  • HeteroConv can now return "stacked" embeddings
  • The batch vector of a Batch object will now be initialized on the GPU in case other attributes are held in GPU memory

Bugfixes

  • Fixed the num_neighbors argument of NeighborLoader in order to specify an edge-type specific number of neighbors
  • Fixed the collate policy of lists of integers/strings to return nested lists
  • Fixed the Delaunay transformation in case the face attribute is not present in the data
  • Fixed the TGNMemory module to only read from the latest update (thanks to @cwh104504)
  • Fixed the pickle.PicklingError when Batch objects are used in a torch.multiprocessing.manager.Queue() (thanks to @RasmusOrsoe)
  • Fixed an issue with _parent state changing after pickling of Data objects (thanks to @zepx)
  • Fixed the ToUndirected transformation in case the number of edges and nodes are equal (thanks to @lmkmkrcc)
  • Fixed the from_networkx routine in case node-level and edge-level features share the same names
  • Removed the num_nodes warning when creating PairData objects
  • Fixed the initialization of the GeneralMultiLayer module in GraphGym (thanks to @fjulian)
  • Fixed custom model registration in GraphGym
  • Fixed a clash in the run_dir naming of GraphGym (thanks to @fjulian)
  • Fixed a GraphGym crash in case the ROC score is undefined (thanks to @fjulian)
  • Fixed the Batch.from_data_list routine on dataset slices (thanks to @dtortorella)
  • Fixed the MetaPath2Vec model in case there exist isolated nodes
  • Fixed torch_geometric.utils.coalesce with CUDA tensors

2.0.1

16 Sep 07:22

PyG 2.0.1

This is a minor release, bringing some emergency fixes to PyG 2.0.

Bugfixes

2.0.0

13 Sep 07:48

PyG 2.0 🎉 🎉 🎉

PyG (PyTorch Geometric) has been moved from my own personal account rusty1s to its own organization account pyg-team to emphasize the ongoing collaboration between TU Dortmund University, Stanford University and many great external contributors. With this, we are releasing PyG 2.0, a new major release that brings sophisticated heterogeneous graph support, GraphGym integration and many other exciting features to PyG.

If you encounter any bugs in this new release, please do not hesitate to create an issue.

Heterogeneous Graph Support

We finally provide full heterogeneous graph support in PyG 2.0. See here for the accompanying tutorial.

Highlights

  • Heterogeneous Graph Storage: Heterogeneous graphs can now be stored in their own dedicated data.HeteroData class (thanks to @yaoyaowd):

    from torch_geometric.data import HeteroData
    
    data = HeteroData()
    
    # Create two node types "paper" and "author" holding a single feature matrix:
    data['paper'].x = torch.randn(num_papers, num_paper_features)
    data['author'].x = torch.randn(num_authors, num_authors_features)
    
    # Create an edge type ("paper", "written_by", "author") holding its graph connectivity:
    data['paper', 'written_by', 'author'].edge_index = ...  # [2, num_edges]

    data.HeteroData behaves similarly to a regular homogeneous data.Data object:

    print(data['paper'].num_nodes)
    print(data['paper', 'written_by', 'author'].num_edges)
    data = data.to('cuda')
  • Heterogeneous Mini-Batch Loading: Heterogeneous graphs can be converted to mini-batches for many small and single giant graphs via the loader.DataLoader and loader.NeighborLoader loaders, respectively. These loaders can now handle both homogeneous and heterogeneous graphs:

    from torch_geometric.loader import DataLoader
    
    loader = DataLoader(heterogeneous_graph_dataset, batch_size=32, shuffle=True)
    
    from torch_geometric.loader import NeighborLoader
    
    loader = NeighborLoader(heterogeneous_graph, num_neighbors=[30, 30], batch_size=128,
                            input_nodes=('paper', heterogeneous_graph['paper'].train_mask),
                            shuffle=True)
  • Heterogeneous Graph Neural Networks: Heterogeneous GNNs can now easily be created from homogeneous ones via nn.to_hetero and nn.to_hetero_with_bases. These functions take an existing GNN model and duplicate its message functions to account for different node and edge types:

    import torch
    from torch_geometric.nn import SAGEConv, to_hetero
    
    class GNN(torch.nn.Module):
        def __init__(self, hidden_channels, out_channels):
            super().__init__()
            self.conv1 = SAGEConv((-1, -1), hidden_channels)
            self.conv2 = SAGEConv((-1, -1), out_channels)
    
        def forward(self, x, edge_index):
            x = self.conv1(x, edge_index).relu()
            x = self.conv2(x, edge_index)
            return x
    
    model = GNN(hidden_channels=64, out_channels=dataset.num_classes)
    model = to_hetero(model, data.metadata(), aggr='sum')
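
To give an intuition for "duplicating message functions per edge type" with `aggr='sum'`, here is a toy, pure-Python sketch using scalar node features. It is an illustration only and not how `to_hetero` is implemented. Each edge type gets its own hypothetical weight, and the messages arriving at a destination node type are summed across edge types:

```python
from typing import Dict, List, Tuple

EdgeType = Tuple[str, str, str]  # (source node type, relation, destination node type)


def hetero_sum_layer(
    x_dict: Dict[str, List[float]],
    edge_index_dict: Dict[EdgeType, Tuple[List[int], List[int]]],
    weight_dict: Dict[EdgeType, float],
) -> Dict[str, List[float]]:
    """Run one weighted message-passing step per edge type and sum the results."""
    out = {node_type: [0.0] * len(x) for node_type, x in x_dict.items()}
    for edge_type, (src_ids, dst_ids) in edge_index_dict.items():
        src_type, _, dst_type = edge_type
        w = weight_dict[edge_type]  # separate parameter for every edge type
        for s, d in zip(src_ids, dst_ids):
            out[dst_type][d] += w * x_dict[src_type][s]
    return out


x_dict = {'paper': [1.0, 2.0], 'author': [10.0]}
edge_index_dict = {
    ('paper', 'written_by', 'author'): ([0, 1], [0, 0]),
    ('author', 'writes', 'paper'): ([0, 0], [0, 1]),
    ('paper', 'cites', 'paper'): ([0], [1]),
}
weight_dict = {
    ('paper', 'written_by', 'author'): 0.5,
    ('author', 'writes', 'paper'): 2.0,
    ('paper', 'cites', 'paper'): 1.0,
}
out = hetero_sum_layer(x_dict, edge_index_dict, weight_dict)
print(out)
```

Paper node 1 receives messages over two different edge types, and the two contributions are added together, which mirrors the role of the `aggr='sum'` argument in the example above.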

Additional Features

Managing Experiments with GraphGym

GraphGym is now officially supported in PyG 2.0 via torch_geometric.graphgym. See here for the accompanying tutorial. Overall, GraphGym is a platform for designing and evaluating Graph Neural Networks from configuration files via a highly modularized pipeline (thanks to @JiaxuanYou):

  1. GraphGym is the perfect place to start learning about standardized GNN implementation and evaluation
  2. GraphGym provides a simple interface to try out thousands of GNN architectures in parallel to find the best design for your specific task
  3. GraphGym lets you easily do hyper-parameter search and visualize what design choices are better

Breaking Changes

1.7.2

26 Jun 08:50

Datasets

Bugfixes

1.7.1

17 Jun 08:19

A minor release that brings PyTorch 1.9.0 and Python 3.9 support to PyTorch Geometric. In case you are in the process of updating to PyTorch 1.9.0, please re-install the external dependencies for PyTorch 1.9.0 as well (torch-scatter and torch-sparse).

Features

Datasets

Issues

1.7.0

09 Apr 08:44

Major Features

Additional Features

Minor Changes

Datasets

Bugfixes