[Roadmap] PyG for Recommendation 🚀 #8452

@rusty1s

🚀 The feature, motivation and pitch

This roadmap aims to bring better support for recommendation tasks to PyG.

Currently, most (if not all) of our link prediction models are trained and evaluated using binary classification metrics. However, this requires a set of candidate links in advance, whose existence we then predict. This is often impractical, since in most cases we want to find the top-k most likely links among the full set of O(N^2) pairs.

While training can still be done via negative sampling and binary classification, this roadmap revolves around bringing better support for link prediction evaluation into PyG, with the following end-to-end pipeline:

  1. Embed all source and destination nodes
  2. Use "Maximum Inner Product Search" (MIPS) to find the top-k most likely links (via MIPSKNNIndex)
  3. Evaluate using common metrics for recommendation, e.g., map@k, precision@k, recall@k, f1@k, ndcg@k.
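
To make step 2 concrete, here is a minimal, dependency-free sketch of brute-force MIPS over plain Python lists. It is not the `MIPSKNNIndex` implementation; the function names `inner_product` and `top_k_mips` and the toy embeddings are made up for illustration, and a real pipeline would run this on tensors with an optimized index.

```python
import heapq
from typing import List, Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k_mips(
    src_emb: Sequence[Sequence[float]],
    dst_emb: Sequence[Sequence[float]],
    k: int,
) -> List[List[int]]:
    """For every source embedding, return the indices of the k destination
    embeddings with the largest inner product, best match first."""
    top_k = []
    for src in src_emb:
        # Score the source against every destination (O(N^2) overall).
        scores = [inner_product(src, dst) for dst in dst_emb]
        # Pick the k destination indices with the highest scores.
        best = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
        top_k.append(best)
    return top_k


# Hypothetical toy embeddings: 2 sources, 3 destinations, dimension 2.
src_emb = [[1.0, 0.0], [0.0, 1.0]]
dst_emb = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
top_k_pred_mat = top_k_mips(src_emb, dst_emb, k=2)
```

The result has one row per source node, which is the shape the evaluation metrics in step 3 would consume.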

Metrics

We need to support recommendation metrics that can be updated and computed in a mini-batch fashion. A related issue can be found here. Their interface should follow the torchmetrics.Metric interface, e.g.:

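import torchmetrics
from torch import Tensor


class LinkPredMetric(torchmetrics.Metric):
    def __init__(self, k: int):
        pass  # Store `k` and set up the accumulated state.

    def update(self, top_k_pred_mat: Tensor, edge_label_index: Tensor):
        pass  # Accumulate statistics for one mini-batch.

    def compute(self):
        pass  # Reduce the accumulated statistics to the final value.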

where top_k_pred_mat holds the top-k indices for each left-hand-side (LHS) entity, and edge_label_index holds the ground-truth information as a [2, num_targets] matrix.
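
As a rough illustration of the update/compute pattern, the sketch below accumulates precision@k and recall@k over mini-batches using plain Python lists instead of tensors. It does not subclass `torchmetrics.Metric`, and the class name `PrecisionRecallAtK` is hypothetical. It assumes that the first row of `edge_label_index` holds batch-local row positions into `top_k_pred_mat`, and that sources without any ground-truth link are skipped; both are assumptions, not decisions stated in this roadmap.

```python
from typing import Dict, List


class PrecisionRecallAtK:
    """Accumulates precision@k and recall@k across mini-batches."""

    def __init__(self, k: int):
        self.k = k
        self.precision_sum = 0.0
        self.recall_sum = 0.0
        self.num_examples = 0

    def update(
        self,
        top_k_pred_mat: List[List[int]],
        edge_label_index: List[List[int]],
    ) -> None:
        # Group ground-truth destinations by (batch-local) source index.
        targets = [set() for _ in top_k_pred_mat]
        src_row, dst_row = edge_label_index
        for src, dst in zip(src_row, dst_row):
            targets[src].add(dst)

        for preds, truth in zip(top_k_pred_mat, targets):
            if not truth:
                continue  # Assumption: ignore sources without ground truth.
            hits = sum(1 for dst in preds[: self.k] if dst in truth)
            self.precision_sum += hits / self.k
            self.recall_sum += hits / len(truth)
            self.num_examples += 1

    def compute(self) -> Dict[str, float]:
        if self.num_examples == 0:
            return {"precision": 0.0, "recall": 0.0}
        return {
            "precision": self.precision_sum / self.num_examples,
            "recall": self.recall_sum / self.num_examples,
        }


metric = PrecisionRecallAtK(k=2)
# First mini-batch: two sources, ground truth as a [2, num_targets] matrix.
metric.update([[0, 2], [1, 2]], [[0, 0, 1], [0, 1, 1]])
# Second mini-batch: one source whose only true link is not retrieved.
metric.update([[3, 4]], [[0], [5]])
result = metric.compute()
```

Keeping only running sums and a counter as state means the full prediction matrix never has to be held in memory, which is the point of the mini-batch requirement.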

Examples

With this, we can build one or more clear and descriptive examples of how to leverage PyG for recommendation.

  • Select and implement one or two datasets commonly used for recommendation
  • Add exclusion logic to MIPSKNNIndex
  • Build an example that implements this pipeline
  • Write a tutorial about recommendation in PyG
  • Advanced: Combine PyG's recommendation capabilities with its temporal GNN support (see [Roadmap] Temporal Graph Support 🚀 #3230)
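
One common motivation for exclusion logic is to keep links already seen during training out of the retrieved top-k. The sketch below shows the idea on a precomputed score matrix in plain Python; it is not the `MIPSKNNIndex` API, and the function name `top_k_with_exclusion`, the `exclude_links` argument, and the toy scores are illustrative assumptions.

```python
import heapq
from typing import List, Sequence, Set, Tuple


def top_k_with_exclusion(
    scores: Sequence[Sequence[float]],
    k: int,
    exclude_links: Set[Tuple[int, int]],
) -> List[List[int]]:
    """Return the top-k destination indices per source, skipping every
    (source, destination) pair listed in `exclude_links`."""
    top_k = []
    for src, row in enumerate(scores):
        # Drop excluded destinations before ranking the remaining ones.
        candidates = [
            dst for dst in range(len(row)) if (src, dst) not in exclude_links
        ]
        top_k.append(heapq.nlargest(k, candidates, key=row.__getitem__))
    return top_k


# Hypothetical scores for 2 sources and 3 destinations.
scores = [[0.9, 0.2, 0.5], [0.1, 0.8, 0.5]]
# Pretend these links were part of the training graph.
train_links = {(0, 0), (1, 1)}
filtered = top_k_with_exclusion(scores, k=2, exclude_links=train_links)
```

Without the exclusion, each source would retrieve its training link first, so ranking metrics computed on held-out links would be distorted by links the model has already seen.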
