
Contributor

@jesseangelis jesseangelis commented Jul 30, 2025

Summary

This PR adds a new method, connected_components(), to both HeteroData and Data classes in PyTorch Geometric. It identifies and extracts disjoint connected components from a (heterogeneous or homogeneous) graph using a union-find algorithm.

Motivation

While PyG provides convenient utilities such as subgraph, edge_subgraph, and node_type_subgraph, it does not currently offer a built-in method to extract connected components from graphs. This functionality could be useful for:

  • Preprocessing or cleaning sparse or noisy graphs
  • Isolating disconnected structures
  • Enabling mini-batching strategies for large, sparse datasets

This PR provides a general-purpose implementation that complements existing subgraph utilities.

API

components = data.connected_components()
components = hetero_data.connected_components()
  • Returns: A List[Data] or List[HeteroData], with each item corresponding to a connected component.
  • Optional filtering: For heterogeneous graphs, users are encouraged to pre-filter using existing methods like node_type_subgraph or edge_type_subgraph if they wish to limit the types involved.
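The pre-filter-then-split workflow described above can be illustrated without PyG at all. The following is a minimal sketch in plain Python, not the code from this PR: `hetero_components`, the `(node_type, index)` node keys, and the dictionary of edge lists are hypothetical stand-ins for a heterogeneous graph, and the dict comprehension only plays the role that a utility such as edge_type_subgraph would play on a real HeteroData object.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Node = Tuple[str, int]           # (node_type, local index); hypothetical layout
EdgeType = Tuple[str, str, str]  # (src_type, relation, dst_type)


def hetero_components(
    num_nodes: Dict[str, int],
    edges: Dict[EdgeType, List[Tuple[int, int]]],
) -> List[List[Node]]:
    """Group typed nodes into connected components, ignoring edge direction."""
    # Every node starts as the root of its own single-node tree.
    parent: Dict[Node, Node] = {
        (ntype, i): (ntype, i) for ntype, n in num_nodes.items() for i in range(n)
    }

    def find(x: Node) -> Node:
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (src_type, _rel, dst_type), pairs in edges.items():
        for s, d in pairs:
            ra, rb = find((src_type, s)), find((dst_type, d))
            if ra != rb:
                parent[rb] = ra  # plain union, kept simple for the sketch

    groups = defaultdict(list)
    for node in parent:
        groups[find(node)].append(node)
    return sorted(sorted(group) for group in groups.values())


num_nodes = {"author": 2, "paper": 3}
edges = {
    ("author", "writes", "paper"): [(0, 0), (1, 1)],
    ("paper", "cites", "paper"): [(1, 2), (0, 1)],
}

# With every edge type, the whole graph is a single component.
all_components = hetero_components(num_nodes, edges)

# Pre-filtering to one relation first changes which nodes end up together.
writes_only = hetero_components(
    num_nodes, {k: v for k, v in edges.items() if k[1] == "writes"}
)
print(len(all_components))  # 1
print(writes_only)
```

Keeping the filtering step outside the component search is what lets the method itself stay free of extra arguments: the caller decides which types count as connections before splitting.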

Highlights

  • Implementation for both Data and HeteroData
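  • Union-Find algorithm: with path compression and union by rank, each find/union operation runs in amortized O($\alpha$(n)) time, where $\alpha$ is the inverse Ackermann function, so the overall cost is near-linear in the number of nodes and edges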
  • Compatible with PyG idioms and retains all feature and edge attributes if present in the connected component
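For readers unfamiliar with the complexity claim, here is a minimal union-find sketch in plain Python. It is an illustration under stated assumptions rather than the implementation merged in this PR: the function name and the nested-list `edge_index` (sources in row 0, targets in row 1, mimicking the COO layout) are chosen for the example only. Path compression and union by size are the two ingredients behind the amortized near-constant cost per operation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def connected_components(
    num_nodes: int, edge_index: Sequence[Sequence[int]]
) -> List[List[int]]:
    """Group node ids 0..num_nodes-1 into components, ignoring edge direction."""
    parent = list(range(num_nodes))
    size = [1] * num_nodes

    def find(x: int) -> int:
        # Path compression (halving): re-point each visited node to its grandparent.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        # Union by size: hang the smaller tree under the larger root.
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]

    for src, dst in zip(edge_index[0], edge_index[1]):
        union(src, dst)

    groups: Dict[int, List[int]] = defaultdict(list)
    for node in range(num_nodes):
        groups[find(node)].append(node)
    return sorted(groups.values())


# Edges 0-1, 1-2 and 3-4; node 5 is isolated.
components = connected_components(6, [[0, 1, 3], [1, 2, 4]])
print(components)  # [[0, 1, 2], [3, 4], [5]]
```

Each returned list of node ids could then be used to slice features and edges for one component, which is the role a subgraph utility plays on a real graph object.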

Discussion Points

Based on feedback from the original issue, this version avoids introducing new arguments like allowed_edge_types or allowed_node_types and instead suggests combining the new method with existing filtering utilities.

Happy to further extend or refine the method based on team preferences!

@jesseangelis jesseangelis marked this pull request as draft July 30, 2025 18:11
Member

@wsad1 wsad1 left a comment

Left some comments.
Will take another look this week.

@jesseangelis jesseangelis changed the title from "Adding separate() Method to Data and HeteroData for extracting connected components" to "Adding connected_components() Method to Data and HeteroData for extracting connected components" Aug 8, 2025
@jesseangelis jesseangelis marked this pull request as ready for review August 8, 2025 06:31
codecov bot commented Aug 8, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.97%. Comparing base (c211214) to head (b62943d).
⚠️ Report is 139 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10388      +/-   ##
==========================================
- Coverage   86.11%   85.97%   -0.14%     
==========================================
  Files         496      502       +6     
  Lines       33655    35207    +1552     
==========================================
+ Hits        28981    30269    +1288     
- Misses       4674     4938     +264     

wsad1 added 2 commits August 31, 2025 11:52
Refactor hetero data connected components tests for clarity and conciseness.
Member

@wsad1 wsad1 left a comment

Thanks for the work, @jesseangelis.

@wsad1 wsad1 merged commit f9a20c1 into pyg-team:master Sep 3, 2025
19 checks passed
@jesseangelis jesseangelis deleted the separate-method branch September 4, 2025 16:17