[Feat] Add validate_graph_components() for pre-export structural consistency checks #97

@prajwal-tech07

Summary

Add a validate_graph_components() function that validates the structural consistency of constructed networkx.DiGraph graph components before they are serialized to disk. This catches construction bugs at the source rather than downstream in neural-lam.

Problem

Currently, graph construction bugs are only discovered when:

  • to_pyg() tries to serialize the graph and hits unexpected shapes/missing attributes
  • neural-lam's load_graph() fails with cryptic FileNotFoundError or TypeError (as documented in the format mismatch audit in neural-lam#339)
  • Or worse — they pass silently and produce wrong training results

Recent examples of bugs that would have been caught earlier:

With multiple mesh_layout variants being added (rectilinear in #81, triangular in #92, icosahedral in #76, prebuilt in #91), the surface area for construction bugs is growing. We need a systematic way to validate graph components.

Relationship to existing work

  • neural-lam#323 validates the on-disk .pt tensor format after export. This proposal validates the NetworkX DiGraph objects before export — they are complementary.
  • #42 ("Assert that all nodes are in g2m") is a specific instance of one check. This proposal generalizes it into a reusable validation framework alongside other checks.
  • #88 ("Strengthen test_save_to_pyg with output assertions and pytest.skip") focuses on the serialization test. This proposal tests the graph objects themselves.
  • neural-lam#339 (the wmg↔neural-lam bridge RFC) identifies that format mismatches between the two repos are a key pain point. Catching structural issues before serialization would prevent many of these mismatches from occurring.

Proposed API

import weather_model_graphs as wmg
from weather_model_graphs import validate_graph_components

# xy: array of grid point coordinates, defined elsewhere
components = wmg.create.archetype.create_keisler_graph(
    coords=xy, return_components=True
)

report = validate_graph_components(
    components=components,      # {"g2m": DiGraph, "m2m": DiGraph, "m2g": DiGraph}
    hierarchical=False,
    graph_crs=None,             # optional: enables CRS-specific checks
)

report.passed      # True/False
report.summary()   # prints human-readable summary
report.checks      # list of CheckResult(name, passed, details)
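
To make the shape of the report concrete, here is a minimal sketch of the two result types. CheckResult and its fields (name, passed, details) come from the API above; the ValidationReport class name, and the choice to have summary() both print and return the text, are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class CheckResult:
    """Outcome of one structural check (fields as listed in the proposal)."""

    name: str
    passed: bool
    details: str = ""


@dataclass
class ValidationReport:
    """Hypothetical container for check results; the class name is an assumption."""

    checks: list[CheckResult] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # The report passes only if every individual check passed.
        return all(check.passed for check in self.checks)

    def summary(self) -> str:
        # One line per check. The text is printed and also returned, so that
        # `assert report.passed, report.summary()` yields a useful message.
        lines = []
        for check in self.checks:
            status = "PASS" if check.passed else "FAIL"
            suffix = f": {check.details}" if check.details else ""
            lines.append(f"[{status}] {check.name}{suffix}")
        text = "\n".join(lines)
        print(text)
        return text
```

Returning the string from summary() is a design choice that keeps the assertion messages in the tests below informative; a print-only version would pass None to assert.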

Checks to implement

  1. Bidirectional grid coverage (generalizes #42, "Assert that all nodes are in g2m")
    Every grid node that appears in g2m should also appear in m2g (or be explicitly marked as boundary-only). Flags grid nodes that are "encoded to mesh" but never "decoded from mesh", which would cause silent data loss.

  2. Edge feature consistency
    All edges within a component have the same set of attributes (len, vdiff, component, etc.) and consistent dimensions. For example, vdiff should be 2D for projected CRS and 3D for geographic CRS — not mixed within a single component.

  3. Mesh hierarchy completeness (hierarchical only)
    For hierarchical graphs: every mesh node at a level L below the top level has at least one up edge to level L+1 and at least one down edge from level L+1. No orphan mesh nodes at any level.

  4. No degenerate structures
    • No self-loops (edge from a node to itself)
    • No empty components (a component with zero edges)
    • No disconnected subgraphs within a single component (e.g., mesh nodes that form isolated clusters)

  5. Coordinate sanity
    All node pos attributes are finite (no NaN, no inf), and within reasonable bounds for the declared CRS (e.g., latitude in [-90, 90] for geographic).

  6. Component labeling consistency
    Edge component attributes match the expected values (g2m, m2m, m2g) and are uniform within each subgraph.
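
A few of the checks above could be sketched with plain networkx calls as follows. All function names here are hypothetical. The grid coverage helper also assumes that grid nodes are the edge sources in g2m and the edge targets in m2g, which may need adjusting to how nodes are actually labeled.

```python
import math

import networkx as nx


def check_no_self_loops(graph: nx.DiGraph) -> bool:
    """Check 4: no edge may start and end at the same node."""
    return nx.number_of_selfloops(graph) == 0


def check_not_empty(graph: nx.DiGraph) -> bool:
    """Check 4: a component must contain at least one edge."""
    return graph.number_of_edges() > 0


def check_uniform_edge_attributes(graph: nx.DiGraph) -> bool:
    """Check 2: every edge carries the same set of attribute names."""
    key_sets = {frozenset(attrs) for _, _, attrs in graph.edges(data=True)}
    return len(key_sets) <= 1


def check_finite_positions(graph: nx.DiGraph) -> bool:
    """Check 5: every node has a `pos` attribute containing only finite values."""
    for _, attrs in graph.nodes(data=True):
        pos = attrs.get("pos")
        if pos is None or not all(math.isfinite(float(v)) for v in pos):
            return False
    return True


def uncovered_grid_nodes(g2m: nx.DiGraph, m2g: nx.DiGraph) -> set:
    """Check 1: grid nodes encoded in g2m but never decoded in m2g.

    Assumption: grid nodes are edge sources in g2m and edge targets in m2g.
    """
    encoded = {u for u, _ in g2m.edges()}
    decoded = {v for _, v in m2g.edges()}
    return encoded - decoded


# Toy example: grid nodes "g0" and "g1", one mesh node "m0".
g2m = nx.DiGraph()
g2m.add_edge("g0", "m0", len=1.0)
g2m.add_edge("g1", "m0", len=1.0)
m2g = nx.DiGraph()
m2g.add_edge("m0", "g0", len=1.0)

# "g1" is encoded to the mesh but never decoded from it.
missing = uncovered_grid_nodes(g2m, m2g)
```

Each helper returns a plain bool or set, so wrapping the results into CheckResult entries stays a separate concern.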

Usage in CI / tests

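from weather_model_graphs import validate_graph_components
from weather_model_graphs.create.archetype import (
    create_keisler_graph,
    create_oskarsson_hierarchical_graph,
)

# xy: array of grid point coordinates, e.g. provided by a fixture


def test_keisler_graph_is_valid():
    components = create_keisler_graph(coords=xy, return_components=True)
    report = validate_graph_components(components, hierarchical=False)
    assert report.passed, report.summary()


def test_hierarchical_graph_is_valid():
    components = create_oskarsson_hierarchical_graph(coords=xy, return_components=True)
    report = validate_graph_components(components, hierarchical=True)
    assert report.passed, report.summary()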

This also becomes valuable for validating mesh_layout="prebuilt" (#91) graphs where users provide their own mesh — we can verify it meets structural requirements before attempting serialization.
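
For user-provided meshes, the "no isolated clusters" check is probably the most relevant one. A rough sketch, assuming that "disconnected" is interpreted as weak connectivity of the directed graph (the helper name is hypothetical):

```python
import networkx as nx


def isolated_clusters(mesh: nx.DiGraph) -> list:
    """Return every weakly connected component except the largest one."""
    clusters = sorted(nx.weakly_connected_components(mesh), key=len, reverse=True)
    return clusters[1:]


# Hypothetical user-provided mesh: a connected triple plus a stray pair.
mesh = nx.DiGraph()
mesh.add_edges_from([("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")])
mesh.add_edges_from([("x", "y"), ("y", "x")])

# The stray pair {"x", "y"} is reported as an isolated cluster.
stray = isolated_clusters(mesh)
```

An empty result means the mesh forms a single connected piece; anything else can be reported in the details field of the corresponding check.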

Implementation plan

  1. Add weather_model_graphs/validation.py with validate_graph_components() and individual check functions
  2. Add corresponding tests in tests/test_validation.py
  3. Optionally integrate as an automatic step in to_pyg() / to_neural_lam() (with a validate=True flag)
  4. Add a brief section to documentation

No new dependencies required — purely operates on networkx.DiGraph objects using existing networkx APIs.
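
The optional validate=True integration from step 3 could look roughly like the wrapper below. This is only a sketch: export_components and the export_fn callback are hypothetical stand-ins and do not reflect the actual signature of to_pyg() or to_neural_lam(), and the inline validation covers just two of the proposed checks.

```python
from typing import Callable, Dict

import networkx as nx


def export_components(
    components: Dict[str, nx.DiGraph],
    export_fn: Callable[[nx.DiGraph, str], None],
    validate: bool = True,
) -> None:
    """Hypothetical wrapper: validate all components, then export each one."""
    if validate:
        # Stand-in for validate_graph_components(): reject empty components
        # and components that contain self-loops.
        problems = [
            name
            for name, graph in components.items()
            if graph.number_of_edges() == 0 or nx.number_of_selfloops(graph) > 0
        ]
        if problems:
            raise ValueError(f"Invalid graph components: {sorted(problems)}")
    # Nothing is written unless every component passed validation.
    for name, graph in components.items():
        export_fn(graph, name)
```

Validating everything before the first export call means a failure never leaves a partially written graph directory behind.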
