Improve alignment checks #10251

Merged
merged 8 commits into from
Apr 28, 2025

benbovy
Member

@benbovy benbovy commented Apr 25, 2025

The alignment error

ValueError: cannot re-index or align objects with conflicting indexes found for the following dimensions: 'time' (2 conflicting indexes)
Conflicting indexes may occur when
- they relate to different sets of coordinate and/or dimension names
- they don't have the same type
- they may be used to reindex data along common dimensions

was not user-friendly, sometimes too restrictive and is now gone!


The examples in the issues linked above now work, i.e., it is possible to align objects with multiple indexes found along one or more common dimensions as long as either

  • no re-indexing is required along those dimensions, or
  • Index.reindex_like(), called for each index, returns matching indexers along those dimensions

So this example works too:

>>> ds1 = xr.Dataset(coords={"x": [1, 2], "xb": ("x", [3, 4])}).set_xindex("xb")
>>> ds2 = xr.Dataset(coords={"x": [1], "xb": ("x", [3])}).set_xindex("xb")

>>> xr.align(ds1, ds2, join="inner")
(<xarray.Dataset> Size: 16B
 Dimensions:  (x: 1)
 Coordinates:
   * x        (x) int64 8B 1
   * xb       (x) int64 8B 3
 Data variables:
     *empty*,
 <xarray.Dataset> Size: 16B
 Dimensions:  (x: 1)
 Coordinates:
   * x        (x) int64 8B 1
   * xb       (x) int64 8B 3
 Data variables:
     *empty*)

A more user-friendly error is raised when indexers don't match:

>>> ds3 = xr.Dataset(coords={"x": [1, 3], "xb": ("x", [2, 4])}).set_xindex("xb")

>>> xr.align(ds1, ds3, join="inner")
AlignmentError: cannot reindex or align along dimension 'x' because of conflicting re-indexers returned by multiple indexes
first index: PandasIndex(Index([3, 4], dtype='int64', name='xb'))
second index: PandasIndex(Index([1, 2], dtype='int64', name='x'))
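
To illustrate why ds1 and ds2 align while ds1 and ds3 do not, here is a toy model of the "matching indexers" rule in plain Python. It is only a sketch and not xarray's implementation: the helpers `inner_join_positions` and `check_alignable` are hypothetical names, and each dataset is reduced to a dict mapping an index name to its labels along the shared dimension.

```python
def inner_join_positions(left, right):
    """Positions to keep in each label list for an inner join on one index."""
    right_set = set(right)
    common = [label for label in left if label in right_set]
    return [left.index(c) for c in common], [right.index(c) for c in common]


def check_alignable(indexes_a, indexes_b):
    """Return the shared positional indexers, or raise if the indexes disagree.

    Hypothetical helper: every index along the dimension proposes its own
    positions, and alignment is only unambiguous if all proposals match.
    """
    proposals = {
        name: inner_join_positions(indexes_a[name], indexes_b[name])
        for name in indexes_a
    }
    first_name, first = next(iter(proposals.items()))
    for name, positions in proposals.items():
        if positions != first:
            raise ValueError(
                f"conflicting re-indexers: {first_name!r} gives {first}, "
                f"{name!r} gives {positions}"
            )
    return first


# Same labels as ds1, ds2 and ds3 in the examples above.
ds1 = {"x": [1, 2], "xb": [3, 4]}
ds2 = {"x": [1], "xb": [3]}
ds3 = {"x": [1, 3], "xb": [2, 4]}

print(check_alignable(ds1, ds2))  # ([0], [0])

try:
    check_alignable(ds1, ds3)
except ValueError as err:
    print(err)
```

For ds1 and ds3, the "x" index keeps position 0 (label 1) while the "xb" index keeps position 1 (label 4), so there is no single way to subset the data along the dimension.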

benbovy added 3 commits April 25, 2025 14:52
- Improved error messages (more context)
- Simplified logic
- Removed overly restrictive checks that caused alignment to fail when
  multiple indexes are set along common dimensions
@@ -2543,6 +2544,28 @@ def test_align_indexes(self) -> None:

         assert_identical(expected_x2, x2)

+    def test_align_multiple_indexes_common_dim(self) -> None:
Contributor

nice test!

Contributor

@dcherian dcherian left a comment

LGTM. Thanks!

This is an important fix, so let's merge ASAP.

The doctest simply needs updating from ValueError to AlignmentError, AFAICT.

@dcherian dcherian added the plan to merge Final call for comments label Apr 26, 2025
@@ -837,7 +837,7 @@ def align(
 >>> a, b = xr.align(x, y, join="exact")
 Traceback (most recent call last):
 ...
-AlignmentError: cannot align objects with join='exact' ...
+xarray.structure.alignment.AlignmentError: cannot align objects with join='exact' ...
Member Author

It would be cleaner to later define Xarray exceptions in their own module, e.g. xarray.exceptions.AlignmentError (same for MergeError, etc.).
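
For calling code that does not want to depend on where the exception class lives, a minimal sketch follows. It assumes that AlignmentError is a subclass of ValueError (my understanding, worth checking against the installed release), so catching ValueError should cover the failure both before and after this change. The datasets are ds1 and ds3 from the description above.

```python
import xarray as xr

ds1 = xr.Dataset(coords={"x": [1, 2], "xb": ("x", [3, 4])}).set_xindex("xb")
ds3 = xr.Dataset(coords={"x": [1, 3], "xb": ("x", [2, 4])}).set_xindex("xb")

try:
    aligned = xr.align(ds1, ds3, join="inner")
except ValueError as err:
    # Assumption: AlignmentError derives from ValueError, so this branch
    # does not need the exception's import path.
    caught = err
    print(f"{type(err).__name__}: {err}")
else:
    caught = None
```

Catching the base class keeps the handler independent of the module path, which is the part the comment above suggests may change.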

@dcherian dcherian merged commit bd10f9f into pydata:main Apr 28, 2025
32 checks passed
dcherian added a commit to dcherian/xarray that referenced this pull request Apr 29, 2025
* main:
  Fix convert calendar on non-temporal data in datasets (pydata#10268)
  BinGrouper: reduce indirection (pydata#10270)
  Fix reduction by subset of grouper dimensions (pydata#10258)
  Shorten text repr for ``DataTree`` (pydata#10139)
  Fix benchmarks runners (pydata#10265)
  Fix infinite recursion when calling `np.fix` (pydata#10248)
  BinGrouper: Support setting labels when provided with IntervalIndex (pydata#10259)
  Avoid stacking when grouping by chunked array (pydata#10254)
  Improve alignment checks (pydata#10251)
  Update how-to-add-new-backend.rst (pydata#10240)
  Support extension array indexes (pydata#9671)
  Switch documentation to pydata-sphinx-theme (pydata#8708)
  Bump codecov/codecov-action from 5.4.0 to 5.4.2 in the actions group (pydata#10239)
@benbovy benbovy deleted the improve-alignment-checks branch May 5, 2025 13:57
Successfully merging this pull request may close these issues.

Alignment fails when multiple indexes are set along one common dimension
2 participants