Skip to content

Alignment of n-dimensional indexes with partially excluded dims #10293

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

benbovy
Copy link
Member

@benbovy benbovy commented May 7, 2025

Support checking exact alignment of indexes that use multiple dimensions in the cases where some of those dimensions are included in alignment while others are excluded.

Add the exclude_dims keyword argument to Index.equals() (still support old signature with FutureWarning).

Also fixed a bug: indexes associated with scalar coordinates were ignored during alignment.

TODO:

  • add exclude_dims kwarg to CoordinateTransform.equals() as well
  • copy or assign the index of each original object in the new aligned objects instead of assigning the aligned index (when copying this may have an impact on performance).

Support checking exact alignment of indexes that use multiple dimensions
in the cases where some of those dimensions are included in alignment
while others are excluded.

Added `exclude_dims` keyword argument to `Index.equals()` (and still
support old signature with future warning).

Also fixed bug: indexes associated with scalar coordinates were ignored
during alignment.

Added tests as well.
Comment on lines 352 to 360
@overload
def equals(self, other: Index) -> bool: ...

@overload
def equals(
self, other: Index, *, exclude_dims: frozenset[Hashable] | None = None
) -> bool: ...

def equals(self, other: Index, **kwargs) -> bool:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added those overloads so Mypy doesn't fail for 3rd-party Xarray indexes. But maybe we want Mypy to fail and thereby encourage maintainers updating their code?

@benbovy benbovy requested a review from dcherian May 7, 2025 12:18
@benbovy benbovy changed the title Alignment of n-dimensional indexes with partial excluded dimensions Alignment of n-dimensional indexes with partially excluded dims May 7, 2025
@benbovy benbovy mentioned this pull request May 7, 2025
6 tasks
exclude_dims = idx_all_dims & self.exclude_dims
if exclude_dims == idx_all_dims:
continue
elif exclude_dims and self.join != "exact":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble deciding if any of the other cases make any sense. Seems OK to remove anything that requires actual reindexing, which leaves "override" and I'm not sure about that. Let's have the error message recommend the user to open an issue if they feel it should work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that you are right. I'll try with other join options except "override".

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems OK to me, but is definitely hard to reason about.

@benbovy
Copy link
Member Author

benbovy commented May 12, 2025

Yeah this certainly doesn't help better understand the alignment rules, which are already complex. It looks to me like a pretty robust approach for dealing with "multi-dimensional" indexes and dimensions excluded from alignment, though.

Comment on lines +574 to +585
# TODO: always copy object's index when no re-indexing is required?
# (instead of assigning the aligned index)
# (need performance assessment)
if key in self.keep_original_indexes:
assert self.reindex[key] is False
new_idx = obj_idx.copy(deep=self.copy)
new_idx_vars = new_idx.create_variables(obj_idx_vars)
else:
new_idx = aligned_idx
new_idx_vars = {
k: v.copy(deep=self.copy) for k, v in aligned_idx_vars.items()
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made this to avoid any general performance regression for the pretty narrow-scope feature added here. However, self.keep_original_indexes may not be needed and we could write instead:

               if self.reindex[key] is False:
                    new_idx = obj_idx.copy(deep=self.copy)
                    new_idx_vars = new_idx.create_variables(obj_idx_vars)
                else:
                    new_idx = aligned_idx
                    new_idx_vars = {
                        k: v.copy(deep=self.copy) for k, v in aligned_idx_vars.items()
                    }

(i.e., always copy the index of each original object when no-reindexing is required)

I'm not sure about the performance impact of obj_idx.copy(deep=self.copy). Actually, I'm not sure either about v.copy(deep=self.copy) below (what we do in the main branch) and if we even want that.

This needs further investigation / performance assessment. Maybe in a follow-up PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it's fine to address in a followup PR

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @benbovy . It's hard to reason about so I expect we'll need to use this feature a bit to find the edge cases we may have missed.

@dcherian dcherian added the plan to merge Final call for comments label May 13, 2025
@benbovy
Copy link
Member Author

benbovy commented May 14, 2025

@shoyer any thoughts on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support alignment with partial overlap between index vs. excluded dimensions
3 participants