-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Alignment of n-dimensional indexes with partially excluded dims #10293
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Support checking exact alignment of indexes that use multiple dimensions in the cases where some of those dimensions are included in alignment while others are excluded. Added `exclude_dims` keyword argument to `Index.equals()` (and still support old signature with future warning). Also fixed bug: indexes associated with scalar coordinates were ignored during alignment. Added tests as well.
xarray/core/indexes.py
Outdated
@overload | ||
def equals(self, other: Index) -> bool: ... | ||
|
||
@overload | ||
def equals( | ||
self, other: Index, *, exclude_dims: frozenset[Hashable] | None = None | ||
) -> bool: ... | ||
|
||
def equals(self, other: Index, **kwargs) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added those overloads so Mypy doesn't fail for 3rd-party Xarray indexes. But maybe we want Mypy to fail and thereby encourage maintainers updating their code?
xarray/structure/alignment.py
Outdated
exclude_dims = idx_all_dims & self.exclude_dims | ||
if exclude_dims == idx_all_dims: | ||
continue | ||
elif exclude_dims and self.join != "exact": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having trouble deciding if any of the other cases make any sense. Seems OK to remove anything that requires actual reindexing, which leaves "override"
and I'm not sure about that. Let's have the error message recommend the user to open an issue if they feel it should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that you are right. I'll try with other join options except "override".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems OK to me, but is definitely hard to reason about.
Co-authored-by: Deepak Cherian <[email protected]>
for more information, see https://pre-commit.ci
Co-authored-by: Deepak Cherian <[email protected]>
Better readability.
Yeah this certainly doesn't help better understand the alignment rules, which are already complex. It looks to me like a pretty robust approach for dealing with "multi-dimensional" indexes and dimensions excluded from alignment, though. |
# TODO: always copy object's index when no re-indexing is required? | ||
# (instead of assigning the aligned index) | ||
# (need performance assessment) | ||
if key in self.keep_original_indexes: | ||
assert self.reindex[key] is False | ||
new_idx = obj_idx.copy(deep=self.copy) | ||
new_idx_vars = new_idx.create_variables(obj_idx_vars) | ||
else: | ||
new_idx = aligned_idx | ||
new_idx_vars = { | ||
k: v.copy(deep=self.copy) for k, v in aligned_idx_vars.items() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made this to avoid any general performance regression for the pretty narrow-scope feature added here. However, self.keep_original_indexes
may not be needed and we could write instead:
if self.reindex[key] is False:
new_idx = obj_idx.copy(deep=self.copy)
new_idx_vars = new_idx.create_variables(obj_idx_vars)
else:
new_idx = aligned_idx
new_idx_vars = {
k: v.copy(deep=self.copy) for k, v in aligned_idx_vars.items()
}
(i.e., always copy the index of each original object when no-reindexing is required)
I'm not sure about the performance impact of obj_idx.copy(deep=self.copy)
. Actually, I'm not sure either about v.copy(deep=self.copy)
below (what we do in the main branch) and if we even want that.
This needs further investigation / performance assessment. Maybe in a follow-up PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it's fine to address in a followup PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @benbovy . It's hard to reason about so I expect we'll need to use this feature a bit to find the edge cases we may have missed.
Test a bit more than join="exact".
@shoyer any thoughts on this? |
whats-new.rst
Support checking exact alignment of indexes that use multiple dimensions in the cases where some of those dimensions are included in alignment while others are excluded.
Add the
exclude_dims
keyword argument toIndex.equals()
(still support old signature withFutureWarning
).Also fixed a bug: indexes associated with scalar coordinates were ignored during alignment.
TODO:
exclude_dims
kwarg toCoordinateTransform.equals()
as well