-
-
Notifications
You must be signed in to change notification settings - Fork 19.5k
ENH: .equals for Extension Arrays #30652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 16 commits
36c8b88
786963c
6800315
a3e7b7f
860013f
fc3d2c2
3da5726
c5027dd
375664c
b6ad2fb
365362a
aae2f94
8d052ad
9ee034e
38501e6
dccec7f
0b1255f
b8be858
4c7273f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||
|---|---|---|---|---|
|
|
@@ -58,6 +58,7 @@ class ExtensionArray: | |||
| dropna | ||||
| factorize | ||||
| fillna | ||||
| equals | ||||
| isna | ||||
| ravel | ||||
| repeat | ||||
|
|
@@ -84,6 +85,7 @@ class ExtensionArray: | |||
| * _from_factorized | ||||
| * __getitem__ | ||||
| * __len__ | ||||
| * __eq__ | ||||
| * dtype | ||||
| * nbytes | ||||
| * isna | ||||
|
|
@@ -333,6 +335,24 @@ def __iter__(self): | |||
| for i in range(len(self)): | ||||
| yield self[i] | ||||
|
|
||||
| def __eq__(self, other: Any) -> ArrayLike: # type: ignore[override] # NOQA | ||||
|
||||
| def sort_values( # type: ignore[override] # NOQA # issue 27237 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have some guidance here that if other is a Series then you should return NotImplemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could use ops.common.unpack_zerodim_and_defer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could use ops.common.unpack_zerodim_and_defer
This method is to be implemented by EA authors, so those can't use that helper (unless we expose somewhere a public version of this).
(we could of course use that for our own EAs, but this PR is not changing any existing __eq__ implementation at the moment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have some guidance here that if other is a Series then you should return NotImplemented.
Added a comment about that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should verify that this is the behavior we want. Namely
- other array-likes are not equivalent, even if they are all equal.
- subclasses are not equivalent, even if they are all equal.
The first seems fine. Not sure about the second.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was planning to open an issue about this after this PR (and keep it strict here), because this is right now a bit inconsistent within pandas, and might require a more general discussion / clean-up (eg Series.equals is more strict (requires same dtype) than Index.equals ...)
But we can certainly also have the discussion here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to be strict for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made it even more strict (same dtype, not just same class), and added a test for that.
Will open an issue for the general discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened an issue for this at #33940
In the end, it seems mainly to come to whether the dtype should exactly be equal or not.
Since for EAs, the dtype is right now tied to the array class, using equal dtype for now also implies the same class (no additional check to allow sublcasses).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the .item() necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the .item() necessary?
It's to have a python bool, instead of a numpy bool, as result.
I added a comment to the tests to make it explicit this is the reason we are asserting with is True/False (and later on we don't inadvertedly "clean" that up)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for clarifying. i was recently reminded that np.array(np.timedelta64(1234, "ns")).item() gives an int instead of timedelta64, so im now cautious around .item()
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -1861,6 +1861,11 @@ def where( | |||||||||
|
|
||||||||||
| return [self.make_block_same_class(result, placement=self.mgr_locs)] | ||||||||||
|
|
||||||||||
| def equals(self, other) -> bool: | ||||||||||
| if self.dtype != other.dtype or self.shape != other.shape: | ||||||||||
|
||||||||||
| in corresponding locations. False otherwise. It is assumed that left and | |
| right are NumPy arrays of the same dtype. The behavior of this function | |
| (particularly with respect to NaNs) is not defined if the dtypes are | |
| different. |
But so, for EA.equals that is not needed, so will remove that check here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can add the doc reference here