-
Notifications
You must be signed in to change notification settings - Fork 53
feat: add isin
to the specification
#959
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kgryte. Looks pretty good to me. I agree with the design choices in the PR description.
Would we be okay with requiring that value equality must be used? Is there a scenario where we want to allow libraries some wiggle room, such as with NaN and signed zero comparison?
I am not sure wiggle room is needed here. This function has more to do with equal
than with unique
I think. I just checked NumPy, PyTorch, JAX and CuPy - all seem to be using value equality for nan
.
Are we okay with leaving out
assume_unique
?
Yes.
Are we okay with not mandating reshape behavior if
x2
is multi-dimensional?
I think that that part of the np.isin
docstring is confusing. Reshaping is meaningless, the only point of that is trying to express that the comparisons are element-wise. It'd be better to have a simple double for-loop with pseudo-code. There is no broadcasting either, any shapes should work and the output has the same shape as x1
.
- Testing whether an element in ``x1`` corresponds to an element in ``x2`` **should** be determined based on value equality (see :func:`~array_api.equal`). For input arrays having floating-point data types, value-based equality implies the following behavior. | ||
|
||
- As ``nan`` values compare as ``False``, if an element in ``x1`` is ``nan`` and ``invert`` is ``False``, the corresponding element in the returned array **should** be ``False``. Otherwise, if an element in ``x1`` is ``nan`` and ``invert`` is ``True``, the corresponding element in the returned array **should** be ``True``. | ||
- As complex floating-point values having at least one ``nan`` component compare as ``False``, if an element in ``x1`` is a complex floating-point value having one or more ``nan`` components and ``invert`` is ``False``, the corresponding element in the returned array **should** be ``False``. Otherwise, if an element in ``x1`` is a complex floating-point value having one or more ``nan`` components and ``invert`` is ``True``, the corresponding element in the returned array **should** be ``True``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit verbose, I don't think it's necessary to give explicit examples with invert
here. In 100% of cases, invert=True
applies logical_not
to the output array.
- As ``-0`` and ``+0`` compare as ``True``, if an element in ``x1`` is ``±0`` and ``x2`` contains at least one element which is ``±0`` | ||
|
||
- if ``invert`` is ``False``, the corresponding element in the returned array **should** be ``True``. | ||
- if ``invert`` is ``True``, the corresponding element in the returned array **should** be ``False``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here about not duplicating with invert=True
Parameters | ||
---------- | ||
x1: Union[array, int, float, complex, bool] | ||
first input array. **May** have any data type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to double-check, are we happy with e.g. torch not allowing complex values here:
In [16]: torch.isin(1j, torch.arange(3, dtype=torch.float64))
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[16], line 1
----> 1 torch.isin(1j, torch.arange(3, dtype=torch.float64))
RuntimeError: Unsupported input type encountered for isin(): ComplexDouble
x2: Union[array, int, float, complex, bool] | ||
second input array. **May** have any data type. | ||
invert: bool | ||
boolean indicating whether to invert the test criterion. If ``True``, the function **must** test whether each element in ``x1`` is *not* in ``x2``. If ``False``, the function **must** test whether each element in ``x1`` is in ``x2``. Default: ``False``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if isin(x1, x2, invert=True)
is exactly equivalent to logical_not(isin(x1, x2))
, we could drop the argument completely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think that may be true... which in fact might be a bit awkward, because I am not sure if NaN
logic adds up nicely or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as nans are never isin
via equality comparison, then it seems to be unambiguous (if strange at a first sight)
In [23]: np.isin(np.nan, [np.nan], invert=True)
Out[23]: array(True)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose the weird thing is whether np.nan not in [3.]
since np.nan != 3.
so how does invert=True
work? Like np.nan not in [3.]
or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In [24]: np.isin(np.nan, [3], invert=True)
Out[24]: array(True)
In [25]: np.isin(np.nan, [3])
Out[25]: array(False)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, sorry, mind-slip. Somehow I sometimes think just inverting can lead to weird things with NaNs, but that only works with the other comparisons not ==
and !=
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, here it only works because of equality comparison IIUC. Otherwise you're completely right, nans throw off logical inversion
This PR:
resolves RFC: add
isin
for elementwise set inclusion test #854 by addingisin
to the specification.of the keyword arguments determined according to array comparison data, this PR chooses to support only the
invert
kwarg. Theassume_unique
kwarg was not included for the following reasons:assume_unique
when usingisin
and that was when searching lists of already known unique values.assume_unique
is something of a performance optimization/implementation detail which we have generally attempted to avoid when standardizing APIs.does not place restrictions on the shape of
x2
. While some libraries may choose to flatten a multi-dimensionalx2
, that is something of an implementation detail and not strictly necessary. For example, an implementation could defer to an "includes" kernel which performs nested loop iteration without needing to perform explicit reshapes/copies.adds support for scalar arguments for either
x1
orx2
. This follows recent general practice in standardized APIs, with the restriction that at least one ofx1
orx2
must be an array.specifies that value equality should be used, but not must be used. This follows other set APIs (e.g.,
unique*
). As a consequence of value equality,NaN
values can never test asTrue
and there is no distinction between signed zeros.allows both
x1
andx2
to be of any data type. However, ifx1
andx2
have no promotable data type, behavior is left unspecified and thus implementation-defined.Questions
NaN
and signed zero comparison?assume_unique
?x2
is multi-dimensional?