feat: add `isin` to the specification #959

kgryte · 2025-06-12T10:03:23Z

This PR:

- resolves RFC: add isin for elementwise set inclusion test #854 by adding `isin` to the specification.
- of the keyword arguments identified in the array comparison data, supports only the `invert` kwarg. The `assume_unique` kwarg was not included for the following reasons:
  1. Not all array libraries support this kwarg (e.g., ndonnx and CuPy). CuPy lists the kwarg in its documentation but states that it is ignored.
  2. In a quick search through sklearn, I found only one usage of `assume_unique` with `isin`, and that was when searching lists of values already known to be unique.
  3. `assume_unique` is something of a performance optimization/implementation detail, which we have generally tried to avoid when standardizing APIs.
- does not place restrictions on the shape of `x2`. While some libraries may choose to flatten a multi-dimensional `x2`, that is something of an implementation detail and not strictly necessary. For example, an implementation could defer to an "includes" kernel which performs nested loop iteration without needing to perform explicit reshapes/copies.
- adds support for scalar arguments for either `x1` or `x2`. This follows recent general practice in standardized APIs, with the restriction that at least one of `x1` or `x2` must be an array.
- specifies that value equality **should** be used, but not that it **must** be used. This follows other set APIs (e.g., `unique*`). As a consequence of value equality, NaN values can never test as True and there is no distinction between signed zeros.
- allows both `x1` and `x2` to be of any data type. However, if `x1` and `x2` have no promotable data type, behavior is left unspecified and thus implementation-defined.
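
The value-equality consequences described above (NaN never tests as True, signed zeros are not distinguished) can be sketched in plain Python, with scalars standing in for array elements. The helper name `contains_by_value` is hypothetical and not part of the specification; it only illustrates the comparison rule.

```python
nan = float("nan")

def contains_by_value(element, candidates):
    """Hypothetical helper: True if `element` equals any candidate under `==`."""
    # Use an explicit `==` rather than Python's `in` operator: for lists, `in`
    # checks identity before equality, so `nan in [nan]` would be True.
    return any(element == candidate for candidate in candidates)

# NaN never compares equal, not even to itself.
nan_found = contains_by_value(nan, [nan, 1.0])

# A complex value with a NaN component never compares equal either.
complex_nan_found = contains_by_value(complex(nan, 0.0), [complex(nan, 0.0)])

# Signed zeros compare equal, so -0.0 is found when only +0.0 is present.
zero_found = contains_by_value(-0.0, [0.0])

print(nan_found, complex_nan_found, zero_found)  # False False True
```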

Questions

Would we be okay with requiring that value equality must be used? Is there a scenario where we want to allow libraries some wiggle room, such as with NaN and signed zero comparison?
Are we okay with leaving out assume_unique?
Are we okay with not mandating reshape behavior if x2 is multi-dimensional?

rgommers

Thanks @kgryte. Looks pretty good to me. I agree with the design choices in the PR description.

> Would we be okay with requiring that value equality must be used? Is there a scenario where we want to allow libraries some wiggle room, such as with NaN and signed zero comparison?

I am not sure wiggle room is needed here. This function has more to do with equal than with unique I think. I just checked NumPy, PyTorch, JAX and CuPy - all seem to be using value equality for nan.

> Are we okay with leaving out assume_unique?

Yes.

> Are we okay with not mandating reshape behavior if x2 is multi-dimensional?

I think that that part of the np.isin docstring is confusing. Reshaping is meaningless, the only point of that is trying to express that the comparisons are element-wise. It'd be better to have a simple double for-loop with pseudo-code. There is no broadcasting either, any shapes should work and the output has the same shape as x1.
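
The double for-loop suggested here could look roughly like the following sketch. It is plain Python with nested lists standing in for arrays; `isin_reference` and its helper `_flatten` are hypothetical names, and a real implementation would return an array rather than nested lists.

```python
def _flatten(values):
    """Yield every scalar in an arbitrarily nested list/tuple (or a bare scalar)."""
    if isinstance(values, (list, tuple)):
        for item in values:
            yield from _flatten(item)
    else:
        yield values

def isin_reference(x1, x2, invert=False):
    """Hypothetical reference: test each element of x1 against every element of x2."""
    candidates = list(_flatten(x2))  # the shape of x2 is irrelevant

    def test(element):
        found = False
        for candidate in candidates:  # inner loop over x2
            if element == candidate:  # value equality
                found = True
                break
        return found != invert  # invert=True flips the result

    def walk(values):  # outer loop over x1, preserving its nesting (shape)
        if isinstance(values, (list, tuple)):
            return [walk(v) for v in values]
        return test(values)

    return walk(x1)

print(isin_reference([[1, 2], [3, 4]], [[[2]], [[4]]]))  # [[False, True], [False, True]]
print(isin_reference([1, 5], 5, invert=True))            # [True, False]
```

No broadcasting or reshaping is involved: the result mirrors the nesting of `x1`, and `x2` is only ever iterated.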

rgommers · 2025-06-12T13:08:32Z

src/array_api_stubs/_draft/set_functions.py

+    -   Testing whether an element in ``x1`` corresponds to an element in ``x2`` **should** be determined based on value equality (see :func:`~array_api.equal`). For input arrays having floating-point data types, value-based equality implies the following behavior.
+
+        -   As ``nan`` values compare as ``False``, if an element in ``x1`` is ``nan`` and ``invert`` is ``False``, the corresponding element in the returned array **should** be ``False``. Otherwise, if an element in ``x1`` is ``nan`` and ``invert`` is ``True``, the corresponding element in the returned array **should** be ``True``.
+        -   As complex floating-point values having at least one ``nan`` component compare as ``False``, if an element in ``x1`` is a complex floating-point value having one or more ``nan`` components and ``invert`` is ``False``, the corresponding element in the returned array **should** be ``False``. Otherwise, if an element in ``x1`` is a complex floating-point value having one or more ``nan`` components and ``invert`` is ``True``, the corresponding element in the returned array **should** be ``True``.


This is a bit verbose, I don't think it's necessary to give explicit examples with invert here. In 100% of cases, invert=True applies logical_not to the output array.

rgommers · 2025-06-12T13:09:02Z

src/array_api_stubs/_draft/set_functions.py

+        -   As ``-0`` and ``+0`` compare as ``True``, if an element in ``x1`` is ``±0`` and ``x2`` contains at least one element which is ``±0``
+
+            -   if ``invert`` is ``False``, the corresponding element in the returned array **should** be ``True``.
+            -   if ``invert`` is ``True``, the corresponding element in the returned array **should** be ``False``.


Same comment here about not duplicating with invert=True

ev-br · 2025-06-12T16:27:01Z

src/array_api_stubs/_draft/set_functions.py

+    Parameters
+    ----------
+    x1: Union[array, int, float, complex, bool]
+        first input array. **May** have any data type.


Just to double-check, are we happy with e.g. torch not allowing complex values here:

In [16]: torch.isin(1j, torch.arange(3, dtype=torch.float64))
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[16], line 1
----> 1 torch.isin(1j, torch.arange(3, dtype=torch.float64))

RuntimeError: Unsupported input type encountered for isin(): ComplexDouble

ev-br · 2025-06-12T16:28:33Z

src/array_api_stubs/_draft/set_functions.py

+    x2: Union[array, int, float, complex, bool]
+        second input array. **May** have any data type.
+    invert: bool
+        boolean indicating whether to invert the test criterion. If ``True``, the function **must** test whether each element in ``x1`` is *not* in ``x2``. If ``False``, the function **must** test whether each element in ``x1`` is in ``x2``. Default: ``False``.


So if isin(x1, x2, invert=True) is exactly equivalent to logical_not(isin(x1, x2)), we could drop the argument completely.

Yeah, I think that may be true... which in fact might be a bit awkward, because I am not sure if NaN logic adds up nicely or not.

As long as nans are never isin via equality comparison, it seems to be unambiguous (if strange at first sight):

In [23]: np.isin(np.nan, [np.nan], invert=True)
Out[23]: array(True)

I suppose the weird thing is whether `np.nan not in [3.]` holds, since `np.nan != 3.`. So how does `invert=True` work? Like `np.nan not in [3.]`, or not?

In [24]: np.isin(np.nan, [3], invert=True)
Out[24]: array(True)

In [25]: np.isin(np.nan, [3])
Out[25]: array(False)

Yeah, sorry, mind-slip. Somehow I sometimes think just inverting can lead to weird things with NaNs, but that only happens with the other comparisons, not with == and !=.

Yeah, here it only works because of equality comparison IIUC. Otherwise you're completely right, nans throw off logical inversion
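
The point about inversion can be checked with a small sketch in plain Python (the function name `negation_is_consistent` is hypothetical). For `==`, negating the result always agrees with `!=`, even for NaN, which is why `invert=True` matches a logical NOT of the output; for ordering comparisons, NaN breaks that symmetry.

```python
nan = float("nan")

def negation_is_consistent(a, b):
    """Report whether `not (a OP b)` agrees with the complementary operator."""
    return {
        "==": (not (a == b)) == (a != b),  # equality vs. inequality
        "<": (not (a < b)) == (a >= b),    # ordering vs. its complement
    }

print(negation_is_consistent(1.0, 3.0))  # {'==': True, '<': True}
print(negation_is_consistent(nan, 3.0))  # {'==': True, '<': False}
```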

feat: add isin to the specification

7c09df3

Closes: data-apis#854

kgryte added this to the v2025 milestone Jun 12, 2025

kgryte added the API extension Adds new functions or objects to the API. label Jun 12, 2025

kgryte added 2 commits June 12, 2025 03:03

docs: fix typo

7259ac7

fix: import missing type

428be60

kgryte mentioned this pull request Jun 12, 2025

RFC: add isin for elementwise set inclusion test #854

Open

rgommers reviewed Jun 12, 2025

ev-br reviewed Jun 12, 2025

Further review comments, Jun 12, 2025 (in order): seberg, ev-br, seberg, ev-br, seberg, ev-br
