Add specifications for set functions #25
Thanks @kgryte. I agree with leaving out Regarding sorting, that's an annoying one. Even if TensorFlow would add a sorting option, it probably couldn't easily default to NumPy et al. may add The sort order is important though, it's not uncommon to use the returned values for plotting for example. And I imagine there's other code as well where it matters. I'd lean towards requiring sorted output, but would be good to get input from @alextp on if this is okay for TensorFlow. In terms of signatures, PyTorch misses
@kgryte to move this PR forward in the absence of more feedback now, I propose:
That way we can land it, and get input from a broader group later on.
@rgommers Added keyword, opened issue, and added a TODO.
Thanks, in it goes then.
This PR
Notes
Additional set functions exist, e.g. in NumPy (in1d, intersect1d, isin, setdiff1d, union1d, setxor1d); however, these functions are not widely implemented by other analyzed array libraries (see here) and, thus, were not included in this initial specification. Should additional set functions be necessary, they can be proposed in follow-up proposals.
Questions (and notes)
unique
support for the axis keyword, while present in API signatures, is not common (currently, NumPy, MXNet, and Torch support it, while CuPy, Dask, JAX, and TensorFlow do not). Furthermore, downstream usage of the axis keyword is uncommon (e.g., in the record data, we only see one invocation in which axis is specified, and that was by SciPy). Accordingly, the proposed spec does not currently accommodate non-flattened multi-dimensional arrays. If a multi-dimensional array is provided, the proposed spec states that implementations should flatten the array before determining unique values. However, the proposed spec does not preclude the axis keyword from being added in a future spec revision.
should the unique elements be returned in sorted order (NumPy et al. sort by default, while Torch has a sorted keyword and TensorFlow does not support sorting, instead choosing to preserve order of occurrence)? Or should this be implementation-defined? Or should there be a keyword argument to require the array containing the unique elements to be sorted?
An argument for an optional keyword argument is that some implementations may choose an alternative data structure to simultaneously sort while determining unique elements, so pushing sorting to userland may be undesirable. Note, however, that if an optional keyword argument indicating whether to sort the output is desired, we'd most likely need to support a direction keyword argument for reasons discussed here.
An additional argument for returning unsorted unique elements is that preserving the order of occurrence is sometimes desirable (e.g., take the first n unique elements from an array x).
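To make the trade-off in the questions above concrete, here is a minimal sketch using NumPy only. The input array and the variable names are made up for illustration and are not part of the proposed spec. It shows flattening of a multi-dimensional input, sorted output, and one possible way to recover order of occurrence in userland.

```python
import numpy as np

# Hypothetical input, chosen only for illustration.
x = np.array([[3, 1, 2], [3, 2, 5]])

# Without `axis`, np.unique flattens the input and returns the unique
# values in sorted order.
sorted_unique = np.unique(x)

# Order of first occurrence can be recovered from the indices of the
# first occurrences in the flattened array.
_, first_idx = np.unique(x, return_index=True)
occurrence_unique = x.ravel()[np.sort(first_idx)]

# "Take the first n unique elements" in order of occurrence.
n = 2
first_n = occurrence_unique[:n]

print(sorted_unique)      # [1 2 3 5]
print(occurrence_unique)  # [3 1 2 5]
print(first_n)            # [3 1]
```

Note that going from order of occurrence to sorted order is a single sort, whereas going the other way needs the first-occurrence indices, which is one reason the default matters.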