Add unique method #5091

Closed
wants to merge 2 commits into from

@ahuang11 (Contributor) commented Mar 30, 2021

ahuang11 and others added 2 commits March 29, 2021 20:08
@max-sixty (Collaborator) commented Mar 31, 2021

Thank you for the PR, @ahuang11!

What are people's thoughts?

We don't have precedent for methods that return a numpy array. But it's unclear what xarray object this would return.
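To illustrate the concern, here is a minimal sketch (with made-up example data, not taken from the PR) of what happens when `np.unique` is applied to a multi-dimensional `DataArray`: the result is a flat, sorted NumPy array, so there is no obvious dimension or coordinate to attach it to.

```python
import numpy as np
import xarray as xr

# Hypothetical 2-D example data, chosen only for illustration.
da = xr.DataArray(
    [[1, 2, 2], [3, 3, 4]],
    dims=["y", "x"],
    coords={"y": [10, 20], "x": [0, 1, 2]},
)

# np.unique flattens its input and returns the sorted unique values.
uniq = np.unique(da.values)

print(type(uniq).__name__, uniq.shape)  # ndarray (4,)
print(uniq)                             # [1 2 3 4]
```

The `y` and `x` labels are gone in `uniq`, which is why it is unclear what a `DataArray`-returning `unique` should look like.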

@rhkleijn (Contributor) commented Mar 31, 2021

> We don't have precedent for methods that return a numpy array. But it's unclear what xarray object this would return.

There is a precedent in xarray (it might be the only one):

In [2]: xr.DataArray([1,2,3,4,5]).searchsorted([-10, 10, 2, 3])
Out[2]: array([0, 5, 1, 2])

@shoyer (Member) commented Apr 1, 2021

> > We don't have precedent for methods that return a numpy array. But it's unclear what xarray object this would return.
>
> There is a precedent in xarray (it might be the only one):
>
> In [2]: xr.DataArray([1,2,3,4,5]).searchsorted([-10, 10, 2, 3])
> Out[2]: array([0, 5, 1, 2])

I think searchsorted was probably a mistake from the very early days of Xarray.

I would lean against adding unique. These functions are very short to write (and thus easy to reproduce in user code), so it's not clear that the cost/benefit is worth it.
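For reference, the kind of user code this refers to might look like the sketch below. The helper names `unique_sorted` and `unique_in_order` are hypothetical, and the example data is invented; this is one possible way to write it, not an xarray API.

```python
import numpy as np
import pandas as pd
import xarray as xr


def unique_sorted(da: xr.DataArray) -> np.ndarray:
    """Hypothetical helper: sorted unique values as a 1-D NumPy array."""
    return np.unique(da.values)


def unique_in_order(da: xr.DataArray) -> np.ndarray:
    """Hypothetical helper: unique values in order of first appearance."""
    # pd.unique expects 1-D input, so flatten first.
    return pd.unique(da.values.ravel())


# Made-up example data.
da = xr.DataArray([[3, 1, 3], [2, 1, 2]], dims=["y", "x"])

print(unique_sorted(da))    # [1 2 3]
print(unique_in_order(da))  # [3 1 2]
```

Both helpers return plain NumPy arrays, so they sidestep the question of which xarray object to return.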

@ahuang11 (Contributor, Author) commented Apr 5, 2021

What if it kept the coordinates/dims and, when the input has multiple dims, returned the result along a single stacked dimension?

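import numpy as np
import xarray as xr

def unique(da):
    # Collapse all dims into one stacked dim (a MultiIndex over the original coords)
    da_stack = da.stack({'tmp_dim': da.dims})
    # Positions of the first occurrence of each unique value
    _, index = np.unique(da_stack.values, return_index=True)
    # Select those positions; the coordinates come along with the values
    return da_stack.isel({'tmp_dim': index})

da = xr.DataArray([[[0, 1, 1], [2, 3, 4], [4, 5, 6]], [[7, 8, 9], [10, 11, 12], [13, 14, 15]]],
                  coords={'lat': [0, 1, 2], 'lon': [4, 5, 6], 'time': [7, 8]}, dims=['time', 'lat', 'lon'])
unique(da)  # would be da.unique()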

Then users can use da.unique().unstack() if they like.
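To make the effect of `.unstack()` concrete, here is a minimal sketch that repeats the helper and data from this comment as a free function (there is no `da.unique()` method in this sketch) and then unstacks the result. The variable names `uniq` and `unstacked` are only for illustration.

```python
import numpy as np
import xarray as xr


def unique(da):
    # Same logic as the helper proposed in the comment above.
    da_stack = da.stack({"tmp_dim": da.dims})
    _, index = np.unique(da_stack.values, return_index=True)
    return da_stack.isel({"tmp_dim": index})


da = xr.DataArray(
    [[[0, 1, 1], [2, 3, 4], [4, 5, 6]],
     [[7, 8, 9], [10, 11, 12], [13, 14, 15]]],
    coords={"lat": [0, 1, 2], "lon": [4, 5, 6], "time": [7, 8]},
    dims=["time", "lat", "lon"],
)

uniq = unique(da)                  # 16 unique values along "tmp_dim"
unstacked = uniq.unstack("tmp_dim")  # back to time/lat/lon

print(uniq.sizes["tmp_dim"])
print(dict(unstacked.sizes))
print(int(np.isnan(unstacked.values).sum()))  # count of dropped positions
```

After unstacking, the positions whose values were dropped as repeats (the second `1` and the second `4`) are filled with NaN, so the data is promoted from integer to float.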

@shoyer (Member) commented Apr 5, 2021

Let's sort out drop_duplicates (#5089) first. I imagine unique() could be a special case of the exact same logic.
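One way to picture the shared logic is a "duplicated" mask over the flattened values, in the style of pandas. The sketch below is an assumption about how the two could relate, not how #5089 implements `drop_duplicates`; the helper name `drop_duplicate_values` and the example data are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr


def drop_duplicate_values(da, keep="first", dim="tmp_dim"):
    """Hypothetical helper: drop repeated *values*, keeping coordinates.

    `keep` follows pandas semantics: "first", "last", or False.
    """
    stacked = da.stack({dim: da.dims})
    # Boolean mask that is True where a value counts as a duplicate.
    is_dup = pd.Index(stacked.values).duplicated(keep=keep)
    # Integer positions of the entries to keep, in their original order.
    return stacked.isel({dim: np.flatnonzero(~is_dup)})


# Made-up example data.
da = xr.DataArray(
    [[0, 1, 1], [2, 1, 0]],
    dims=["y", "x"],
    coords={"y": [0, 1], "x": [10, 20, 30]},
)

print(drop_duplicate_values(da, keep="first").values)  # [0 1 2]
print(drop_duplicate_values(da, keep="last").values)   # [2 1 0]
print(drop_duplicate_values(da, keep=False).values)    # [2]
```

With `keep="first"` this gives the unique values in order of first appearance, which is the sense in which `unique()` would be a special case of the same masking step.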

@ahuang11 closed this Aug 16, 2022
Development

Successfully merging this pull request may close these issues.

Add "unique()" method, mimicking pandas
4 participants