@ahuang11
Contributor

ahuang11 commented Mar 30, 2021

ahuang11 and others added 2 commits March 29, 2021 20:08
@max-sixty
Collaborator

max-sixty commented Mar 31, 2021

Thank you for the PR @ahuang11 !

What are people's thoughts?

We don't have precedent for methods that return a numpy array. But it's unclear what xarray object this would return.

@rhkleijn
Contributor

rhkleijn commented Mar 31, 2021

> We don't have precedent for methods that return a numpy array. But it's unclear what xarray object this would return.

There is a precedent in xarray (it might be the only one):

In [2]: xr.DataArray([1,2,3,4,5]).searchsorted([-10, 10, 2, 3])
Out[2]: array([0, 5, 1, 2])
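
To make the return type concrete, here is a small check. It is only a sketch and assumes an xarray version in which `DataArray.searchsorted` is still available; the insertion indices come back as a plain `numpy.ndarray` rather than an xarray object.

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1, 2, 3, 4, 5])
result = da.searchsorted([-10, 10, 2, 3])

# The result carries no dims or coords: it is a bare NumPy array of indices.
print(type(result))
print(result)
```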

@shoyer
Member

shoyer commented Apr 1, 2021

> > We don't have precedent for methods that return a numpy array. But it's unclear what xarray object this would return.
>
> There is a precedent in xarray (it might be the only one):
>
> In [2]: xr.DataArray([1,2,3,4,5]).searchsorted([-10, 10, 2, 3])
> Out[2]: array([0, 5, 1, 2])

I think searchsorted was probably a mistake from the very early days of Xarray.

I would lean against adding unique. These functions are very short to write (and thus easy to reproduce in user code), so it's not clear that the cost/benefit is worth it.
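
For reference, the kind of user code this refers to might look like the minimal sketch below. `unique_values` is a hypothetical helper name, not an xarray API; it simply passes the underlying values to `np.unique`, so the result is a sorted, flattened NumPy array with no dims or coords.

```python
import numpy as np
import xarray as xr

def unique_values(da):
    """Hypothetical helper: sorted unique values of a DataArray as a NumPy array."""
    return np.unique(da.values)

da = xr.DataArray([[3, 1, 1], [2, 3, 2]], dims=["x", "y"])
values = unique_values(da)
print(values)
```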

@ahuang11
Contributor Author

ahuang11 commented Apr 5, 2021

What if we kept the coordinates/dims on the result, returning it along a stacked dimension when the input has multiple dims?

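import numpy as np
import xarray as xr

def unique(da):
    # Collapse all dims into one stacked dim so every element keeps its coordinates.
    da_stack = da.stack({'tmp_dim': da.dims})
    # return_index gives the position of the first occurrence of each unique value.
    _, index = np.unique(da_stack.values, return_index=True)
    return da_stack.isel({'tmp_dim': index})

da = xr.DataArray(
    [[[0, 1, 1], [2, 3, 4], [4, 5, 6]], [[7, 8, 9], [10, 11, 12], [13, 14, 15]]],
    coords={'lat': [0, 1, 2], 'lon': [4, 5, 6], 'time': [7, 8]},
    dims=['time', 'lat', 'lon'],
)
unique(da)  # would be da.unique()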

Then users can use da.unique().unstack() if they like.
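
As a rough illustration of that round trip, the sketch below repeats the proposed helper as a free function (no `da.unique()` method exists in xarray; that name is only the proposal) on a smaller 2-D array. After unstacking, grid cells whose duplicate value was dropped are filled with NaN.

```python
import numpy as np
import xarray as xr

def unique(da):
    # Proposed approach: stack all dims, keep the first occurrence of each value.
    stacked = da.stack({"tmp_dim": da.dims})
    _, index = np.unique(stacked.values, return_index=True)
    return stacked.isel({"tmp_dim": index})

da = xr.DataArray(
    [[0, 1, 1], [2, 2, 3]],
    coords={"lat": [10, 20], "lon": [4, 5, 6]},
    dims=["lat", "lon"],
)

flat = unique(da)               # 1-D, indexed by the surviving (lat, lon) pairs
grid = flat.unstack("tmp_dim")  # back to a lat x lon grid; missing cells become NaN
print(flat.values)
print(grid)
```

Note that unstacking promotes the data to float so that the missing cells can hold NaN.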

@shoyer
Member

shoyer commented Apr 5, 2021

Let's sort out drop_duplicates (#5089) first. I imagine unique() could be a special case of the exact same logic.

ahuang11 closed this Aug 16, 2022
Development

Successfully merging this pull request may close these issues.

Add "unique()" method, mimicking pandas

4 participants