
Support numpy ufuncs for ExtensionArrays #22798


Closed
jorisvandenbossche opened this issue Sep 21, 2018 · 5 comments
Labels
Enhancement ExtensionArray Extending pandas with custom dtypes or arrays.

Comments

@jorisvandenbossche
Member

Currently, calling NumPy ufuncs such as np.exp on a Series[EA] or directly on an EA does not work yet:

In [44]: s = pd.Series([1, 2, 3, 4], dtype='Int64')

In [45]: np.exp(s)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-45-fb7258693ae9> in <module>()
----> 1 np.exp(s)

~/scipy/pandas/pandas/core/series.py in __array_prepare__(self, result, context)
    671                                 obj=type(obj).__name__,
    672                                 dtype=getattr(obj, 'dtype', None),
--> 673                                 op=context[0].__name__))
    674         return result
    675 

TypeError: Series with dtype Int64 cannot perform the numpy op exp

In [46]: np.exp(s.values)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-46-69f0b0471ea8> in <module>()
----> 1 np.exp(s.values)

AttributeError: 'int' object has no attribute 'exp'

In [47]: np.exp(s.astype(int))   # but works for numpy dtyped series
Out[47]: 
0     2.718282
1     7.389056
2    20.085537
3    54.598150
dtype: float64
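For context on the `AttributeError` in `In [46]`: when a ufunc like `np.exp` runs on an object-dtype array, NumPy's object loop looks for a method of the same name (`.exp()`) on each element, and Python `int`s have none. A minimal sketch reproducing this with a bare NumPy object array, assuming that is roughly what the conversion handed to NumPy here. Older NumPy surfaces the failure as `AttributeError` and newer versions as `TypeError`, so both are caught:

```python
import numpy as np

# Stand-in for the object array NumPy appears to receive in In [46]:
# plain Python ints stored with dtype=object.
values = np.array([1, 2, 3, 4], dtype=object)

try:
    np.exp(values)
except (AttributeError, TypeError) as exc:
    # The object loop tried to call int.exp(), which does not exist.
    error = exc
else:
    error = None

print(type(error).__name__, error)

# Casting to a numeric dtype first selects NumPy's regular float loop instead.
print(np.exp(values.astype("int64")))
```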

It would be nice to have this working. Without having looked into it in detail, I would assume the best approach is to support the __array_ufunc__ protocol on the ExtensionArrays themselves, and then ensure that Series properly dispatches to it?
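A minimal sketch of that direction, assuming a hypothetical `ToyMaskedArray` (a data array plus a boolean mask) rather than pandas' actual `IntegerArray`: by defining `__array_ufunc__`, the container receives the ufunc call itself, applies the ufunc to its underlying NumPy data, and rewraps the result with the mask, instead of NumPy first coercing it to a plain ndarray.

```python
import numpy as np


class ToyMaskedArray:
    """Hypothetical masked array used only to illustrate __array_ufunc__.

    Not pandas' IntegerArray: it just pairs a data array with a boolean
    mask (True means missing).
    """

    def __init__(self, data, mask):
        self.data = np.asarray(data)
        self.mask = np.asarray(mask, dtype=bool)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only handle plain elementwise calls like np.exp(arr) or np.add(arr, 2);
        # returning NotImplemented makes NumPy raise a TypeError for the rest.
        if method != "__call__" or "out" in kwargs:
            return NotImplemented

        mask = np.zeros(self.data.shape, dtype=bool)
        raw_inputs = []
        for x in inputs:
            if isinstance(x, ToyMaskedArray):
                mask = mask | x.mask
                raw_inputs.append(x.data)
            elif isinstance(x, (int, float, np.number, np.ndarray)):
                raw_inputs.append(x)
            else:
                return NotImplemented

        # Apply the ufunc to the underlying NumPy data, then re-attach the mask
        # so the result stays a ToyMaskedArray instead of a bare ndarray.
        result = ufunc(*raw_inputs, **kwargs)
        if isinstance(result, tuple):  # e.g. np.divmod returns two arrays
            return tuple(ToyMaskedArray(r, mask) for r in result)
        return ToyMaskedArray(result, mask)

    def __repr__(self):
        shown = ["NA" if m else repr(v) for v, m in zip(self.data.tolist(), self.mask)]
        return f"ToyMaskedArray([{', '.join(shown)}])"


arr = ToyMaskedArray([1, 2, 3, 4], mask=[False, False, True, False])
print(np.exp(arr))
print(np.add(arr, 2))
```

Returning `NotImplemented` for unsupported methods (`reduce`, `accumulate`, ...) or unknown operand types follows the NEP 13 convention, so NumPy raises a `TypeError` instead of silently producing a wrong result.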

@jorisvandenbossche jorisvandenbossche added Enhancement ExtensionArray Extending pandas with custom dtypes or arrays. labels Sep 21, 2018
@jorisvandenbossche jorisvandenbossche added this to the Contributions Welcome milestone Sep 21, 2018
@TomAugspurger TomAugspurger modified the milestones: Contributions Welcome, 0.24.0 Oct 1, 2018
@TomAugspurger
Contributor

I'm bumping into this for PeriodArray. Previously, np.add(Series[period], 2) worked because the values were an object ndarray of Period objects. Now that the values are a real PeriodArray, we raise an exception before even trying the op.

I'll do this as a separate PR from PeriodArray.
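The old behaviour can be reproduced without a Series: with an object ndarray of `Period` scalars, NumPy's object loop for `np.add` simply evaluates `element + 2` for each element, and `Period.__add__` accepts an integer (shifting by that many units of the period's frequency). A small sketch, using a plain object array as a stand-in for the pre-PeriodArray values:

```python
import numpy as np
import pandas as pd

# Stand-in for the old Series[period] values: an object ndarray of Period scalars.
periods = np.array(
    [pd.Period("2018-01", freq="M"), pd.Period("2018-02", freq="M")],
    dtype=object,
)

# np.add's object loop calls Period.__add__(2) on each element.
shifted = np.add(periods, 2)
print(shifted)
```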

@TomAugspurger
Contributor

This seems difficult for older NumPy versions that lack __array_ufunc__. There are a few issues:

  1. Series.__array__ is called before we have a chance to do anything. It must return an ndarray, which means we have to convert the Series[EA] to an ndarray, which may be wasteful or expensive.
  2. __array_prepare__ also has to return an ndarray. Ideally we would "break out of" the ufunc on the Series, call the ufunc on the EA instead, and wrap up that result.
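Point 1 can be observed with a hypothetical container that implements `__array__` but not `__array_ufunc__`: NumPy coerces it to an ndarray before running the ufunc, and the result comes back as a bare `np.ndarray`, so the wrapper type is lost. This sketch uses a plain list-backed class (not a Series) and counts the `__array__` calls; the `copy` parameter is accepted only for compatibility with NumPy 2:

```python
import numpy as np


class OnlyArrayProtocol:
    """Hypothetical container implementing __array__ but not __array_ufunc__."""

    def __init__(self, values):
        self.values = list(values)
        self.array_calls = 0

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this to coerce the object before running the ufunc;
        # this conversion is the potentially expensive step from point 1.
        self.array_calls += 1
        return np.asarray(self.values, dtype=dtype)


obj = OnlyArrayProtocol([1, 2, 3, 4])
result = np.exp(obj)
# The result is a plain ndarray: nothing routes it back into OnlyArrayProtocol.
print(type(result).__name__, obj.array_calls)
```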

At the moment, I think working around these would be too difficult. For now, I'll see if we can support NumPy versions that have __array_ufunc__ and raise an informative error message otherwise.
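One way to raise an informative error on older NumPy is to feature-detect the protocol instead of parsing version strings: `np.ndarray` defines `__array_ufunc__` in NumPy 1.13 and later. The helper names below are hypothetical and this is only a sketch; the check pandas actually uses may differ:

```python
import numpy as np


def numpy_supports_array_ufunc():
    """Hypothetical helper: True if this NumPy dispatches ufuncs via __array_ufunc__."""
    # np.ndarray gained __array_ufunc__ in NumPy 1.13 (NEP 13).
    return hasattr(np.ndarray, "__array_ufunc__")


def require_array_ufunc(ufunc_name):
    """Hypothetical guard: raise a readable error instead of a confusing failure."""
    if not numpy_supports_array_ufunc():
        raise TypeError(
            f"np.{ufunc_name} on this ExtensionArray requires a NumPy version "
            f"with the __array_ufunc__ protocol (>= 1.13); installed: {np.__version__}"
        )


require_array_ufunc("exp")  # passes silently on any modern NumPy
```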

@jorisvandenbossche
Member Author

BTW, I also have a branch adding the basics of this to ExtensionArray / IntegerArray, and I also stumbled on the current implementation with __array__ / __array_prepare__.

So I would also start by supporting it only for newer NumPy versions.

Question: is np.add(Series[period], 2) used somewhere in our code, or in a test?

@TomAugspurger
Contributor

TomAugspurger commented Oct 1, 2018 via email

@TomAugspurger
Contributor

This was fixed at the pandas level in #23293. Third-party arrays just need to implement __array_ufunc__, and pandas will pass the array through correctly.
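With that change in place (assuming a recent pandas version), the example from the top of this issue works. A short sketch, adding a missing value to show that it propagates; the exact result dtype has varied between pandas versions, so it is not checked here:

```python
import numpy as np
import pandas as pd

# The Int64 example from the top of this issue, with a missing value added.
s = pd.Series([1, 2, None, 4], dtype="Int64")

# In recent pandas, Series.__array_ufunc__ hands the call to the underlying
# masked array, so the result is again a Series and the NA stays missing.
result = np.exp(s)
print(result)
```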

@TomAugspurger TomAugspurger modified the milestones: 1.0, 0.25.0 Jan 2, 2020