Add a basic NumPy ufunc compatibility #1096
Conversation
Codecov Report
```
@@           Coverage Diff            @@
##          master    #1096     +/-  ##
========================================
- Coverage   95.2%   95.18%    -0.03%
========================================
  Files         34       34
  Lines       6889     6913      +24
========================================
+ Hits        6559     6580      +21
- Misses       330      333       +3
```
Softagram Impact Report for pull/1096 (head commit: 10c7845)
Tests passed
ueshin left a comment:
super cool!
LGTM.
```python
if result is not NotImplemented:
    return result
else:
    # TODO: support more APIs?
```
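The snippet above is the tail of a ufunc dispatch: the ufunc name is mapped to a dunder method, and anything unsupported falls through as `NotImplemented`. A minimal self-contained sketch of the same pattern (the class and names here are illustrative, not Koalas code):

```python
import numpy as np

class AddOnly:
    # Toy object: only np.add is supported, via dispatch to __add__.
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return AddOnly(self.value + other)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Map the ufunc name (e.g. "add") to a dunder ("__add__"), like the
        # diff above does, and fall through for anything else.
        dunder = getattr(self, "__{}__".format(ufunc.__name__), None)
        if method == "__call__" and dunder is not None and len(inputs) == 2:
            result = dunder(inputs[1])
        else:
            result = NotImplemented
        if result is not NotImplemented:
            return result
        else:
            # Returning NotImplemented makes NumPy raise TypeError
            # instead of silently producing a wrong result.
            return NotImplemented

print(np.add(AddOnly(1), 2).value)  # 3
```

Calling an unsupported ufunc, e.g. `np.sqrt(AddOnly(4.0))`, hits the `NotImplemented` branch and NumPy raises `TypeError`.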
Maybe we can delegate to pandas UDF?
Oh, that's a nice suggestion. Let me investigate this a bit more. It will only work when the output is n-to-n, but I'm sure there will be such cases.
Yeah, if we can delegate to pandas UDF in a general way, we can take time to add more Spark native functions like np.sqrt or np.log.
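The pandas UDF idea discussed above can be sketched in pure pandas. A scalar pandas UDF in Spark receives a column as `pd.Series` batches and must return a batch of the same length, which is exactly the n-to-n constraint mentioned; this loop only mimics that contract (Koalas itself would go through PySpark's `pandas_udf`, and the helper name is made up for illustration):

```python
import numpy as np
import pandas as pd

def apply_like_scalar_pandas_udf(func, ser, batch_size=3):
    # Feed the column to func in fixed-size batches, Series in / Series out,
    # the way a scalar pandas UDF would receive Arrow batches in Spark.
    out = []
    for start in range(0, len(ser), batch_size):
        batch = ser.iloc[start:start + batch_size]
        result = func(batch)
        # the n-to-n constraint: each input batch must map 1:1 to output rows
        assert len(result) == len(batch), "delegated funcs must be n-to-n"
        out.append(result)
    return pd.concat(out)

s = pd.Series([0.0, 1.0, 4.0, 9.0, 16.0])
print(apply_like_scalar_pandas_udf(np.sqrt, s).tolist())  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

A ufunc like `np.sqrt` satisfies the constraint trivially, which is why delegation covers the common elementwise cases while aggregating functions would not fit.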
```python
    name = flipped.get(op_name, "__r{}__".format(op_name))
    return getattr(self, name, not_implemented)(inputs[0])
else:
    return NotImplemented
```
We will be able to add more functions supported in Spark natively.
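For context on the `flipped.get(...)` lookup: when the Koalas object is the right-hand operand, NumPy still calls its `__array_ufunc__`, and the dispatch must route to a reflected dunder. Most ops follow the `__r*__` naming convention, while some need an explicit mapping. A toy version of that lookup (the contents of `flipped` here are a hypothetical example, not the actual dict in the PR):

```python
# Hypothetical mapping for ops whose reflected form is not a "__r*__" dunder,
# e.g. comparisons: a < b, flipped, becomes b > a.
flipped = {"lt": "__gt__", "le": "__ge__", "gt": "__lt__", "ge": "__le__"}

def reflected_name(op_name):
    # Default to the conventional reflected dunder, e.g. "add" -> "__radd__".
    return flipped.get(op_name, "__r{}__".format(op_name))

print(reflected_name("add"))  # __radd__
print(reflected_name("lt"))   # __gt__
```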
Thanks! I'd merge this for now as a basis of NumPy ufunc compatibility.
This PR completes NumPy's ufunc support (followup of #1096). See also https://docs.scipy.org/doc/numpy/reference/arrays.classes.html#standard-array-subclasses

E.g.:

```python
>>> import databricks.koalas as ks
>>> import numpy as np
>>> kdf = ks.range(10)
>>> kser = np.sqrt(kdf.id)
>>> type(kser)
<class 'databricks.koalas.series.Series'>
>>> kser
0    0.000000
1    1.000000
2    1.414214
3    1.732051
4    2.000000
5    2.236068
6    2.449490
7    2.645751
8    2.828427
9    3.000000
```

This PR implements `__array_ufunc__` (see https://docs.scipy.org/doc/numpy/reference/ufuncs.html#output-type-determination) so that some basic ufuncs can run against Koalas Series and Index (via some dunder APIs).
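The output-type-determination mechanism behind this can be shown with a minimal self-contained wrapper (a toy class, not Koalas code): because the wrapper defines `__array_ufunc__`, NumPy hands it the ufunc call, and the result stays wrapped instead of degrading to a plain ndarray — which is how `np.sqrt(kdf.id)` can come back as a Koalas Series.

```python
import numpy as np

class Wrapped:
    # Toy container around a NumPy array for illustration.
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method != "__call__":
            return NotImplemented  # only plain ufunc calls supported here
        # Unwrap any Wrapped inputs, run the ufunc, re-wrap the result.
        raw = [x.data if isinstance(x, Wrapped) else x for x in inputs]
        return Wrapped(ufunc(*raw, **kwargs))

w = np.sqrt(Wrapped([0.0, 1.0, 4.0]))
print(type(w).__name__)  # Wrapped
print(w.data.tolist())   # [0.0, 1.0, 2.0]
```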