Add a basic NumPy ufunc compatibility #1096

HyukjinKwon · 2019-12-03T02:29:16Z

This PR implements __array_ufunc__ (see https://docs.scipy.org/doc/numpy/reference/ufuncs.html#output-type-determination) so that some basic ufuncs can run against Koalas Series and Index, namely the ones that map onto dunder APIs.

>>> import databricks.koalas as ks
>>> import numpy as np
>>> kdf = ks.range(10)
>>> kdf = np.add(kdf.id, kdf.id)
>>> type(kdf)
<class 'databricks.koalas.series.Series'>
>>> kdf
0     0
1     2
2     4
3     6
4     8
5    10
6    12
7    14
8    16
9    18
Name: id, dtype: int64
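
For readers unfamiliar with the protocol, here is a minimal sketch of how `__array_ufunc__` can route a ufunc call to a dunder method. The `Boxed` class is a hypothetical toy container made up for illustration; it is not the Koalas implementation, and it handles only `np.add`.

```python
import numpy as np


class Boxed:
    """Hypothetical toy container; not the Koalas Series implementation."""

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):
        # Broadcast a scalar operand to the container's length.
        if isinstance(other, Boxed):
            other_values = other.values
        else:
            other_values = [other] * len(self.values)
        return Boxed(a + b for a, b in zip(self.values, other_values))

    # Addition is commutative here, so the reflected op can reuse __add__.
    __radd__ = __add__

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain calls to np.add without keyword arguments are handled.
        if ufunc is np.add and method == "__call__" and not kwargs:
            left, right = inputs
            if isinstance(left, Boxed):
                return left + right
            return right.__radd__(left)
        # NumPy raises TypeError when every operand returns NotImplemented.
        return NotImplemented


result = np.add(Boxed([0, 1, 2]), Boxed([0, 1, 2]))
print(type(result).__name__, result.values)  # Boxed [0, 2, 4]
```

Any ufunc the class does not handle, such as `np.sqrt` in this sketch, falls through to `NotImplemented`, and NumPy then raises a `TypeError`.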

codecov-io · 2019-12-03T03:07:28Z

Codecov Report

Merging #1096 into master will decrease coverage by 0.02%.
The diff coverage is 85.71%.

@@            Coverage Diff             @@
##           master    #1096      +/-   ##
==========================================
- Coverage    95.2%   95.18%   -0.03%     
==========================================
  Files          34       34              
  Lines        6889     6913      +24     
==========================================
+ Hits         6559     6580      +21     
- Misses        330      333       +3

| Impacted Files | Coverage Δ | |
|---|---|---|
| databricks/koalas/base.py | `94.88% <85.71%> (-1.02%)` | ⬇️ |
| databricks/koalas/series.py | `96.44% <0%> (+0.01%)` | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8a7a640...10c7845. Read the comment docs.

softagram-bot · 2019-12-03T05:54:52Z

Softagram Impact Report for pull/1096 (head commit: `10c7845`)

⚠️ Copy paste found

ℹ️ test_numpy_compat.py: Copy paste fragment on line 24 shared with ../test_dataframe.py:


    @property
    def pdf(self):
        return pd.DataFrame({
            'a': [1, 2, 3, 4, 5, 6, 7, 8, 9],
            'b': [4, 5, 6, 3, 2, 1, ...(truncated 160 chars)

ℹ️ test_numpy_compat.py: Copy paste fragment on line 24 shared with ../test_dataframe.py, ../test_indexes.py:


    @property
    def pdf(self):
        return pd.DataFrame({
            'a': [1, 2, 3, 4, 5, 6, 7, 8, 9],
            'b': [4, 5, 6, 3, 2, 1, ...(truncated 160 chars)

ℹ️ test_numpy_compat.py: Copy paste fragment on line 27 shared with ../test_dataframe.py, ../test_indexes.py, ../test_ops_on_diff_frames.py:

        return pd.DataFrame({
            'a': [1, 2, 3, 4, 5, 6, 7, 8, 9],
            'b': [4, 5, 6, 3, 2, 1, 0, 0, 0],
        }, index=[0, 1, 3, 5, 6, 8, 9, 9, 9])

    @propert...(truncated 20 chars)

ℹ️ test_numpy_compat.py: Copy paste fragment on line 27 shared with ../test_dataframe.py, ../test_indexes.py, ../test_ops_on_diff_frames.py:

        return pd.DataFrame({
            'a': [1, 2, 3, 4, 5, 6, 7, 8, 9],
            'b': [4, 5, 6, 3, 2, 1, 0, 0, 0],
        }, index=[0, 1, 3, 5, 6, 8, 9, 9, 9])

ℹ️ test_numpy_compat.py: Copy paste fragment on line 24 shared with ../test_dataframe.py, ../test_indexes.py, ../test_indexing.py:


    @property
    def pdf(self):
        return pd.DataFrame({
            'a': [1, 2, 3, 4, 5, 6, 7, 8, 9],
            'b': [4, 5, 6, 3, 2, 1, ...(truncated 9 chars)

ℹ️ test_numpy_compat.py: Copy paste fragment on line 27 shared with ../test_dataframe.py, ../test_indexes.py, ../test_ops_on_diff_frames.py:

        return pd.DataFrame({
            'a': [1, 2, 3, 4, 5, 6, 7, 8, 9],
            'b': [4, 5, 6, 3, 2, 1, 0, 0, 0],

ℹ️ test_numpy_compat.py: Copy paste fragment on line 28 shared with ../test_dataframe.py, ../test_indexes.py, ../test_ops_on_diff_frames.py:

            'a': [1, 2, 3, 4, 5, 6, 7, 8, 9],
            'b': [4, 5, 6, 3, 2, 1, 0, 0, 0],
        }, index=[0, 1, 3, 5, 6, 8, 9, 9, 9])

ℹ️ base.py: Copy paste fragment inside the same file on lines 709, 772:

        if axis != 0:
            raise ValueError('axis should be either 0 or "index" currently.')

        sdf = self._internal._sdf.select(self._scol)
        col...(truncated 380 chars)

Now that you are on the file, it would be easier to pay back some tech. debt.


HyukjinKwon · 2019-12-03T06:54:14Z

Tests passed

ueshin

super cool!
LGTM.

ueshin · 2019-12-03T19:07:13Z

databricks/koalas/base.py

+        if result is not NotImplemented:
+            return result
+        else:
+            # TODO: support more APIs?


Maybe we can delegate to a pandas UDF?

Oh, that's a nice suggestion. Let me investigate this a bit more. It will only work when the output is n to n, but I'm sure there will be such cases.

Yeah, if we can delegate to a pandas UDF in a general way, we can take our time adding more Spark-native functions for ufuncs like np.sqrt or np.log.

ueshin · 2019-12-03T19:17:15Z

databricks/koalas/base.py

+            name = flipped.get(op_name, "__r{}__".format(op_name))
+            return getattr(self, name, not_implemented)(inputs[0])
+    else:
+        return NotImplemented


We will be able to add more functions supported in Spark natively.
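
For context on the fragment above: when the Koalas object is the second operand, arithmetic ufuncs go to the reflected dunder (`__r{op}__`), while comparisons, which have no reflected form, are mirrored through the `flipped` table. The sketch below shows only that name resolution. The helper `resolve_dunder_name` and the small `ALIASES` and `FLIPPED` tables are assumptions made for illustration; the full tables used by the PR are not shown in this thread.

```python
# Hypothetical subset of ufunc-name aliases, for illustration only.
ALIASES = {"subtract": "sub", "multiply": "mul", "less": "lt", "greater": "gt"}

# Comparisons have no reflected dunder, so the operator itself is mirrored.
FLIPPED = {"lt": "__gt__", "gt": "__lt__", "eq": "__eq__", "ne": "__ne__"}


def resolve_dunder_name(ufunc_name, self_is_first):
    """Return the dunder method name that a ufunc call would be routed to."""
    op_name = ALIASES.get(ufunc_name, ufunc_name)
    if self_is_first:
        # The object is the left operand: use the forward dunder.
        return "__{}__".format(op_name)
    # The object is the right operand: mirror comparisons, reflect the rest.
    return FLIPPED.get(op_name, "__r{}__".format(op_name))


print(resolve_dunder_name("add", True))        # __add__
print(resolve_dunder_name("subtract", False))  # __rsub__
print(resolve_dunder_name("less", False))      # __gt__
```

Looking the resolved name up with `getattr(self, name, not_implemented)`, as the fragment does, means an operator the object lacks still ends in `NotImplemented` rather than an `AttributeError`.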

ueshin · 2019-12-03T19:18:11Z

Thanks! I'll merge this for now as basic NumPy ufunc compatibility.

This PR completes NumPy's ufunc support (followup of #1096). See also https://docs.scipy.org/doc/numpy/reference/arrays.classes.html#standard-array-subclasses

E.g.:

```python
>>> import databricks.koalas as ks
>>> import numpy as np
>>> kdf = ks.range(10)
>>> kser = np.sqrt(kdf.id)
>>> type(kser)
<class 'databricks.koalas.series.Series'>
>>> kser
0    0.000000
1    1.000000
2    1.414214
3    1.732051
4    2.000000
5    2.236068
6    2.449490
7    2.645751
8    2.828427
9    3.000000
```


Add a basic NumPy ufunc compatability

1acd6d4

HyukjinKwon requested a review from ueshin December 3, 2019 02:29

Copy maybe_dispatch_ufunc_to_dunder_op

10c7845

ueshin approved these changes Dec 3, 2019

ueshin merged commit be8d5b0 into databricks:master Dec 3, 2019

HyukjinKwon mentioned this pull request Dec 4, 2019

Complete NumPy ufunc compatibility in Series #1106

Merged

HyukjinKwon mentioned this pull request Apr 12, 2020

Document that we don't support the compatibility with non-Koalas APIs yet. #1414

Closed

HyukjinKwon deleted the np-compat branch September 11, 2020 07:52
