Add DataFrame.spark.apply #1536

HyukjinKwon · 2020-05-24T07:59:22Z

This PR adds DataFrame.spark.apply:

import databricks.koalas as ks
ks.range(10).spark.apply(lambda sdf: sdf.selectExpr("id + 1"))

   (id + 1)
0         1
1         2
2         3
3         4
4         5
5         6
6         7
7         8
8         9
9        10

HyukjinKwon · 2020-05-24T08:03:36Z

databricks/koalas/tests/test_frame_spark.py

+        with self.assertRaisesRegex(
+            ValueError, "The output of the function.* pyspark.sql.DataFrame.*int"
+        ):
+            ks.range(10).spark.apply(lambda scol: 1)


I will move the relevant test cases into this file in a separate PR; for example, the print_schema test is currently in test_dataframe.py.
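
For readers without a Spark session, the check this test exercises can be sketched in plain Python. This is only an illustration, not the code in databricks/koalas/spark.py: `StandInSparkDataFrame`, `apply_spark_func`, and `matches_test_pattern` are hypothetical names, and the error message wording is an assumption chosen so that it matches the regex in the test above.

```python
import re


class StandInSparkDataFrame:
    """Hypothetical stand-in for pyspark.sql.DataFrame so the sketch runs without Spark."""


# The pattern used by assertRaisesRegex in the test above.
TEST_PATTERN = re.compile(r"The output of the function.* pyspark.sql.DataFrame.*int")


def apply_spark_func(sdf, func):
    """Call func on a Spark-like frame and reject results that are not such a frame."""
    output = func(sdf)
    if not isinstance(output, StandInSparkDataFrame):
        # Assumed wording: written only to satisfy TEST_PATTERN, not copied from Koalas.
        raise ValueError(
            "The output of the function [%s] should be of a "
            "pyspark.sql.DataFrame; however, got [%s]." % (func, type(output))
        )
    return output


def matches_test_pattern(func):
    """Return True if applying func raises a ValueError whose message matches TEST_PATTERN."""
    try:
        apply_spark_func(StandInSparkDataFrame(), func)
    except ValueError as exc:
        # assertRaisesRegex uses re.search semantics, so search is used here too.
        return TEST_PATTERN.search(str(exc)) is not None
    return False
```

With this sketch, `matches_test_pattern(lambda sdf: 1)` is true because the function returns an int, while a function that returns the frame it was given passes through without an error.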

codecov-commenter · 2020-05-24T10:33:54Z

Codecov Report

Merging #1536 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #1536   +/-   ##
=======================================
  Coverage   94.14%   94.14%           
=======================================
  Files          37       37           
  Lines        8487     8493    +6     
=======================================
+ Hits         7990     7996    +6     
  Misses        497      497

Impacted Files	Coverage Δ
databricks/koalas/spark.py	`90.09% <100.00%> (+0.62%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 26c0501...23f470f.

itholic · 2020-05-24T22:32:49Z

databricks/koalas/spark.py

+
+        Returns
+        -------
+        DataFrame


Just my opinion, but how about "Koalas DataFrame" or "ks.DataFrame" rather than just "DataFrame", since we describe this function as "Applies a function that takes and returns a Spark DataFrame"?
I think it could be unclear whether the return type is a Spark DataFrame or a Koalas DataFrame.

itholic · 2020-05-24T22:34:02Z

LGTM, otherwise.

ueshin

LGTM.

ueshin · 2020-05-25T20:57:25Z

Thanks! merging.

HyukjinKwon mentioned this pull request May 24, 2020

Add Series.spark.apply #1535

Merged

Add DataFrame.spark.apply

3e2ee2c

HyukjinKwon force-pushed the apply-spark-frame branch from 34de62e to 3e2ee2c Compare May 24, 2020 08:05

Sort for doctests

23f470f

itholic reviewed May 24, 2020

ueshin approved these changes May 25, 2020

ueshin merged commit 25117cb into databricks:master May 25, 2020

HyukjinKwon deleted the apply-spark-frame branch September 11, 2020 07:50
