@HyukjinKwon
Member

This PR adds DataFrame.spark.apply:

import databricks.koalas as ks
ks.range(10).spark.apply(lambda sdf: sdf.selectExpr("id + 1"))
   (id + 1)
0         1
1         2
2         3
3         4
4         5
5         6
6         7
7         8
8         9
9        10
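
The test discussed in this PR expects a ValueError when the applied function does not return a Spark DataFrame. A minimal sketch of that kind of check follows. It is not the Koalas implementation: `FakeSparkDataFrame` and `apply_spark_func` are hypothetical stand-ins so the example runs without Spark, and the error wording is an assumption chosen only to match the regex in the test.

```python
class FakeSparkDataFrame:
    """Hypothetical stand-in for pyspark.sql.DataFrame (assumption, not a real API)."""

    def __init__(self, rows):
        self.rows = list(rows)


def apply_spark_func(sdf, func):
    """Call func on a Spark-like frame and check that it returns one.

    Assumption: this only mirrors the behaviour implied by the PR's test.
    """
    output = func(sdf)
    if not isinstance(output, FakeSparkDataFrame):
        # The message wording is illustrative, not copied from Koalas.
        raise ValueError(
            "The output of the function should be of a pyspark.sql.DataFrame; "
            "however, got %s." % type(output).__name__
        )
    return output


sdf = FakeSparkDataFrame(range(10))

# A function that returns a frame passes the check.
result = apply_spark_func(sdf, lambda df: FakeSparkDataFrame(r + 1 for r in df.rows))

# A function that returns a plain int is rejected with ValueError.
try:
    apply_spark_func(sdf, lambda df: 1)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Validating the return type right after the call keeps the failure close to the user's function instead of surfacing later as an unrelated attribute error.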

with self.assertRaisesRegex(
    ValueError, "The output of the function.* pyspark.sql.DataFrame.*int"
):
    ks.range(10).spark.apply(lambda scol: 1)
Member Author

I will move the relevant test cases here in a separate PR; for example, print_schema is currently in test_dataframe.py.

@codecov-commenter

codecov-commenter commented May 24, 2020

Codecov Report

Merging #1536 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #1536   +/-   ##
=======================================
  Coverage   94.14%   94.14%           
=======================================
  Files          37       37           
  Lines        8487     8493    +6     
=======================================
+ Hits         7990     7996    +6     
  Misses        497      497           
Impacted Files               Coverage Δ
databricks/koalas/spark.py   90.09% <100.00%> (+0.62%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 26c0501...23f470f.

Returns
-------
DataFrame
Contributor

Just my opinion: how about "Koalas DataFrame" or "ks.DataFrame" rather than just "DataFrame", since we describe this function as "Applies a function that takes and returns a Spark DataFrame"?
I think it could be unclear whether the return type is a Spark DataFrame or a Koalas DataFrame.

@itholic
Contributor

itholic commented May 24, 2020

LGTM, otherwise.

Collaborator

@ueshin left a comment

LGTM.

@ueshin
Collaborator

ueshin commented May 25, 2020

Thanks! merging.

@ueshin merged commit 25117cb into databricks:master on May 25, 2020
@HyukjinKwon deleted the apply-spark-frame branch on September 11, 2020 07:50