ueshin (Collaborator) commented Apr 23, 2020

This PR exposes the spark_column property, the Spark Column that represents a Series/Index, so that users who are familiar with Spark functions can work with them more easily.

E.g.:

>>> import databricks.koalas as ks
>>> from pyspark.sql import functions as F
>>> kdf = ks.DataFrame({'a': [1.0, 1.0, 1.0, 2.0, 2.0, 2.0], 'b': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
>>> kdf['greatest'] = F.greatest(kdf.a.spark_column, kdf.b.spark_column)
>>> kdf['least'] = F.least(kdf.a.spark_column, kdf.b.spark_column)
>>> kdf
     a    b  greatest  least
0  1.0  1.0       1.0    1.0
1  1.0  2.0       2.0    1.0
2  1.0  3.0       3.0    1.0
3  2.0  4.0       4.0    2.0
4  2.0  5.0       5.0    2.0
5  2.0  6.0       6.0    2.0

ueshin requested a review from HyukjinKwon April 23, 2020 00:04
HyukjinKwon merged commit 9d97ffc into databricks:master Apr 23, 2020
ueshin deleted the expose_spark_column branch April 23, 2020 02:38
itholic (Contributor) commented Apr 26, 2020

Cool!
