Skip to content

Conversation

@ueshin
Copy link
Collaborator

@ueshin ueshin commented Apr 8, 2020

This is a follow-up of #1399.

When performing mod/rmod, if the operands are series from different dataframes, we needed three joins.

>>> kser = ks.Series([100, None, -300, None, 500, -700], name="Koalas")
>>> (kser % ks.Series([150] * 6)).to_frame().explain()
== Physical Plan ==
*(9) Project [CASE WHEN isnotnull(__index_level_0__#317L) THEN __index_level_0__#317L ELSE __index_level_0__#228L END AS __index_level_0__#378L, (Koalas#364 % cast(0#229L as double)) AS Koalas#425]
+- SortMergeJoin [__index_level_0__#317L], [__index_level_0__#228L], FullOuter
   :- *(7) Sort [__index_level_0__#317L ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(__index_level_0__#317L, 200)
   :     +- *(6) Project [CASE WHEN isnotnull(__index_level_0__#254L) THEN __index_level_0__#254L ELSE __index_level_0__#228L END AS __index_level_0__#317L, (Koalas#303 + cast(0#229L as double)) AS Koalas#364]
   :        +- SortMergeJoin [__index_level_0__#254L], [__index_level_0__#228L], FullOuter
   :           :- *(4) Sort [__index_level_0__#254L ASC NULLS FIRST], false, 0
   :           :  +- Exchange hashpartitioning(__index_level_0__#254L, 200)
   :           :     +- *(3) Project [CASE WHEN isnotnull(__index_level_0__#0L) THEN __index_level_0__#0L ELSE __index_level_0__#228L END AS __index_level_0__#254L, (Koalas#1 % cast(0#229L as double)) AS Koalas#303]
   :           :        +- SortMergeJoin [__index_level_0__#0L], [__index_level_0__#228L], FullOuter
   :           :           :- *(1) Sort [__index_level_0__#0L ASC NULLS FIRST], false, 0
   :           :           :  +- Exchange hashpartitioning(__index_level_0__#0L, 200)
   :           :           :     +- Scan ExistingRDD[__index_level_0__#0L,Koalas#1]
   :           :           +- *(2) Sort [__index_level_0__#228L ASC NULLS FIRST], false, 0
   :           :              +- Exchange hashpartitioning(__index_level_0__#228L, 200)
   :           :                 +- Scan ExistingRDD[__index_level_0__#228L,0#229L]
   :           +- *(5) Sort [__index_level_0__#228L ASC NULLS FIRST], false, 0
   :              +- ReusedExchange [__index_level_0__#228L, 0#229L], Exchange hashpartitioning(__index_level_0__#228L, 200)
   +- *(8) Sort [__index_level_0__#228L ASC NULLS FIRST], false, 0
      +- ReusedExchange [__index_level_0__#228L, 0#229L], Exchange hashpartitioning(__index_level_0__#228L, 200)

We can reduce the number to only one.

>>> (kser % ks.Series([150] * 6)).to_frame().explain()
== Physical Plan ==
*(3) Project [CASE WHEN isnotnull(__index_level_0__#0L) THEN __index_level_0__#0L ELSE __index_level_0__#98L END AS __index_level_0__#118L, (((Koalas#1 % cast(0#99L as double)) + cast(0#99L as double)) % cast(0#99L as double)) AS Koalas#165]
+- SortMergeJoin [__index_level_0__#0L], [__index_level_0__#98L], FullOuter
   :- *(1) Sort [__index_level_0__#0L ASC NULLS FIRST], false, 0
   :  +- Exchange hashpartitioning(__index_level_0__#0L, 200)
   :     +- Scan ExistingRDD[__index_level_0__#0L,Koalas#1]
   +- *(2) Sort [__index_level_0__#98L ASC NULLS FIRST], false, 0
      +- Exchange hashpartitioning(__index_level_0__#98L, 200)
         +- Scan ExistingRDD[__index_level_0__#98L,0#99L]

@ueshin ueshin requested a review from HyukjinKwon April 8, 2020 22:03
@ueshin
Copy link
Collaborator Author

ueshin commented Apr 8, 2020

cc @itholic

@codecov-io
Copy link

codecov-io commented Apr 8, 2020

Codecov Report

Merging #1409 into master will decrease coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1409      +/-   ##
==========================================
- Coverage   95.03%   95.01%   -0.02%     
==========================================
  Files          34       34              
  Lines        7952     7950       -2     
==========================================
- Hits         7557     7554       -3     
- Misses        395      396       +1     
Impacted Files Coverage Δ
databricks/koalas/base.py 97.57% <100.00%> (+0.78%) ⬆️
databricks/koalas/series.py 96.96% <0.00%> (-0.37%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2554970...f3e010b. Read the comment docs.

@itholic
Copy link
Contributor

itholic commented Apr 8, 2020

Thanks, I feel learning a lot from this work.
LGTM. 👍

@ueshin
Copy link
Collaborator Author

ueshin commented Apr 9, 2020

Thanks! I'd merge this now. Please feel free to leave comments.

@ueshin ueshin merged commit b5676f3 into databricks:master Apr 9, 2020
@ueshin ueshin deleted the mod branch April 9, 2020 00:22
@itholic
Copy link
Contributor

itholic commented Apr 9, 2020

Got it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants