Implement Series.where #922

itholic · 2019-10-13T00:21:03Z

Like pandas Series.where (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.where.html), this PR implements where for Series.

>>> s1 = ks.Series([0, 1, 2, 3, 4])
>>> s2 = ks.Series([100, 200, 300, 400, 500])
>>> s1.where(s1 > 0)
0    NaN
1    1.0
2    2.0
3    3.0
4    4.0
Name: 0, dtype: float64


>>> s1.where(s1 > 1, 10)
0    10
1    10
2     2
3     3
4     4
Name: 0, dtype: int64

>>> s1.where(s1 > 1, s1 + 50)
0    50
1    51
2     2
3     3
4     4
Name: 0, dtype: int64


>>> s1.where(s1 > 1, s2)
0    100
1    200
2      2
3      3
4      4
Name: 0, dtype: int64
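
The examples above all follow one rule: keep the value where the condition holds, otherwise take the replacement. A minimal plain-Python sketch of that rule, assuming list inputs instead of Series (where_sketch is a hypothetical helper for illustration, not the Koalas implementation, and it ignores index alignment and dtype handling):

```python
import math


def where_sketch(values, cond, other=math.nan):
    """Keep values[i] where cond[i] is true; otherwise use the replacement.

    Hypothetical helper: `other` may be a scalar or a list of the same length.
    """
    if len(cond) != len(values):
        raise ValueError("cond must have the same length as values")
    if isinstance(other, list):
        if len(other) != len(values):
            raise ValueError("other must have the same length as values")
        replacements = other
    else:
        # Broadcast a scalar replacement to every position.
        replacements = [other] * len(values)
    return [v if c else r for v, c, r in zip(values, cond, replacements)]


s1 = [0, 1, 2, 3, 4]
s2 = [100, 200, 300, 400, 500]
mask = [v > 1 for v in s1]

scalar_result = where_sketch(s1, mask, 10)               # scalar replacement
series_result = where_sketch(s1, mask, s2)               # element-wise replacement
default_result = where_sketch(s1, [v > 0 for v in s1])   # NaN replacement
```

In pandas, a NaN replacement forces an integer Series to float64, which matches the dtype reported in the first example above.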

codecov-io · 2019-10-13T00:50:39Z

Codecov Report

Merging #922 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #922      +/-   ##
==========================================
+ Coverage   94.52%   94.53%   +<.01%     
==========================================
  Files          34       34              
  Lines        6465     6476      +11     
==========================================
+ Hits         6111     6122      +11     
  Misses        354      354

Impacted Files                        Coverage Δ
databricks/koalas/missing/series.py   100% <ø> (ø) ⬆️
databricks/koalas/series.py           96.15% <100%> (+0.05%) ⬆️
databricks/koalas/internal.py         96.38% <0%> (ø) ⬆️
databricks/koalas/namespace.py        86.83% <0%> (ø) ⬆️
databricks/koalas/frame.py            96.02% <0%> (ø) ⬆️
databricks/koalas/indexes.py          96.44% <0%> (+0.02%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c8dcb64...b620849.

ueshin

Shall we add more tests in test_series to check various patterns? e.g.,

>>> s1 = pd.Series([0, 1, 2, 3, 4])
>>> s2 = pd.Series([100, 200, 300, 400, 500])

>>> s1.where(s2 > 100)
0    NaN
1    1.0
2    2.0
3    3.0
4    4.0
dtype: float64

and negative cases?
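
A few patterns such tests might cover, sketched against pandas only so the expected values can be checked independently of Koalas. This is one possible reading of "negative cases" (a condition that matches nothing, and a condition of the wrong length); the variable names are illustrative and not taken from the PR:

```python
import pandas as pd

s1 = pd.Series([0, 1, 2, 3, 4])
s2 = pd.Series([100, 200, 300, 400, 500])

# Condition built from a different Series, with a Series replacement.
mixed = s1.where(s2 > 100, s2)

# Condition that is false everywhere: every value is replaced by NaN.
all_false = s1.where(s1 > 10)

# Condition that is true everywhere: every value is kept.
all_true = s1.where(s1 >= 0)

# A plain list condition of the wrong length is rejected by pandas.
try:
    s1.where([True, False])
    raised = False
except ValueError:
    raised = True
```

A Koalas test could compute the same expressions on ks.Series objects and compare the results with these pandas values.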

HyukjinKwon · 2019-10-21T03:59:08Z

databricks/koalas/series.py


        return self._with_new_scol(current)

+    def where(self, cond, other=np.nan):


@itholic seems like pandas shares the same implementation internally. After this PR is merged, can you move this into _Frame class and implement DataFrame.where as well?

itholic · 2019-10-22

Okay, I'll work on it right after this PR is merged.

HyukjinKwon · 2019-10-21T03:59:27Z

Seems fine to me otherwise.

softagram-bot · 2019-10-25T02:31:42Z

Softagram Impact Report for pull/922 (head commit: `b620849`)

HyukjinKwon · 2019-12-03T04:10:03Z

databricks/koalas/series.py

+        # |                4|  4|            true|              500|
+        # +-----------------+---+----------------+-----------------+
+        data_col_name = self._internal.column_name_for(self._internal.column_index[0])
+        index_column = self._internal.index_columns[0]


@itholic, I think this doesn't support multi-level index cases. Can you fix this please?

index_columns can contain multiple columns, so we cannot just use the first one.
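
To illustrate the concern, here is a plain-Python sketch, assuming rows keyed by a two-level index stored as tuples. The data and both helper functions are hypothetical and only mimic the effect of matching on the first index level; they are not the Koalas code:

```python
# Rows keyed by a two-level index (hypothetical data).
values = {("a", 1): 0, ("a", 2): 1, ("b", 1): 2, ("b", 2): 3}
cond = {("a", 1): False, ("a", 2): True, ("b", 1): True, ("b", 2): False}


def where_full_key(values, cond, other):
    """Match each row to its condition using every index level."""
    return {key: (v if cond[key] else other) for key, v in values.items()}


def where_first_level_only(values, cond, other):
    """Match rows using only the first index level (the problematic shortcut)."""
    # Rows sharing a first-level label collapse onto one condition flag,
    # so later entries silently overwrite earlier ones.
    first_level_cond = {key[0]: flag for key, flag in cond.items()}
    return {
        key: (v if first_level_cond[key[0]] else other)
        for key, v in values.items()
    }


correct = where_full_key(values, cond, other=-1)
broken = where_first_level_only(values, cond, other=-1)
```

Here correct replaces ("a", 1) and ("b", 2), while broken keeps both "a" rows and replaces both "b" rows. In a join-based implementation the same shortcut would show up as ambiguous matches rather than overwritten flags, but the root cause is the same: the key must include every index level.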

HyukjinKwon · 2019-12-03T04:12:19Z

databricks/koalas/tests/test_series.py

+        set_option("compute.ops_on_diff_frames", True)
+
+    @classmethod
+    def tearDownClass(cls):


@itholic, please disable this. compute.ops_on_diff_frames is disabled by default because it is expensive. We should move these test cases into OpsOnDiffFramesEnabledTest.

HyukjinKwon · 2019-12-03T04:13:16Z

databricks/koalas/tests/test_series.py

                       kser.drop_duplicates().sort_values())

+    def test_where(self):
+        pser1 = pd.Series([0, 1, 2, 3, 4], name=0)


Can you add a test for when compute.ops_on_diff_frames is off? I think we can still use a scalar value, such as an int, for other.

Implement Series.where (00910c7)

Enable other as Series (39aec52)

ueshin reviewed Oct 18, 2019

HyukjinKwon reviewed Oct 21, 2019

itholic added 6 commits October 23, 2019 15:24

tests/test_series.py (ab6f747)
change logic to use temp col & add some tests (373ccda)
Resolve conflicts (658683b)
Resolve conflicts (1acf285)
Fix missing (bed954e)
Remove xs from missing (b620849)

HyukjinKwon merged commit 709b928 into databricks:master Oct 28, 2019

HyukjinKwon approved these changes Oct 28, 2019

itholic deleted the s_where branch November 6, 2019 05:32

HyukjinKwon reviewed Dec 3, 2019
