Implement 'keep' parameter for `drop_duplicates` #1303

deepyaman · 2020-02-21T15:34:45Z

TODO:

Add tests for 'keep' parameter
Add doctests to demonstrate 'keep' parameter
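
For readers unfamiliar with the parameter, here is a minimal pure-Python sketch of the `keep` semantics that pandas documents for `drop_duplicates` (`'first'`, `'last'`, or `False`). It is only an illustration of the expected behavior, not the Koalas implementation from this PR; the function below and its list-of-pairs return type are assumptions made for the example.

```python
from collections import Counter


def drop_duplicates(values, keep="first"):
    """Illustrative only: return the (position, value) pairs that survive."""
    if keep not in ("first", "last", False):
        raise ValueError("keep must be 'first', 'last' or False")
    indexed = list(enumerate(values))
    if keep is False:
        # Drop every value that occurs more than once.
        counts = Counter(values)
        return [(i, v) for i, v in indexed if counts[v] == 1]
    # For 'last', scan from the end so the final occurrence is seen first.
    scan = reversed(indexed) if keep == "last" else indexed
    seen = set()
    kept = []
    for i, v in scan:
        if v not in seen:
            seen.add(v)
            kept.append((i, v))
    return sorted(kept)  # restore the original order by position


data = ["a", "b", "a", "c", "b"]
first = drop_duplicates(data, keep="first")  # [(0, 'a'), (1, 'b'), (3, 'c')]
last = drop_duplicates(data, keep="last")    # [(2, 'a'), (3, 'c'), (4, 'b')]
none = drop_duplicates(data, keep=False)     # [(3, 'c')]
```

The positions in the result show which occurrence survives, which is the part that differs between `'first'` and `'last'`.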

HyukjinKwon · 2020-02-22T00:35:09Z

@deepyaman do you mind rebasing and syncing to the master? There are many conflicts as of 30b3334

codecov-io · 2020-02-22T00:55:11Z

Codecov Report

❗ No coverage uploaded for pull request base (master@49140d5).
The diff coverage is 91.76%.

@@            Coverage Diff            @@
##             master    #1303   +/-   ##
=========================================
  Coverage          ?   93.75%           
=========================================
  Files             ?       34           
  Lines             ?     7254           
  Branches          ?        0           
=========================================
  Hits              ?     6801           
  Misses            ?      453           
  Partials          ?        0

Impacted Files	Coverage Δ
databricks/koalas/missing/common.py	`100% <ø> (ø)`
databricks/koalas/frame.py	`96.71% <ø> (ø)`
databricks/koalas/numpy_compat.py	`90.9% <100%> (ø)`
databricks/koalas/missing/frame.py	`100% <100%> (ø)`
databricks/koalas/missing/__init__.py	`100% <100%> (ø)`
databricks/koalas/mlflow.py	`94.87% <100%> (ø)`
databricks/koalas/missing/groupby.py	`100% <100%> (ø)`
databricks/koalas/missing/window.py	`100% <100%> (ø)`
databricks/koalas/missing/indexes.py	`100% <100%> (ø)`
databricks/koalas/version.py	`100% <100%> (ø)`
... and 24 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 49140d5...61091cb.

deepyaman · 2020-02-23T20:04:49Z

I noticed another issue with Series while trying to implement this. 1cb4ba0 changed data_scols to column_scols, but that never got reflected in

koalas/databricks/koalas/frame.py

Line 2865 in 1cb4ba0

+ self._internal.data_scols)

I don't believe the `column == index_column` condition was ever triggered, so it didn't matter. This PR also attempts to fix it.

deepyaman · 2020-02-24T22:56:44Z

> @deepyaman do you mind rebasing and syncing to the master? There are many conflicts as of 30b3334

@HyukjinKwon Synced, tests added, and ready to go!

ueshin

Otherwise, LGTM.

databricks/koalas/frame.py

databricks/koalas/series.py

databricks/koalas/tests/test_dataframe.py

databricks/koalas/tests/test_series.py

databricks/koalas/tests/test_dataframe.py

deepyaman · 2020-02-25T21:58:06Z

Adding additional doctests decreased code coverage enough to fail the build. T_T

ueshin · 2020-02-25T23:05:09Z

databricks/koalas/tests/test_series.py

+
+        for (msg, pser), keep in product(psers.items(), keeps):
+            with self.subTest(msg, keep=keep):
+                kser = ks.Series(pser)


nit: we prefer to use ks.from_pandas(pser) instead of ks.Series(pser).
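
The loop in the diff above combines `itertools.product` with `unittest`'s `subTest` so that each (case, `keep`) pair is reported separately on failure. A self-contained sketch of that pattern follows. It assumes nothing about Koalas: `dedupe` is a hypothetical stand-in for the method under test and the case names are invented, whereas the real tests build a Koalas object from each pandas Series and compare the two results.

```python
import unittest
from itertools import product


def dedupe(values, keep="first"):
    """Hypothetical stand-in for the method under test (not Koalas code)."""
    if keep == "first":
        return list(dict.fromkeys(values))
    if keep == "last":
        # De-duplicate from the end, then restore the original direction.
        return list(dict.fromkeys(reversed(values)))[::-1]
    # keep=False: retain only values that occur exactly once.
    return [v for v in values if values.count(v) == 1]


class DropDuplicatesTest(unittest.TestCase):
    def test_drop_duplicates(self):
        cases = {
            "no duplicates": [1, 2, 3],
            "some duplicates": [1, 2, 1, 3, 2],
            "all duplicates": [7, 7, 7],
        }
        keeps = ["first", "last", False]

        for (msg, values), keep in product(cases.items(), keeps):
            # msg and keep are shown in the failure report for this subtest.
            with self.subTest(msg, keep=keep):
                result = dedupe(values, keep=keep)
                # No value may appear twice in the result.
                self.assertEqual(len(result), len(set(result)))
                if keep is False:
                    expected = {v for v in values if values.count(v) == 1}
                else:
                    expected = set(values)
                self.assertEqual(set(result), expected)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(DropDuplicatesTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because every combination runs inside its own `subTest`, one failing pair does not stop the remaining pairs from being checked.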

ueshin · 2020-02-25T23:11:00Z

Thanks! merging.

Implement 'keep' parameter for drop_duplicates

0f69806

deepyaman mentioned this pull request Feb 21, 2020

Implement 'keep' parameter for drop_duplicates #1302

Closed

Merge branch 'master' into drop_duplicates-keep

49140d5

deepyaman added 2 commits February 22, 2020 17:35

Merge branch 'master' into drop_duplicates-keep

d410e78

Make "__duplicated__" column to not overwrite data

6cd7fee

deepyaman added 4 commits February 24, 2020 05:44

Enable 'keep' parameter for Series.drop_duplicates

717dbb2

Add tests for Series.drop_duplicates (diff 'keep')

8a2b7f6

Refactor Series.drop_duplicates test using subTest

c62f4fc

Add tests for DataFrame.drop_duplicates (subtests)

390c728

deepyaman requested review from HyukjinKwon and ueshin February 24, 2020 22:56

ueshin reviewed Feb 25, 2020

databricks/koalas/tests/test_dataframe.py (outdated)

databricks/koalas/tests/test_dataframe.py (outdated)

deepyaman added 4 commits February 25, 2020 18:43

Remove debugging print statement 🙈

372f0e1

Distinguish tests using subtest params (msg, keep)

c3d6219

Distinguish tests using subtest params (msg, keep)

55a946b

Add 'keep' examples (series/frame drop_duplicates)

61091cb

ueshin approved these changes Feb 25, 2020

ueshin merged commit afd3e95 into databricks:master Feb 25, 2020
