Skip to content

Conversation

@deepyaman
Copy link
Contributor

@deepyaman deepyaman commented Feb 21, 2020

Close #1302

TODO:

  • Add tests for 'keep' parameter
  • Add doctests to demonstrate 'keep' parameter

@HyukjinKwon
Copy link
Member

@deepyaman do you mind rebasing and syncing to the master? There are many conflicts as of 30b3334

@codecov-io
Copy link

codecov-io commented Feb 22, 2020

Codecov Report

❗ No coverage uploaded for pull request base (master@49140d5). Click here to learn what that means.
The diff coverage is 91.76%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #1303   +/-   ##
=========================================
  Coverage          ?   93.75%           
=========================================
  Files             ?       34           
  Lines             ?     7254           
  Branches          ?        0           
=========================================
  Hits              ?     6801           
  Misses            ?      453           
  Partials          ?        0
Impacted Files Coverage Δ
databricks/koalas/missing/common.py 100% <ø> (ø)
databricks/koalas/frame.py 96.71% <ø> (ø)
databricks/koalas/numpy_compat.py 90.9% <100%> (ø)
databricks/koalas/missing/frame.py 100% <100%> (ø)
databricks/koalas/missing/__init__.py 100% <100%> (ø)
databricks/koalas/mlflow.py 94.87% <100%> (ø)
databricks/koalas/missing/groupby.py 100% <100%> (ø)
databricks/koalas/missing/window.py 100% <100%> (ø)
databricks/koalas/missing/indexes.py 100% <100%> (ø)
databricks/koalas/version.py 100% <100%> (ø)
... and 24 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 49140d5...61091cb. Read the comment docs.

@deepyaman
Copy link
Contributor Author

deepyaman commented Feb 23, 2020

I noticed another issue with Series while trying to implement this. 1cb4ba0 changed data_scols to column_scols, but that never got reflected in

+ self._internal.data_scols)
I don't believe the column == index_column ever got activated, so it didn't matter. This PR also attempts to fix this.

@deepyaman
Copy link
Contributor Author

@deepyaman do you mind rebasing and syncing to the master? There are many conflicts as of 30b3334

@HyukjinKwon Synced, tests added, and ready to go!

Copy link
Collaborator

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, LGTM.

@deepyaman
Copy link
Contributor Author

Adding additional doctests decreased code coverage to fail the build. T_T


for (msg, pser), keep in product(psers.items(), keeps):
with self.subTest(msg, keep=keep):
kser = ks.Series(pser)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we prefer to use ks.from_pandas(pser) instread of ks.Series(pser).

@ueshin
Copy link
Collaborator

ueshin commented Feb 25, 2020

Thanks! merging.

@ueshin ueshin merged commit afd3e95 into databricks:master Feb 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement 'keep' parameter for drop_duplicates

4 participants