Skip to content

Conversation

@RainFung
Copy link
Contributor

@RainFung RainFung commented Dec 14, 2019

This PR proposes to explicitly unsupport Index.duplicated.

@RainFung
Copy link
Contributor Author

@itholic Hello Lee, Can you give some advice?

@softagram-bot
Copy link

Softagram Impact Report for pull/1131 (head commit: 97b6700)

⚠️ Copy paste found

ℹ️ indexes.py: Copy paste fragment inside the same file on lines 762, 1133:

            raise NotImplementedError(
                \"Doesn't support symmetric_difference between Index & MultiIndex for now\")

        sdf_self = self._kdf._s...(truncated 477 chars)

Now that you are on the file, it would be easier to pay back some tech. debt.

⭐ Change Overview

Showing the changed files, dependency changes and the impact - click for full size
(Open in Softagram Desktop for full details)

📄 Full report

Impact Report explained. Give feedback on this report to [email protected]

@itholic
Copy link
Contributor

itholic commented Dec 14, 2019

@RainFung Sure, let me take a look little after while :)

column_index_names=None),
anchor=kdf)

def duplicated(self, keep=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just explicitly unsupport this API as its return type is NumPy arrow and the data size seems expected to be large.
We can just say users can easily work around by DataFrame.deduplicated() API.

>>> kdf.index.to_series().to_frame().duplicated()
0    False
1    False
2    False
3    False
4    False
5    False
6    False
7    False
8    False
9    False
Name: 0, dtype: bool

I think we shouldn't implement Series.duplicated() as well.

Can you refer #744 and fix it accordingly?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh wait.
Seems like Series.duplicated() returns Series. We can support it.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.duplicated.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I missed it. @RainFung, can you implement this one?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can implement it in a separate PR. just revert the updates for Series missing file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, i'll fix it today

@codecov-io
Copy link

codecov-io commented Dec 18, 2019

Codecov Report

Merging #1131 into master will increase coverage by 0.04%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1131      +/-   ##
==========================================
+ Coverage   95.15%   95.19%   +0.04%     
==========================================
  Files          35       35              
  Lines        7017     7056      +39     
==========================================
+ Hits         6677     6717      +40     
+ Misses        340      339       -1
Impacted Files Coverage Δ
databricks/koalas/missing/indexes.py 100% <ø> (ø) ⬆️
databricks/koalas/missing/common.py 100% <100%> (ø) ⬆️
databricks/koalas/missing/series.py 100% <100%> (ø) ⬆️
databricks/koalas/missing/groupby.py 100% <0%> (ø) ⬆️
databricks/koalas/frame.py 96.86% <0%> (+0.03%) ⬆️
databricks/koalas/groupby.py 91.61% <0%> (+0.22%) ⬆️
databricks/koalas/indexes.py 96.64% <0%> (+0.4%) ⬆️
databricks/koalas/indexing.py 94.57% <0%> (+0.41%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eb763ea...ba49710. Read the comment docs.

@HyukjinKwon HyukjinKwon changed the title Implement Index.duplicated Explicitly don't support Index.duplicated Dec 18, 2019
@HyukjinKwon HyukjinKwon merged commit 91ec06a into databricks:master Dec 30, 2019
@ueshin
Copy link
Collaborator

ueshin commented Dec 30, 2019

Did we made Series.duplicated deprecated?
What about #1131 (comment) ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants