Add has_duplicates property for Index and MultiIndex #946
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #946     +/-   ##
==========================================
+ Coverage   94.49%   94.55%   +0.05%
==========================================
  Files          34       34
  Lines        6436     6486      +50
==========================================
+ Hits         6082     6133      +51
+ Misses        354      353       -1
```
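The Codecov figures are internally consistent: every tracked line is either a hit or a miss (6082 + 354 = 6436 before, 6133 + 353 = 6486 after), so coverage is hits divided by lines. A small check with the numbers copied from the table above; the helper names are illustrative, and the observation that the reported percentages match truncation (not rounding) to two decimals is inferred from these numbers alone, not from Codecov documentation.

```python
import math

# Figures copied from the Codecov table (master vs. #946).
BEFORE = {"lines": 6436, "hits": 6082, "misses": 354}
AFTER = {"lines": 6486, "hits": 6133, "misses": 353}


def coverage_pct(report: dict) -> float:
    """Coverage as a percentage: hit lines over tracked lines."""
    # No partial lines are listed, so hits and misses should cover every line.
    assert report["hits"] + report["misses"] == report["lines"]
    return 100.0 * report["hits"] / report["lines"]


def truncate_2dp(x: float) -> float:
    """Drop digits past the second decimal place (no rounding)."""
    return math.floor(x * 100) / 100


before_pct = coverage_pct(BEFORE)  # about 94.4997
after_pct = coverage_pct(AFTER)    # about 94.5575
print(truncate_2dp(before_pct), truncate_2dp(after_pct),
      truncate_2dp(after_pct - before_pct))  # prints: 94.49 94.55 0.05
```

Rounding instead of truncating would give 94.50% and 94.56%, which is why the table shows 94.49% and 94.55%.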
databricks/koalas/indexes.py (outdated)
```python
idx_column = self._kdf._sdf.select(self._scol)
deduplicate_idx_column = idx_column.drop_duplicates()
if idx_column.count() == deduplicate_idx_column.count():
```
This runs Spark jobs twice. Can we use `F.count` and `F.countDistinct` instead?
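To make the cost concrete, here is a rough pure-Python model of the check in the snippet above, assuming a plain list stands in for the single-column Spark DataFrame and `None` stands in for SQL `NULL` (the function name is hypothetical). In Spark, each `.count()` call is a separate action, so the row count and the deduplicated row count each launch their own job, and `drop_duplicates()` typically also needs a shuffle. Note that `drop_duplicates()` treats `NULL`s as equal to one another, so repeated nulls are reported as duplicates by this check.

```python
from typing import Hashable, Optional, Sequence


def has_duplicates_two_pass(values: Sequence[Optional[Hashable]]) -> bool:
    """Hypothetical model of the count() vs drop_duplicates().count() check.

    A list stands in for the Spark column; None stands in for SQL NULL.
    """
    # Pass 1: DataFrame.count() counts every row, nulls included.
    total_rows = len(values)
    # Pass 2: drop_duplicates() keeps one row per distinct value and treats
    # None values as equal, so repeated Nones collapse into a single row.
    deduplicated_rows = len(dict.fromkeys(values))
    return total_rows != deduplicated_rows


print(has_duplicates_two_pass([1, 2, 3]))        # False
print(has_duplicates_two_pass([1, 2, 2]))        # True
print(has_duplicates_two_pass([1, None, None]))  # True: repeated nulls count as duplicates
```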
ahh! Nice! thanks!!!
databricks/koalas/indexes.py (outdated)
```python
col = scol.columns[0]
count = scol.agg(F.count(col).alias('count')).collect()
dedup_count = scol.agg(F.countDistinct(col).alias('count')).collect()
```
@charlesdong1991 can we do it in one pass?
```python
return df.select(F.count("id") != F.countDistinct("id")).first()[0]
```
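The suggested expression folds both aggregates into one query triggered by a single action (`first()`), so the data is scanned once. Below is a rough pure-Python model of that single pass, assuming a plain list stands in for the index column and `None` for SQL `NULL`; the function name is hypothetical, and Spark actually computes the aggregates with partial per-partition aggregation rather than a literal loop. One semantic point worth checking: `F.count(col)` and `F.countDistinct(col)` both ignore `NULL`s, so unlike the `drop_duplicates()` check in the first snippet, repeated nulls are not reported as duplicates.

```python
from typing import Hashable, Optional, Sequence


def has_duplicates_one_pass(values: Sequence[Optional[Hashable]]) -> bool:
    """Hypothetical model of F.count(col) != F.countDistinct(col).

    A list stands in for the Spark column; None stands in for SQL NULL.
    Both aggregates are accumulated during the same single scan.
    """
    non_null_count = 0         # models F.count(col): non-null rows only
    distinct_non_null = set()  # models F.countDistinct(col): distinct non-null values
    for value in values:
        if value is None:
            continue           # both aggregates skip NULLs
        non_null_count += 1
        distinct_non_null.add(value)
    return non_null_count != len(distinct_non_null)


print(has_duplicates_one_pass([1, 2, 3]))        # False
print(has_duplicates_one_pass([1, 2, 2]))        # True
print(has_duplicates_one_pass([1, None, None]))  # False: NULLs are ignored
```

If repeated missing values should count as duplicates, the null handling would need separate treatment; that is a design decision this sketch does not settle.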
Nice, this is much simpler! thanks! @HyukjinKwon

No description provided.