Add has_duplicates property for Index and MultiIndex #946

charlesdong1991 · 2019-10-23T23:35:27Z

No description provided.

codecov-io · 2019-10-24T03:23:53Z

Codecov Report

Merging #946 into master will increase coverage by 0.05%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #946      +/-   ##
==========================================
+ Coverage   94.49%   94.55%   +0.05%     
==========================================
  Files          34       34              
  Lines        6436     6486      +50     
==========================================
+ Hits         6082     6133      +51     
+ Misses        354      353       -1
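
The coverage percentages above are hits divided by lines. The sketch below reproduces them, assuming Codecov rounds down to two decimal places (its usual default; treat this as an assumption). Under that assumption, the reported +0.05% comes from the unrounded values (about 94.5575% minus 94.4997%), not from subtracting the displayed 94.55% and 94.49%, which would give 0.06%. The helper names are illustrative.

```python
from decimal import Decimal, ROUND_DOWN

TWO_PLACES = Decimal("0.01")


def raw_coverage(hits: int, lines: int) -> Decimal:
    """Unrounded line coverage as a percentage."""
    return Decimal(hits) * 100 / Decimal(lines)


def truncate(pct: Decimal) -> Decimal:
    """Round down to two decimals (assumed Codecov display rule)."""
    return pct.quantize(TWO_PLACES, rounding=ROUND_DOWN)


master = raw_coverage(6082, 6436)  # base branch: hits / lines
pr_946 = raw_coverage(6133, 6486)  # head of #946: hits / lines

print(truncate(master))           # 94.49
print(truncate(pr_946))           # 94.55
print(truncate(pr_946 - master))  # 0.05, delta taken before rounding
print(6436 - 6082, 6486 - 6133)   # misses: 354 353
```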

| Impacted Files | Coverage Δ |
|---|---|
| databricks/koalas/missing/indexes.py | `100% <ø> (ø)` |
| databricks/koalas/indexes.py | `96.57% <100%> (+0.14%)` |
| databricks/koalas/internal.py | `96.38% <0%> (ø)` |
| databricks/koalas/missing/frame.py | `100% <0%> (ø)` |
| databricks/koalas/namespace.py | `86.83% <0%> (ø)` |
| databricks/koalas/plot.py | `94.28% <0%> (ø)` |
| databricks/koalas/missing/series.py | `100% <0%> (ø)` |
| databricks/koalas/frame.py | `96.03% <0%> (ø)` |
| ... and 2 more | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0df6453...6abbe02.

ueshin · 2019-10-24T08:58:36Z

databricks/koalas/indexes.py

+        idx_column = self._kdf._sdf.select(self._scol)
+        deduplicate_idx_column = idx_column.drop_duplicates()
+
+        if idx_column.count() == deduplicate_idx_column.count():


This runs Spark jobs twice. Can we use F.count and F.countDistinct instead?

ahh! Nice! thanks!!!
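
The suggestion rests on one equivalence: a column has duplicates exactly when counting its values gives more than counting its distinct values, and both numbers can be produced by a single aggregation. The pure-Python model below illustrates that equivalence without Spark. The helper names are hypothetical. Skipping `None` is modeled on how Spark's `count(col)` and `countDistinct` both ignore nulls, which means repeated nulls do not register as duplicates under this check.

```python
from typing import Hashable, Iterable, Optional


def spark_like_count(values: list) -> int:
    """Models count(col): counts non-null values only."""
    return sum(1 for v in values if v is not None)


def spark_like_count_distinct(values: list) -> int:
    """Models countDistinct(col): counts distinct non-null values."""
    return len({v for v in values if v is not None})


def has_duplicates_model(values: Iterable[Optional[Hashable]]) -> bool:
    """A column has duplicates iff its count exceeds its distinct count."""
    vals = list(values)
    return spark_like_count(vals) != spark_like_count_distinct(vals)


print(has_duplicates_model([1, 2, 3]))        # False
print(has_duplicates_model([1, 2, 2]))        # True
print(has_duplicates_model([None, None, 1]))  # False: nulls are skipped
```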

HyukjinKwon · 2019-10-28T09:23:18Z

databricks/koalas/indexes.py

+        col = scol.columns[0]
+
+        count = scol.agg(F.count(col).alias('count')).collect()
+        dedup_count = scol.agg(F.countDistinct(col).alias('count')).collect()


@charlesdong1991 can we do it in one pass?

return df.select(F.count("id") != F.countDistinct("id")).first()[0]

Nice, this is much simpler! thanks! @HyukjinKwon

softagram-bot · 2019-10-28T11:00:34Z

Softagram Impact Report for pull/946 (head commit: `6abbe02`)


charlesdong1991 added 3 commits October 24, 2019 07:34

add has_duplicates

fdc0b35

fix typo

a231ca4

fix test

d7cd20d

recommit

05b2e3d

ueshin reviewed Oct 24, 2019

charlesdong1991 added 2 commits October 24, 2019 23:46

code change on reviews

af26aad

correct typo

b58b8ab

HyukjinKwon reviewed Oct 28, 2019

better pyspark

6abbe02

HyukjinKwon approved these changes Oct 29, 2019

HyukjinKwon merged commit 6f4c7fa into databricks:master Oct 29, 2019

5 participants
