@charlesdong1991
Contributor

No description provided.

@codecov-io

codecov-io commented Oct 24, 2019

Codecov Report

Merging #946 into master will increase coverage by 0.05%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #946      +/-   ##
==========================================
+ Coverage   94.49%   94.55%   +0.05%     
==========================================
  Files          34       34              
  Lines        6436     6486      +50     
==========================================
+ Hits         6082     6133      +51     
+ Misses        354      353       -1
| Impacted Files | Coverage Δ | |
|---|---|---|
| databricks/koalas/missing/indexes.py | 100% <ø> (ø) | ⬆️ |
| databricks/koalas/indexes.py | 96.57% <100%> (+0.14%) | ⬆️ |
| databricks/koalas/internal.py | 96.38% <0%> (ø) | ⬆️ |
| databricks/koalas/missing/frame.py | 100% <0%> (ø) | ⬆️ |
| databricks/koalas/namespace.py | 86.83% <0%> (ø) | ⬆️ |
| databricks/koalas/plot.py | 94.28% <0%> (ø) | ⬆️ |
| databricks/koalas/missing/series.py | 100% <0%> (ø) | ⬆️ |
| databricks/koalas/frame.py | 96.03% <0%> (ø) | ⬆️ |

... and 2 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0df6453...6abbe02.

idx_column = self._kdf._sdf.select(self._scol)
deduplicate_idx_column = idx_column.drop_duplicates()

if idx_column.count() == deduplicate_idx_column.count():
Collaborator

This runs Spark jobs twice. Can we use F.count and F.countDistinct instead?

Contributor Author

ahh! Nice! thanks!!!

col = scol.columns[0]

count = scol.agg(F.count(col).alias('count')).collect()
dedup_count = scol.agg(F.countDistinct(col).alias('count')).collect()
Member

@HyukjinKwon commented Oct 28, 2019

@charlesdong1991 can we do it in one pass?

return df.select(F.count("id") != F.countDistinct("id")).first()[0]

Contributor Author

Nice, this is much simpler! thanks! @HyukjinKwon
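
For readers following the thread: the point of the one-pass suggestion is that both counts come out of a single aggregation, so only one Spark job runs. A plain-Python analogue of that logic is sketched below. It is not Spark code, and `has_duplicates` is a hypothetical helper name used only for illustration. It skips `None` values to mirror how Spark's `count(col)` and `countDistinct(col)` ignore nulls.

```python
from typing import Hashable, Iterable, Optional


def has_duplicates(values: Iterable[Optional[Hashable]]) -> bool:
    """Return True if any non-null value occurs more than once.

    Hypothetical illustration only: a single loop accumulates both the
    total count and the distinct set, analogous to evaluating
    F.count(col) != F.countDistinct(col) in one aggregation.
    """
    total = 0          # plays the role of F.count(col)
    distinct = set()   # its size plays the role of F.countDistinct(col)
    for value in values:
        if value is None:
            # Assumption mirrored from Spark: both aggregates skip nulls.
            continue
        total += 1
        distinct.add(value)
    return total != len(distinct)


if __name__ == "__main__":
    print(has_duplicates([1, 2, 2]))
    print(has_duplicates([1, 2, 3]))
```

Because nulls are skipped by both aggregates, a column whose only repeated value is null would be reported as having no duplicates, which may differ from the earlier `drop_duplicates`-based comparison.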

@softagram-bot

Softagram Impact Report for pull/946 (head commit: 6abbe02)

@HyukjinKwon merged commit 6f4c7fa into databricks:master on Oct 29, 2019