@RainFung
Contributor

@RainFung RainFung commented Aug 5, 2019

This PR adds GroupBy.diff (for both SeriesGroupBy and DataFrameGroupBy) by reusing the existing diff logic in Series.
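
For readers unfamiliar with the operation, the semantics of a grouped diff can be sketched in plain Python. This is only an illustration of the expected behavior (each value minus the previous value in the same group, with no result for the first row of each group). The helper name `group_diff` is hypothetical and is not the Koalas implementation, which this thread does not show.

```python
from typing import Hashable, List, Optional, Sequence


def group_diff(keys: Sequence[Hashable], values: Sequence[float]) -> List[Optional[float]]:
    """Return each value minus the previous value that shares its key.

    The first row of every group has no predecessor, so it maps to None
    (pandas reports NaN in that position).
    """
    last_seen = {}  # group key -> most recent value seen for that group
    result = []
    for key, value in zip(keys, values):
        if key in last_seen:
            result.append(value - last_seen[key])
        else:
            result.append(None)
        last_seen[key] = value
    return result


# Hypothetical sample data: two groups, "x" and "y", interleaved.
diffs = group_diff(["x", "x", "y", "x", "y"], [1, 4, 10, 9, 12])
print(diffs)  # [None, 3, None, 5, 2]
```

The sketch keeps rows in their original order, which matches how a grouped diff returns a result aligned with the input rather than one row per group.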

@codecov-io

codecov-io commented Aug 5, 2019

Codecov Report

Merging #622 into master will increase coverage by 0.08%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #622      +/-   ##
==========================================
+ Coverage   92.79%   92.87%   +0.08%     
==========================================
  Files          31       31              
  Lines        5008     5023      +15     
==========================================
+ Hits         4647     4665      +18     
+ Misses        361      358       -3
Impacted Files                          Coverage Δ
databricks/koalas/missing/groupby.py    100% <ø> (ø) ⬆️
databricks/koalas/series.py             92.73% <100%> (+0.02%) ⬆️
databricks/koalas/groupby.py            80.6% <100%> (+0.92%) ⬆️
databricks/koalas/frame.py              94.63% <0%> (+0.07%) ⬆️
databricks/conftest.py                  97.67% <0%> (+2.32%) ⬆️
databricks/koalas/__init__.py           84.21% <0%> (+2.63%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e141bf7...06c7f56.

@HyukjinKwon
Member

Looks fine in general.

@softagram-bot

Softagram Impact Report for pull/622 (head commit: 06c7f56)

@HyukjinKwon HyukjinKwon merged commit 73e045d into databricks:master Aug 9, 2019
@HyukjinKwon
Member

Thanks, @RainFung, for working on this.

@HyukjinKwon HyukjinKwon mentioned this pull request Aug 9, 2019
@RainFung RainFung deleted the groupby.diff branch August 13, 2019 01:53
@ppakawatk

ppakawatk commented Oct 28, 2019

Hi,

I'm trying to convert my pandas script to Koalas, and I hit an error when calling GroupBy.diff on a column of datetime.date values.

Here's the code to reproduce the case:
import datetime as dt
import databricks.koalas as ks

df = ks.DataFrame({
    'a': [1, 2, 3, 4, 5, 6],
    'b': [1, 1, 2, 3, 5, 8],
    'c': [dt.date(2019, 1, 1), dt.date(2019, 1, 3), dt.date(2019, 1, 10),
          dt.date(2019, 1, 11), dt.date(2019, 1, 20), dt.date(2019, 1, 21)],
}, columns=['a', 'b', 'c'])

# Difference between consecutive dates within each group of 'b'
df.groupby(['b'])['c'].diff()

Based on https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.groupby.GroupBy.diff.html,
I assume that diff doesn't currently support datetime columns?

P.S. Here is the equivalent pandas script, which works properly.
import datetime as dt
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5, 6],
    'b': [1, 1, 2, 3, 5, 8],
    'c': [dt.date(2019, 1, 1), dt.date(2019, 1, 3), dt.date(2019, 1, 10),
          dt.date(2019, 1, 11), dt.date(2019, 1, 20), dt.date(2019, 1, 21)],
}, columns=['a', 'b', 'c'])

# Difference between consecutive dates within each group of 'b'
df.groupby(['b'])['c'].diff()
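
To make the expected result concrete, the same computation can be sketched without either library. This is an illustration only: `grouped_date_diff` is a hypothetical helper, and it assumes the desired output is one `datetime.timedelta` per row, with `None` for the first row of each group. It reuses the `b` and `c` values from the report.

```python
import datetime as dt


def grouped_date_diff(keys, dates):
    """Subtract the previous date in the same group from each date.

    Subtracting two datetime.date objects yields a datetime.timedelta.
    Rows with no earlier row in their group map to None.
    """
    previous = {}  # group key -> last date seen for that group
    out = []
    for key, day in zip(keys, dates):
        out.append(day - previous[key] if key in previous else None)
        previous[key] = day
    return out


b = [1, 1, 2, 3, 5, 8]
c = [dt.date(2019, 1, 1), dt.date(2019, 1, 3), dt.date(2019, 1, 10),
     dt.date(2019, 1, 11), dt.date(2019, 1, 20), dt.date(2019, 1, 21)]

result = grouped_date_diff(b, c)
print(result)
```

Only the group `b == 1` has two rows, so only the second row gets a value (a two-day difference); every other row is the first of its group.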

@HyukjinKwon
Member

Can you open an issue with runnable examples? It also seems related to calendar interval type support (SPARK-24695).

@ppakawatk

> Can you open an issue with runnable examples? It also seems related to calendar interval type support (SPARK-24695).

Yes, sir.
#965
