Fix named aggregation for MultiIndex #1000

charlesdong1991 · 2019-11-04T21:47:06Z

Named aggregation on MultiIndex columns now works in newer pandas, so this PR adds support for it in Koalas.

import pandas as pd
import databricks.koalas as ks

kdf = ks.DataFrame({"group": ['a', 'a', 'b', 'b'], "A": [0, 1, 2, 3], "B": [5, 6, 7, 8]})
kdf.columns = pd.MultiIndex.from_tuples([('x', 'group'), ('y', 'A'), ('y', 'B')])
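
For comparison, here is a minimal sketch of the same kind of named aggregation in plain pandas. It assumes pandas 1.0 or later (later in this thread the call fails on 0.25.2 and succeeds on 1.0.3); the output name `a_max` and the choice of `"max"` are only illustrative.

```python
import pandas as pd

# Same frame as above, built with pandas instead of Koalas.
pdf = pd.DataFrame({"group": ["a", "a", "b", "b"], "A": [0, 1, 2, 3], "B": [5, 6, 7, 8]})
pdf.columns = pd.MultiIndex.from_tuples([("x", "group"), ("y", "A"), ("y", "B")])

# With MultiIndex columns, both the grouping key and the aggregated column
# are addressed by their full tuple label.
result = pdf.groupby(("x", "group")).agg(a_max=(("y", "A"), "max")).sort_index()

# On pandas 1.0.3 the thread reports a_max of 1 for group 'a' and 3 for group 'b'.
print(result)
```

The keyword name (`a_max`) becomes the flat output column, and the value is a `(column, aggregation)` pair whose column part is itself a tuple.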

codecov-io · 2019-11-04T22:30:11Z

Codecov Report

Merging #1000 into master will decrease coverage by 0.05%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1000      +/-   ##
==========================================
- Coverage   94.83%   94.78%   -0.06%     
==========================================
  Files          34       34              
  Lines        6527     6555      +28     
==========================================
+ Hits         6190     6213      +23     
- Misses        337      342       +5

Impacted Files                         Coverage Δ
databricks/koalas/groupby.py           91.41% <100%> (+0.01%)
databricks/koalas/window.py            90.9% <0%> (-2.15%)
databricks/koalas/generic.py           95.73% <0%> (-0.44%)
databricks/koalas/missing/window.py    100% <0%> (ø)
databricks/koalas/utils.py             98.02% <0%> (ø)
databricks/koalas/missing/series.py    100% <0%> (ø)
databricks/koalas/series.py            96.34% <0%> (+0.03%)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6de340d...a1f6e8e.

softagram-bot · 2019-11-05T07:58:36Z

Softagram Impact Report for pull/1000 (head commit: `a1f6e8e`)


ueshin · 2019-11-05T21:58:58Z

Hmm, pandas 0.25.2 doesn't support this yet?

>>> import pandas as pd
>>> pd.__version__
'0.25.2'
>>> pdf = pd.DataFrame({"group": ['a', 'a', 'b', 'b'], "A": [0, 1, 2, 3], "B": [5, 6, 7, 8]})
>>> pdf.columns = pd.MultiIndex.from_tuples([('x', 'group'), ('y', 'A'), ('y', 'B')])
>>> pdf
      x  y
  group  A  B
0     a  0  5
1     a  1  6
2     b  2  7
3     b  3  8
>>> pdf.groupby(('x', 'group')).agg(a_max=(('y', 'A'), "max")).sort_index()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6_pd0.25/lib/python3.6/site-packages/pandas/core/groupby/generic.py", line 1455, in aggregate
    return super().aggregate(arg, *args, **kwargs)
  File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6_pd0.25/lib/python3.6/site-packages/pandas/core/groupby/generic.py", line 264, in aggregate
    result = result[order]
  File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6_pd0.25/lib/python3.6/site-packages/pandas/core/frame.py", line 3001, in __getitem__
    indexer = self.loc._convert_to_indexer(key, axis=1, raise_missing=True)
  File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6_pd0.25/lib/python3.6/site-packages/pandas/core/indexing.py", line 1285, in _convert_to_indexer
    return self._get_listlike_indexer(obj, axis, **kwargs)[1]
  File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6_pd0.25/lib/python3.6/site-packages/pandas/core/indexing.py", line 1086, in _get_listlike_indexer
    indexer = ax.get_indexer_for(key)
  File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6_pd0.25/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 4817, in get_indexer_for
    return self.get_indexer(target, **kwargs)
  File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6_pd0.25/lib/python3.6/site-packages/pandas/core/indexes/multi.py", line 2449, in get_indexer
    indexer = self._engine.get_indexer(target)
  File "pandas/_libs/index.pyx", line 648, in pandas._libs.index.BaseMultiIndexCodesEngine.get_indexer
  File "pandas/_libs/index.pyx", line 644, in pandas._libs.index.BaseMultiIndexCodesEngine._extract_level_codes
  File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6_pd0.25/lib/python3.6/site-packages/pandas/core/indexes/multi.py", line 81, in _codes_to_ints
    codes <<= self.offsets
ValueError: operands could not be broadcast together with shapes (1,2) (3,) (1,2)

charlesdong1991 · 2019-11-05T22:06:56Z

@ueshin Hmm, sorry, I was wrong: I thought support for this started in 0.25.2. I tested on the current pandas master and it is supported there. Maybe we could leave this PR open for now.

BTW, 0.25.3 was released

ueshin · 2019-11-05T22:10:37Z

@charlesdong1991 I see, let's come back to this once a version that supports it is released.

HyukjinKwon · 2019-11-06T02:44:47Z

@charlesdong1991 oh, so 0.25.3 supports this?

charlesdong1991 · 2019-11-06T11:14:52Z

Emm, not yet, sorry. I can confirm it works on the pandas master branch, but it was not released as part of 0.25.x; it will probably be in version 1.0. @HyukjinKwon

I will come back to this PR once it's available in a released version.
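
One way to encode that wait in a test suite is a version gate. The sketch below is only an illustration: the helper names and the `(1, 0, 0)` threshold are assumptions drawn from this thread (failure on 0.25.2, success on 1.0.3), not code from this PR.

```python
import re

# Assumed threshold, not taken from the PR: the thread shows the call
# failing on pandas 0.25.2 and working on 1.0.3.
MIN_PANDAS_FOR_MULTIINDEX_NAMED_AGG = (1, 0, 0)


def parse_version(version):
    """Return the first three numeric components of a version string.

    Pre-release or dev suffixes (for example "1.0.0rc0") are ignored.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad short versions such as "1.0" to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def supports_multiindex_named_agg(pandas_version):
    """Hypothetical helper: True if the given pandas version is at or above the assumed threshold."""
    return parse_version(pandas_version) >= MIN_PANDAS_FOR_MULTIINDEX_NAMED_AGG


print(supports_multiindex_named_agg("0.25.2"))
print(supports_multiindex_named_agg("1.0.3"))
```

In a real test this could be passed `pd.__version__` and combined with `unittest.skipIf`, so the MultiIndex case only runs where pandas supports it.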

HyukjinKwon · 2019-11-06T11:51:08Z

Ah, sure, thanks for the confirmation!

Hoeze · 2020-03-29T12:51:26Z

Pandas 1.0 was released some time ago :)

itholic · 2020-04-16T23:15:56Z

This PR is now valid for the latest version of pandas (1.0.3).

>>> import pandas as pd
>>> pd.__version__
'1.0.3'
>>> pdf = pd.DataFrame({"group": ['a', 'a', 'b', 'b'], "A": [0, 1, 2, 3], "B": [5, 6, 7, 8]})
>>> pdf.columns = pd.MultiIndex.from_tuples([('x', 'group'), ('y', 'A'), ('y', 'B')])
>>> pdf
      x  y
  group  A  B
0     a  0  5
1     a  1  6
2     b  2  7
3     b  3  8
>>> pdf.groupby(('x', 'group')).agg(a_max=(('y', 'A'), "max")).sort_index()
            a_max
(x, group)
a               1
b               3

@charlesdong1991
I think we can merge this if the conflicts with the master branch are resolved.

cc. @ueshin @HyukjinKwon WDYT?

HyukjinKwon · 2020-04-20T10:04:38Z

@itholic, you can pick up these commits and create a new PR. I can credit you and @charlesdong1991 as co-authors.

itholic · 2020-04-20T11:12:38Z

@HyukjinKwon Okay, I will


Fix named aggregation for MultiIndex

e6722f8

skip if lower than 0.25.2

a1f6e8e

itholic mentioned this pull request Apr 21, 2020

Fix named aggregation for MultiIndex #1435

Merged

HyukjinKwon closed this in #1435 May 4, 2020

HyukjinKwon pushed a commit that referenced this pull request May 4, 2020

Fix named aggregation for MultiIndex (#1435)

caf8bb5

This PR takes over #1000. Closes #1000 Co-authored-by: Kaiqi <[email protected]>
