Fix named aggregation for MultiIndex #1000
Codecov Report
@@            Coverage Diff             @@
##           master    #1000      +/-   ##
==========================================
- Coverage   94.83%   94.78%   -0.06%
==========================================
  Files          34       34
  Lines        6527     6555      +28
==========================================
+ Hits         6190     6213      +23
- Misses        337      342       +5
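The percentages in the report follow from Hits / Lines. The sketch below reproduces them under one stated assumption: that Codecov uses its default of two decimal places, rounded down. That assumption would also explain why the delta reads -0.06 rather than the -0.05 you get by subtracting the displayed values. The `round_down` helper is a hypothetical name used only for illustration.

```python
import math


def round_down(value, precision=2):
    """Round toward negative infinity at `precision` decimals.

    Assumption: this mirrors Codecov's default rounding (precision 2, round down).
    """
    factor = 10 ** precision
    return math.floor(value * factor) / factor


# Hits and Lines taken from the Coverage Diff above.
base = round_down(6190 / 6527 * 100)  # master
head = round_down(6213 / 6555 * 100)  # #1000
# The delta is computed from the unrounded ratios, then rounded down.
delta = round_down((6213 / 6555 - 6190 / 6527) * 100)

print(f"{base}% {head}% {delta:+}%")  # 94.83% 94.78% -0.06%
```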
Softagram Impact Report for pull/1000 (head commit: a1f6e8e)
Hmm, pandas 0.25.2 doesn't support this yet?

>>> import pandas as pd
>>> pd.__version__
'0.25.2'
>>> pdf = pd.DataFrame({"group": ['a', 'a', 'b', 'b'], "A": [0, 1, 2, 3], "B": [5, 6, 7, 8]})
>>> pdf.columns = pd.MultiIndex.from_tuples([('x', 'group'), ('y', 'A'), ('y', 'B')])
>>> pdf
      x  y
  group  A  B
0     a  0  5
1     a  1  6
2     b  2  7
3     b  3  8
>>> pdf.groupby(('x', 'group')).agg(a_max=(('y', 'A'), "max")).sort_index()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6_pd0.25/lib/python3.6/site-packages/pandas/core/groupby/generic.py", line 1455, in aggregate
    return super().aggregate(arg, *args, **kwargs)
  File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6_pd0.25/lib/python3.6/site-packages/pandas/core/groupby/generic.py", line 264, in aggregate
    result = result[order]
  File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6_pd0.25/lib/python3.6/site-packages/pandas/core/frame.py", line 3001, in __getitem__
    indexer = self.loc._convert_to_indexer(key, axis=1, raise_missing=True)
  File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6_pd0.25/lib/python3.6/site-packages/pandas/core/indexing.py", line 1285, in _convert_to_indexer
    return self._get_listlike_indexer(obj, axis, **kwargs)[1]
  File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6_pd0.25/lib/python3.6/site-packages/pandas/core/indexing.py", line 1086, in _get_listlike_indexer
    indexer = ax.get_indexer_for(key)
  File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6_pd0.25/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 4817, in get_indexer_for
    return self.get_indexer(target, **kwargs)
  File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6_pd0.25/lib/python3.6/site-packages/pandas/core/indexes/multi.py", line 2449, in get_indexer
    indexer = self._engine.get_indexer(target)
  File "pandas/_libs/index.pyx", line 648, in pandas._libs.index.BaseMultiIndexCodesEngine.get_indexer
  File "pandas/_libs/index.pyx", line 644, in pandas._libs.index.BaseMultiIndexCodesEngine._extract_level_codes
  File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6_pd0.25/lib/python3.6/site-packages/pandas/core/indexes/multi.py", line 81, in _codes_to_ints
    codes <<= self.offsets
ValueError: operands could not be broadcast together with shapes (1,2) (3,) (1,2)
@ueshin Hmm, sorry, I might be wrong; I thought it started supporting this from … BTW, …
@charlesdong1991 I see, let's come back when the supporting version is released.
@charlesdong1991 Oh, so 0.25.3 supports this?
Emm, not yet, sorry. I can confirm it is working on the pandas master branch, but it was not released together with … I will come back to this PR once it's available in a released version.
Ah, sure, thanks for the confirmation!
Pandas 1.0 was released some time ago :)
This PR is now valid for the latest version of pandas (1.0.3).

>>> import pandas as pd
>>> pd.__version__
'1.0.3'
>>> pdf = pd.DataFrame({"group": ['a', 'a', 'b', 'b'], "A": [0, 1, 2, 3], "B": [5, 6, 7, 8]})
>>> pdf.columns = pd.MultiIndex.from_tuples([('x', 'group'), ('y', 'A'), ('y', 'B')])
>>> pdf
      x  y
  group  A  B
0     a  0  5
1     a  1  6
2     b  2  7
3     b  3  8
>>> pdf.groupby(('x', 'group')).agg(a_max=(('y', 'A'), "max")).sort_index()
            a_max
(x, group)
a               1
b               3

@charlesdong1991 cc. @ueshin @HyukjinKwon WDYT?
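To make the semantics of the call above concrete, here is a pure-Python model (not pandas' implementation) of what `agg(a_max=(('y', 'A'), "max"))` computes. MultiIndex column labels are tuples, and each keyword maps an output column name to an (input column, aggregation) pair. The `named_agg` helper and the `AGG_FUNCS` table are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Rows keyed by tuple labels, mirroring the two-level MultiIndex columns of `pdf`.
rows = [
    {("x", "group"): "a", ("y", "A"): 0, ("y", "B"): 5},
    {("x", "group"): "a", ("y", "A"): 1, ("y", "B"): 6},
    {("x", "group"): "b", ("y", "A"): 2, ("y", "B"): 7},
    {("x", "group"): "b", ("y", "A"): 3, ("y", "B"): 8},
]

# Hypothetical lookup from aggregation name to a Python callable.
AGG_FUNCS = {"max": max, "min": min, "sum": sum}


def named_agg(rows, by, **named):
    """Hypothetical helper: group rows by `by`, then apply each named aggregation.

    Each keyword is output_name=(input_column, func_name), as in pandas named aggregation.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)
    # Iterate group keys in sorted order, like the `.sort_index()` in the example.
    return {
        key: {
            out_name: AGG_FUNCS[func](r[col] for r in groups[key])
            for out_name, (col, func) in named.items()
        }
        for key in sorted(groups)
    }


print(named_agg(rows, ("x", "group"), a_max=(("y", "A"), "max")))
# {'a': {'a_max': 1}, 'b': {'a_max': 3}}
```

The printed values match the `a_max` column pandas 1.0.3 returns in the session above.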
@itholic, you can pick up these commits and create a new PR. I can credit you and @charlesdong1991 as co-authors.
@HyukjinKwon Okay, I will.
This PR takes over #1000.

Closes #1000

Co-authored-by: Kaiqi <[email protected]>

Since named aggregation on MultiIndex columns works in newer versions of pandas, this PR adds support for it.