@charlesdong1991
Contributor

Named aggregation with MultiIndex columns now works in newer pandas, so this PR adds support for it.

kdf = ks.DataFrame({"group": ['a', 'a', 'b', 'b'], "A": [0, 1, 2, 3], "B": [5, 6, 7, 8]})
kdf.columns = pd.MultiIndex.from_tuples([('x', 'group'), ('y', 'A'), ('y', 'B')])
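
With MultiIndex columns, each named-aggregation keyword maps an output name to a `(column label, aggfunc)` pair in which the column label is itself a tuple such as `('y', 'A')`. The following is a minimal pure-Python sketch of how such keywords could be split into per-column function lists and output names. `normalize_named_aggregation` is a hypothetical helper written for illustration; it is not the actual Koalas or pandas implementation.

```python
from typing import Dict, List, Tuple, Union

# A column label is a plain string or, for MultiIndex columns, a tuple of strings.
Label = Union[str, Tuple[str, ...]]


def normalize_named_aggregation(**kwargs) -> Tuple[Dict[Label, List[str]], List[str]]:
    """Hypothetical helper: split name=(column, func) pairs into a
    column -> [funcs] mapping plus the ordered list of output names."""
    funcs_by_column: Dict[Label, List[str]] = {}
    output_names: List[str] = []
    for out_name, spec in kwargs.items():
        if not (isinstance(spec, tuple) and len(spec) == 2):
            raise TypeError("each keyword must be a (column, aggfunc) pair")
        column, func = spec
        # Tuple labels are hashable, so they work as dict keys like string labels.
        funcs_by_column.setdefault(column, []).append(func)
        output_names.append(out_name)
    return funcs_by_column, output_names


funcs, names = normalize_named_aggregation(
    a_max=(("y", "A"), "max"),
    b_min=(("y", "B"), "min"),
)
print(funcs)  # {('y', 'A'): ['max'], ('y', 'B'): ['min']}
print(names)  # ['a_max', 'b_min']
```

The only difference from the single-level case is that the column label is a tuple rather than a string, which is why the outer `(column, aggfunc)` pair has to be unpacked before the label is used.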


@codecov-io

codecov-io commented Nov 4, 2019

Codecov Report

Merging #1000 into master will decrease coverage by 0.05%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master    #1000      +/-   ##
==========================================
- Coverage   94.83%   94.78%   -0.06%     
==========================================
  Files          34       34              
  Lines        6527     6555      +28     
==========================================
+ Hits         6190     6213      +23     
- Misses        337      342       +5
| Impacted Files | Coverage Δ |
|---|---|
| databricks/koalas/groupby.py | 91.41% <100%> (+0.01%) ⬆️ |
| databricks/koalas/window.py | 90.9% <0%> (-2.15%) ⬇️ |
| databricks/koalas/generic.py | 95.73% <0%> (-0.44%) ⬇️ |
| databricks/koalas/missing/window.py | 100% <0%> (ø) ⬆️ |
| databricks/koalas/utils.py | 98.02% <0%> (ø) ⬆️ |
| databricks/koalas/missing/series.py | 100% <0%> (ø) ⬆️ |
| databricks/koalas/series.py | 96.34% <0%> (+0.03%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6de340d...a1f6e8e.

@softagram-bot

Softagram Impact Report for pull/1000 (head commit: a1f6e8e)


@ueshin
Collaborator

ueshin commented Nov 5, 2019

Hmm, pandas 0.25.2 doesn't support this yet?

>>> import pandas as pd
>>> pd.__version__
'0.25.2'
>>> pdf = pd.DataFrame({"group": ['a', 'a', 'b', 'b'], "A": [0, 1, 2, 3], "B": [5, 6, 7, 8]})
>>> pdf.columns = pd.MultiIndex.from_tuples([('x', 'group'), ('y', 'A'), ('y', 'B')])
>>> pdf
      x  y
  group  A  B
0     a  0  5
1     a  1  6
2     b  2  7
3     b  3  8
>>> pdf.groupby(('x', 'group')).agg(a_max=(('y', 'A'), "max")).sort_index()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6_pd0.25/lib/python3.6/site-packages/pandas/core/groupby/generic.py", line 1455, in aggregate
    return super().aggregate(arg, *args, **kwargs)
  File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6_pd0.25/lib/python3.6/site-packages/pandas/core/groupby/generic.py", line 264, in aggregate
    result = result[order]
  File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6_pd0.25/lib/python3.6/site-packages/pandas/core/frame.py", line 3001, in __getitem__
    indexer = self.loc._convert_to_indexer(key, axis=1, raise_missing=True)
  File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6_pd0.25/lib/python3.6/site-packages/pandas/core/indexing.py", line 1285, in _convert_to_indexer
    return self._get_listlike_indexer(obj, axis, **kwargs)[1]
  File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6_pd0.25/lib/python3.6/site-packages/pandas/core/indexing.py", line 1086, in _get_listlike_indexer
    indexer = ax.get_indexer_for(key)
  File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6_pd0.25/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 4817, in get_indexer_for
    return self.get_indexer(target, **kwargs)
  File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6_pd0.25/lib/python3.6/site-packages/pandas/core/indexes/multi.py", line 2449, in get_indexer
    indexer = self._engine.get_indexer(target)
  File "pandas/_libs/index.pyx", line 648, in pandas._libs.index.BaseMultiIndexCodesEngine.get_indexer
  File "pandas/_libs/index.pyx", line 644, in pandas._libs.index.BaseMultiIndexCodesEngine._extract_level_codes
  File "/Users/ueshin/workspace/databricks-koalas/miniconda/envs/databricks-koalas_3.6_pd0.25/lib/python3.6/site-packages/pandas/core/indexes/multi.py", line 81, in _codes_to_ints
    codes <<= self.offsets
ValueError: operands could not be broadcast together with shapes (1,2) (3,) (1,2)

@charlesdong1991
Contributor Author

charlesdong1991 commented Nov 5, 2019

@ueshin Hmm, sorry, I was wrong: I thought support for this started in 0.25.2. I tested on the current pandas master, and it is supported there. Maybe we can leave this PR open for now.

BTW, 0.25.3 was released

@ueshin
Collaborator

ueshin commented Nov 5, 2019

@charlesdong1991 I see, let's come back to this when a pandas version that supports it is released.

@HyukjinKwon
Member

@charlesdong1991 oh, so 0.25.3 supports this?

@charlesdong1991
Contributor Author

@HyukjinKwon Not yet, sorry. I can confirm it works on the pandas master branch, but it was not released with 0.25.x; it will probably land in version 1.0.

I will come back to this PR once it is available in a released version.

@HyukjinKwon
Member

Ah, sure, thanks for the confirmation!

@Hoeze

Hoeze commented Mar 29, 2020

Pandas 1.0 was released some time ago :)

@itholic
Contributor

itholic commented Apr 16, 2020

This PR is now valid for the latest version of pandas (1.0.3).

>>> import pandas as pd
>>> pd.__version__
'1.0.3'
>>> pdf = pd.DataFrame({"group": ['a', 'a', 'b', 'b'], "A": [0, 1, 2, 3], "B": [5, 6, 7, 8]})
>>> pdf.columns = pd.MultiIndex.from_tuples([('x', 'group'), ('y', 'A'), ('y', 'B')])
>>> pdf
      x  y
  group  A  B
0     a  0  5
1     a  1  6
2     b  2  7
3     b  3  8
>>> pdf.groupby(('x', 'group')).agg(a_max=(('y', 'A'), "max")).sort_index()
            a_max
(x, group)
a               1
b               3

@charlesdong1991
I think we can merge this if the conflicts with master branch are resolved.

cc. @ueshin @HyukjinKwon WDYT?
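
The sessions in this thread show the call failing on pandas 0.25.2 and succeeding on 1.0.3, so code or tests that rely on it may want to check the installed pandas version first. Below is a minimal sketch of such a check, assuming the feature first shipped in pandas 1.0 (the thread only says "probably"). The names `MIN_SUPPORTED`, `parse_major_minor`, and `supports_multiindex_named_agg` are hypothetical and not part of Koalas or pandas.

```python
import re
from typing import Tuple

# Assumption: 1.0 is the first release with MultiIndex named aggregation,
# as guessed earlier in this thread; adjust if that turns out to be wrong.
MIN_SUPPORTED: Tuple[int, int] = (1, 0)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '0.25.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_multiindex_named_agg(version: str) -> bool:
    """Return True if the given pandas version is at least MIN_SUPPORTED."""
    return parse_major_minor(version) >= MIN_SUPPORTED


print(supports_multiindex_named_agg("0.25.2"))  # False
print(supports_multiindex_named_agg("1.0.3"))   # True
```

In practice the argument would be `pd.__version__`, and a test could be skipped when the function returns `False`.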

@HyukjinKwon
Member

@itholic, you can pick up these commits and create a new PR. I can credit you and @charlesdong1991 as co-authors.

@itholic
Contributor

itholic commented Apr 20, 2020

@HyukjinKwon Okay, I will

HyukjinKwon pushed a commit that referenced this pull request May 4, 2020
This PR takes over #1000.

Closes #1000 

Co-authored-by: Kaiqi <[email protected]>