
groupby on index and columns mixed: strange behaviour #17681

Closed
r7ar7a opened this issue Sep 26, 2017 · 4 comments
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions

Comments

@r7ar7a

r7ar7a commented Sep 26, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd
df1 = pd.DataFrame([[1, 2, 3]], columns=['a', 'b', 'c']).set_index('a')
df2 = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c']).set_index('a')
df3 = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['a', 'b', 'c']).set_index('a')

print(df1.groupby(['a', 'c']).sum())
print('==============')
print(df2.groupby(['a', 'c']).sum())
print('==============')
print(df3.groupby(['a', 'c']).sum())
# output:
#    b
# a c   
# 1 3  2
# ==============
#    b  c
# a  2  3
# c  5  6
# ==============
#      b
# a c   
# 1 3  2
# 4 6  5
# 7 9  8

Problem description

This is a problem because the structure of the output depends on the number of rows in the DataFrame. The only exception is the 2-row case, where the result comes back with a different index and an extra column. Calling reset_index() on the DataFrame before the groupby avoids the issue, but this behaviour has already caused unexpected trouble in our production framework.
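A minimal sketch of the reset_index() workaround from the report, applied to the 2-row frame. The second variant, passing df2.index itself as an array key, is not from the thread; it assumes that an array-like key is never reinterpreted as a list of labels, so it should give the same (a, c) result.

```python
import pandas as pd

# Two-row frame: the case that produced the unexpected output in pandas 0.20.3
df2 = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c']).set_index('a')

# Workaround from the report: turn the 'a' index back into a column first,
# so both groupby keys are plain column names.
via_reset = df2.reset_index().groupby(['a', 'c']).sum()

# Alternative (an assumption, not from the thread): pass the index itself as an
# array key; its name 'a' becomes the name of the first result level.
via_index = df2.groupby([df2.index, 'c']).sum()

print(via_reset)
print(via_index)
```

Both variants keep 'a' and 'c' as grouping keys and leave only 'b' as a value column.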

Expected Output

     b
a c   
1 3  2
==============
     b
a c   
1 3  2
4 6  5
==============
     b
a c   
1 3  2
4 6  5
7 9  8

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.3
pytest: 3.1.3
pip: 9.0.1
setuptools: 33.1.1.post20170320
Cython: 0.26
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: 1.1.13
pymysql: 0.7.9.None
psycopg2: 2.7.3 (dt dec pq3 ext lo64)
jinja2: 2.9.5
s3fs: None
pandas_gbq: None
pandas_datareader: None

@sinhrks
Member

sinhrks commented Sep 27, 2017

When the length of the key list matches the length of the axis, pandas treats the list as an array of group labels. I think we already have an issue for this, but I couldn't find it.

https://github.com/pandas-dev/pandas/blob/master/pandas/core/groupby.py#L2651
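The behaviour described in the comment above can be illustrated with a simplified model of the check. The function keys_treated_as_labels is hypothetical and is not pandas' actual code. It models the rule that a list of plain keys whose length equals the number of rows, and whose entries are not all column names, gets reinterpreted as one array of per-row labels. Because 'a' is an index level name and not a column, ['a', 'c'] trips the rule only when the frame has exactly 2 rows.

```python
import pandas as pd

def keys_treated_as_labels(obj: pd.DataFrame, keys: list) -> bool:
    # Hypothetical, simplified model of the old groupby heuristic.
    # Callables or array-like keys make the intent unambiguous.
    plain = not any(
        callable(k) or isinstance(k, (list, tuple, dict, pd.Index, pd.Series))
        for k in keys
    )
    # 'a' is an index level name, not a column, so this is False for ['a', 'c'].
    all_in_columns = all(k in obj.columns for k in keys)
    # The trigger: exactly as many keys as rows.
    return plain and not all_in_columns and len(keys) == len(obj)

rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
frames = {
    n: pd.DataFrame(rows[:n], columns=['a', 'b', 'c']).set_index('a')
    for n in (1, 2, 3)
}
for n, df in frames.items():
    print(n, keys_treated_as_labels(df, ['a', 'c']))
```

Under this model, for the 2-row frame the keys become the row labels 'a' and 'c', which matches the odd 2-row output in the report (groups named a and c, with both b and c summed).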

@mroeschke
Member

Looks to be fixed on master. Could use a test.

In [145]: import pandas as pd
     ...: df1 = pd.DataFrame([[1, 2, 3]], columns=['a', 'b', 'c']).set_index('a')
     ...: df2 = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c']).set_index('a')
     ...: df3 = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['a', 'b', 'c']).set_index('a')
     ...:
     ...: print(df1.groupby(['a', 'c']).sum())
     ...: print('==============')
     ...: print(df2.groupby(['a', 'c']).sum())
     ...: print('==============')
     ...: print(df3.groupby(['a', 'c']).sum())
     b
a c
1 3  2
==============
     b
a c
1 3  2
4 6  5
==============
     b
a c
1 3  2
4 6  5
7 9  8

In [146]: pd.__version__
Out[146]: '0.26.0.dev0+682.g08ab156eb'
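The requested regression test could look roughly like the following sketch. It is not necessarily the test that was eventually merged, and the test name is hypothetical. It groups the three frames from the report by the index level 'a' plus the column 'c' and compares each result against the expected output, using pandas.testing.assert_frame_equal.

```python
import pandas as pd
import pandas.testing as tm

def test_groupby_index_level_and_column_mixed():
    # Hypothetical regression test: grouping by an index level name plus a
    # column name must give the same structure regardless of the row count.
    rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    for n in (1, 2, 3):
        df = pd.DataFrame(rows[:n], columns=['a', 'b', 'c']).set_index('a')
        result = df.groupby(['a', 'c']).sum()
        expected = pd.DataFrame(
            {'b': [r[1] for r in rows[:n]]},
            index=pd.MultiIndex.from_tuples(
                [(r[0], r[2]) for r in rows[:n]], names=['a', 'c']
            ),
        )
        tm.assert_frame_equal(result, expected)
```

Looping over n inside one test keeps the sketch free of a pytest dependency; with pytest available, `pytest.mark.parametrize` over n would be the more idiomatic choice.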

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed Groupby labels Oct 26, 2019
@mjarosie
Contributor

mjarosie commented Nov 2, 2019

I'd like to pick this up. I'll open a PR soon :)

@WillAyd
Member

WillAyd commented Dec 17, 2019

Covered in #29124.

@WillAyd WillAyd closed this as completed Dec 17, 2019
5 participants