groupby breaks when using duplicated level names #21075

Closed
guenteru opened this issue May 16, 2018 · 4 comments
Labels: Groupby, MultiIndex, Regression (functionality that used to work in a prior pandas version)

Comments

@guenteru

Hi,
I updated to version 0.23.0 and all of a sudden the following code breaks:

import pandas as pd
df = pd.DataFrame(data={'date': list(pd.date_range('5.1.2018', '5.10.2018')),
                        'vals': list(range(10))})
df.groupby([df.date.dt.month, df.date.dt.day])['vals'].sum()

ValueError: Duplicated level name: "date", assigned to level 1, is already used for level 0.

Expected output:

Using version 0.22.0 the same code yields the following:

date  date
5     1       0
      2       1
      3       2
      4       3
      5       4
      6       5
      7       6
      8       7
      9       8
      10      9
Name: vals, dtype: int64

It obviously contains duplicated level names. I get why this might be a problem, but as of version 0.23.0 it's not possible to specify the resulting level names.

@TomAugspurger
Contributor

Thanks.

cc @WillAyd @toobaz if you have ideas.

FYI @guenteru we have release candidates, if you want to try things out and report problems beforehand. They're announced on the low-traffic pandas-dev mailing list: https://mail.python.org/mailman/listinfo/pandas-dev

TomAugspurger added the Groupby, Regression, and MultiIndex labels May 16, 2018
TomAugspurger added this to the 0.23.1 milestone May 16, 2018
@jorisvandenbossche
Member

I suppose this is due to the change that MultiIndex level names now need to be unique: #18872 and #18882
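To see where the duplicate comes from, it may help to inspect the names of the grouping keys. The sketch below rebuilds the reporter's frame (with ISO-formatted dates, which is an assumption about the intended May 1 to May 10 range) and shows that both keys inherit the name of the column they were derived from:

```python
import pandas as pd

# Rebuild the frame from the report; ISO date strings are used here
# on the assumption that May 1-10, 2018 was intended.
df = pd.DataFrame({
    "date": pd.date_range("2018-05-01", "2018-05-10"),
    "vals": range(10),
})

# Both keys come from the same column, so each Series keeps its name.
keys = [df.date.dt.month, df.date.dt.day]
key_names = [key.name for key in keys]
print(key_names)  # ['date', 'date']
```

groupby uses these names for the levels of the resulting index, which is how two levels both called "date" end up in the MultiIndex.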

This is a rather big break. A temporary workaround is to rename each series that is passed to groupby:

df.groupby([df.date.dt.month.rename('month'), df.date.dt.day.rename('day')])['vals'].sum()
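A self-contained version of this workaround might look like the following. It is a sketch that reuses the frame from the original report (with ISO date strings as an assumption); the variable names `month`, `day`, and `result` are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2018-05-01", "2018-05-10"),
    "vals": range(10),
})

# Give each grouping key its own name so the resulting index levels
# no longer collide.
month = df["date"].dt.month.rename("month")
day = df["date"].dt.day.rename("day")

result = df.groupby([month, day])["vals"].sum()
print(list(result.index.names))  # ['month', 'day']
```

The sums match the 0.22.0 output quoted above; only the level names differ.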

@toobaz
Member

toobaz commented May 16, 2018

Duplicate of #19029 I think?

@TomAugspurger
Contributor

Closing in favor of #19029

guenteru added a commit to guenteru/pandas that referenced this issue May 21, 2018
As of version 0.23.0 MultiIndex throws an exception in case it contains
duplicated level names. This can happen as a result of various groupby
operations (pandas-dev#21075). This commit changes the behavior of groupby slightly: In
case there are duplicated names contained in the index these names get suffixed by their
corresponding position (i.e. [name,name] => [name0,name1])
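
The renaming rule described in this commit message can be illustrated in plain Python. This is not the code from the commit: the function name `suffix_duplicate_names` is hypothetical, and leaving unique names and unnamed (`None`) levels untouched is an assumption, since the message only gives the `[name,name] => [name0,name1]` case.

```python
from collections import Counter


def suffix_duplicate_names(names):
    """Append each duplicated name's position to make the names unique.

    Hypothetical helper. Unique names and None (unnamed levels) are
    returned unchanged, which is an assumption beyond the commit message.
    """
    counts = Counter(names)
    return [
        f"{name}{pos}" if name is not None and counts[name] > 1 else name
        for pos, name in enumerate(names)
    ]


print(suffix_duplicate_names(["date", "date"]))  # ['date0', 'date1']
```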

4 participants