DataFrame.melt fails if there are duplicate value_vars #41951

Closed
swenlaur opened this issue Jun 11, 2021 · 4 comments · Fixed by #41964
Labels
Regression Functionality that used to work in a prior pandas version Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Milestone

Comments

@swenlaur

DataFrame.columns.get_indexer fails if there are duplicate column names.

idx = frame.columns.get_indexer(id_vars + value_vars)

This is really unfortunate; melting should handle such cases. For example,

id value value
id1 2 3

should go to

id1 2
id1 3
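
A minimal sketch of the failure described above, assuming a pandas version in which `Index.get_indexer` rejects a non-unique index. The three-label index mirrors the example frame. `Index.get_indexer_for` is shown only as one existing call that tolerates duplicate labels; this is not a claim about how the fix was implemented.

```python
import pandas as pd

# Column labels from the example: "value" appears twice.
columns = pd.Index(["id", "value", "value"])

# get_indexer requires unique labels, so this lookup raises
# (InvalidIndexError in recent versions; a broad except keeps the
# sketch independent of where that class lives).
try:
    columns.get_indexer(["id", "value"])
    raised = False
except Exception:
    raised = True

# get_indexer_for returns every matching position, duplicates included.
positions = columns.get_indexer_for(["id", "value"])

print(raised)
print(list(positions))
```

The positional result covers both `value` columns, which is what a melt over duplicate names would need in order to emit one output row per column.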

@simonjayhawkins
Member

Thanks @swenlaur for the report.

Please read this guide detailing how to provide the necessary information for us to reproduce your bug, and please use the bug template that is available when you open an issue from the GitHub web interface.

@simonjayhawkins simonjayhawkins added Needs Info Clarification about behavior needed to assess issue Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jun 11, 2021
@swenlaur
Author

swenlaur commented Jun 11, 2021

Minimal code to trigger the bug (a regression from earlier versions):

from pandas import DataFrame
DataFrame([['id', 2, 3]]).set_axis(['id_var', 'value_var', 'value_var'], axis=1).melt(id_vars=['id_var'], value_vars=['value_var'])

The resulting data frame contains columns with the same name. This is not illegal: uncleaned data sometimes arrives in this form, and the sanest way to clean it is with melt.
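
Until melt accepts duplicate names directly, one possible workaround is to make the names unique first. The sketch below is an assumption, not something proposed in this thread: the `__<n>` suffix scheme and the variable names `tagged`, `unique_df` and `long_df` are hypothetical, and it assumes no real column name already ends in such a suffix.

```python
import pandas as pd

# Frame from the report: two columns share the name "value_var".
df = pd.DataFrame([["id", 2, 3]]).set_axis(
    ["id_var", "value_var", "value_var"], axis=1
)

# Tag each column with its occurrence number so the names become unique.
names = pd.Series(df.columns)
occurrence = names.groupby(names).cumcount()
tagged = [f"{name}__{n}" for name, n in zip(names, occurrence)]
unique_df = df.set_axis(tagged, axis=1)

# Melt on the unique names, then strip the hypothetical suffix again.
long_df = unique_df.melt(
    id_vars=["id_var__0"],
    value_vars=[c for c in tagged if c.startswith("value_var__")],
)
long_df["variable"] = long_df["variable"].str.rsplit("__", n=1).str[0]
long_df = long_df.rename(columns={"id_var__0": "id_var"})

print(long_df)
```

This yields one row per original `value_var` column, which matches the output shape requested in the first comment.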

@simonjayhawkins
Member

regression from earlier versions

Will add the regression tag for now, pending investigation. Posting more info on versions could help.

@simonjayhawkins simonjayhawkins added the Regression Functionality that used to work in a prior pandas version label Jun 11, 2021
@swenlaur
Author

I do not know the highest version in which this still works, but the repro works with pandas=1.0.0. I guess it stopped working after the multi-level columns patch was added to the code.

@lithomas1 lithomas1 removed the Needs Info Clarification about behavior needed to assess issue label Jun 12, 2021
@jreback jreback added this to the 1.3 milestone Jun 15, 2021
simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Jun 17, 2021