Weird assignment behaviour with MultiIndex #12343

Closed
mvds314 opened this issue Feb 16, 2016 · 3 comments
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex

Comments

@mvds314

mvds314 commented Feb 16, 2016

I came across some weird behavior when assigning to a DataFrame with a MultiIndex in pandas 0.17.1:

import pandas as pd

lvl0 = range(2)
lvl1 = range(3)
index = pd.MultiIndex.from_product([lvl0, lvl1])
columns = ['A', 'B']
df1 = pd.DataFrame(data=1, columns=columns, index=index)
df2 = pd.DataFrame(data=2, index=lvl1, columns=['A'])

# this returns a DataFrame without a MultiIndex
print(df1.loc[0, df2.columns])

# but when you try an assignment, it doesn't work
df1.loc[0, df2.columns] = df2
print(df1)

I debugged the pandas package a little and I think I know what is going on. Calling df1.loc[0,df2.columns] goes through a getitem method, whereas df1.loc[0,df2.columns]=df2 goes through a setitem method. If you set a breakpoint in indexing.py in _NDFrameIndexer._get_setitem_indexer, you can see that the key being looked up in the index of the DataFrame is (0,['A']). Obviously, this key cannot be found, since A is a column. The result is that nothing is assigned.

I think this is a bug because setitem and getitem behave inconsistently: when df1.loc[0,df2.columns] returns a DataFrame without a MultiIndex (which isn't a copy), assignment should be possible as well.
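
The read side of this report can be checked in isolation. The sketch below rebuilds the frames from the example and inspects only the selection. The names `selected`, `has_multiindex` and `same_labels` are introduced here for illustration and are not from the original report, and the sketch assumes a pandas version in which selecting a scalar on the outer level drops that level, as described above.

```python
import pandas as pd

index = pd.MultiIndex.from_product([range(2), range(3)])
df1 = pd.DataFrame(data=1, columns=["A", "B"], index=index)
df2 = pd.DataFrame(data=2, index=range(3), columns=["A"])

# Selecting the outer label 0 drops that level from the result.
selected = df1.loc[0, df2.columns]
has_multiindex = isinstance(selected.index, pd.MultiIndex)

# The remaining row labels are the ones df2 uses, which is why the
# assignment in the report looks like it should line up.
same_labels = list(selected.index) == list(df2.index)
print(has_multiindex, same_labels)
```

If both checks come out as the report describes, the selection and df2 share row labels and columns, so the only mismatch is in how the setitem path interprets the key.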

@jreback
Contributor

jreback commented Feb 16, 2016

xref #6699

Multiindex assignment is a bit buggy.

You can do this

In [47]: df1 = pd.DataFrame(data=1, columns=columns, index=index)

In [48]: df1
Out[48]: 
     A  B
0 0  1  1
  1  1  1
  2  1  1
1 0  1  1
  1  1  1
  2  1  1

In [49]: df1.loc[0,'A'] = df2.values

In [50]: df1
Out[50]: 
     A  B
0 0  2  1
  1  2  1
  2  2  1
1 0  1  1
  1  1  1
  2  1  1

I would expect this to work (it doesn't):

In [51]: df1 = pd.DataFrame(data=1, columns=columns, index=index)
In [53]: df2['A']                   
Out[53]: 
0    2
1    2
2    2
Name: A, dtype: int64

In [54]: df1.loc[0,'A']
Out[54]: 
0    1
1    1
2    1
Name: A, dtype: int64

In [55]: df1.loc[0,'A'] = df2['A']  

In [56]: df1
Out[56]: 
       A  B
0 0  NaN  1
  1  NaN  1
  2  NaN  1
1 0  1.0  1
  1  1.0  1
  2  1.0  1

However, I'm not sure that assigning an object of a different shape should actually work, e.g. when df2 is a DataFrame. I would expect that to raise.

pull-requests are welcome to fix.
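
As a stopgap, the workaround shown above can be written so that no label alignment is involved: pass a plain array instead of a Series or DataFrame. This is a sketch, not a fix for the underlying issue. It assumes a pandas version that provides `Series.to_numpy()` (on older releases such as 0.17.1, `.values` plays the same role), and the number of values must match the number of selected rows.

```python
import pandas as pd

index = pd.MultiIndex.from_product([range(2), range(3)])
df1 = pd.DataFrame(data=1, columns=["A", "B"], index=index)
df2 = pd.DataFrame(data=2, index=range(3), columns=["A"])

# A bare NumPy array carries no index, so the values are written by
# position into the three rows under outer label 0.
df1.loc[0, "A"] = df2["A"].to_numpy()

print(df1)
```

The trade-off is that positional assignment silently ignores labels, so it is only safe when the row order of the source is known to match the target.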

@jreback jreback added Bug Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex Difficulty Intermediate labels Feb 16, 2016
@jreback jreback added this to the Next Major Release milestone Feb 16, 2016
@jreback
Contributor

jreback commented Feb 16, 2016

Furthermore, please look through the other MultiIndex assignment issues to see if you can find any related ones.

@mroeschke
Member

Looks like this works now, and I believe we have a test for it, so closing.
