.join works incorrectly with MultiIndex (data is not actually aligned) #351

Closed
rsamson opened this issue Nov 8, 2011 · 2 comments

rsamson commented Nov 8, 2011

  • When you use .join with a MultiIndex, it combines the DataFrames improperly.
  • When you use .join(how='outer'), the index is destroyed (turned into integers).

Example below.
Thanks,
Ryan

import numpy
from pandas import DataFrame, MultiIndex

index1=MultiIndex.from_arrays([['a','a','a','b','b','b'], [1,2,3,1,2,3]])

index2=MultiIndex.from_arrays([['b','b','b','c','c','c'], [1,2,3,1,2,3]])

index1

MultiIndex([('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2), ('b', 3)], dtype=object)

index2

MultiIndex([('b', 1), ('b', 2), ('b', 3), ('c', 1), ('c', 2), ('c', 3)], dtype=object)

df1 = DataFrame(data=numpy.random.randn(6), index=index1, columns=['var X'])

df2 = DataFrame(data=numpy.random.randn(6), index=index2, columns=['var Y'])

df1=df1.sortlevel(0)

df2=df2.sortlevel(0)

df1

      var X
a 1  0.8658
  2 -0.9211
  3 -0.6389
b 1 -0.6716
  2 -0.6799
  3  1.98

df2

      var Y
b 1 -0.6978
  2 -0.6186
  3  0.117
c 1 -0.4577
  2  0.7812
  3 -1.691

df1.join(df2) # a1, a2, a3 should be NaN for 'var Y'; the values shown under a1, a2, a3 should appear under b1, b2, b3

      var X   var Y
a 1  0.8658 -0.6978
  2 -0.9211 -0.6186
  3 -0.6389  0.117
b 1 -0.6716 -0.4577
  2 -0.6799  0.7812
  3  1.98   -1.691

df1.join(df2, how='outer') # The index is destroyed (replaced by integers) instead of being expanded, and the data is incorrectly aligned

     var X   var Y
0   0.8658 -0.6978
1  -0.9211 -0.6186
2  -0.6389  0.117
3  -0.6716 -0.4577
4  -0.6799  0.7812
5   1.98   -1.691
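
For reference, a minimal sketch of the alignment the report expects, written against a recent pandas release (an assumption; the thread itself targets 0.5.x, where `sortlevel` was still the sorting API). Fixed values replace the random data so the result is easy to check, and the names `left` and `outer` are illustrative only.

```python
import numpy as np
import pandas as pd

# Same two-level indexes as in the report: only the 'b' group overlaps.
index1 = pd.MultiIndex.from_arrays([list("aaabbb"), [1, 2, 3, 1, 2, 3]])
index2 = pd.MultiIndex.from_arrays([list("bbbccc"), [1, 2, 3, 1, 2, 3]])

# Deterministic data instead of numpy.random.randn, so rows can be traced.
df1 = pd.DataFrame({"var X": np.arange(6.0)}, index=index1)
df2 = pd.DataFrame({"var Y": np.arange(10.0, 16.0)}, index=index2)

# Default (left) join: keeps df1's index and matches rows by label.
left = df1.join(df2)

# Outer join: the result index is the union of both MultiIndexes.
outer = df1.join(df2, how="outer")

print(left)
print(outer)
```

With label-based alignment, the `a` rows have no match in `df2` and get NaN for `var Y`, the `b` rows pick up `df2`'s `b` values, and the outer join keeps a nine-row MultiIndex covering `a`, `b`, and `c`.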


wesm commented Nov 10, 2011

Thanks a lot for catching and reporting this. Fixed in the above commit; it will be part of the 0.5.1 release.

wesm closed this as completed Nov 10, 2011

rsamson commented Nov 10, 2011

Great - thank you as always for all the great development work.
