Undesired auto sort when adding two multiindexes #8864

Closed
Aurthes opened this issue Nov 20, 2014 · 6 comments

@Aurthes

Aurthes commented Nov 20, 2014

idx0 =

timestamp                         sequence  attribute
2014-11-12 17:02:23.635000-06:00  0         price
                                            count
                                            quantity

idx1 =

timestamp                         sequence  attribute
2014-11-12 17:02:24.060000-06:00  1         price
                                            count
                                            quantity

idx0 + idx1 =

timestamp                   sequence  attribute
2014-11-12 23:02:23.635000  0         count
                                      price
                                      quantity
2014-11-12 23:02:24.060000  1         count
                                      price
                                      quantity

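The attribute level gets sorted automatically (price, count, quantity becomes count, price, quantity). Is there any way to prevent this?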

@TomAugspurger
Contributor

Can you clean up your example to make it copy-pasteable into a REPL so that it's clearer what the problem is?

@Aurthes
Author

Aurthes commented Nov 20, 2014

Hi,

Here's some example code:

import pandas as pd

idx1 = pd.MultiIndex.from_product([[1], [3], ['c', 'b', 'a']])
idx2 = pd.MultiIndex.from_product([[2], [4], ['f', 'e', 'd']])
print(idx1)
print(idx2)
idx = idx1 + idx2  # in pandas at the time, + on two indexes meant set union
print(idx)

You'll see how the 3rd level got sorted automatically. Is there a way I can prevent this?

@jreback
Contributor

jreback commented Nov 20, 2014

@Aurthes you almost ALWAYS want the multi-index to be sorted in order to do any type of indexing. What is your use case? (You can always use .sortlevel() to sort specifically on a certain level or levels.)
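
For illustration, a minimal sketch of sorting explicitly on one level, using the same idx1 as in the example code above. It assumes a pandas version in which MultiIndex.sortlevel returns a pair (sorted index, indexer) and in which is_monotonic_increasing is available on a MultiIndex; the variable names sorted_idx and indexer are made up for this sketch.

```python
import pandas as pd

idx1 = pd.MultiIndex.from_product([[1], [3], ['c', 'b', 'a']])

# Sort on the third level only (level=2). The call returns the sorted index
# together with the integer positions that produce it from the original.
sorted_idx, indexer = idx1.sortlevel(level=2)

print(idx1.is_monotonic_increasing)        # original order of level 2: c, b, a
print(sorted_idx.is_monotonic_increasing)  # order after sorting: a, b, c
print(list(indexer))
```

The indexer can be used to reorder any data aligned with the original index, so sorting stays an explicit, opt-in step.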

@shoyer
Member

shoyer commented Nov 21, 2014

@Aurthes In the current version of pandas, the + operator calculates the set union of two indexes (note that the alias + for union is currently deprecated).

If you simply want to concatenate two multi-indexes, you can do this with numpy:

idx3 = pd.MultiIndex.from_tuples(np.concatenate([idx1.values, idx2.values]))

To get the unique (non-sorted) elements (which I think is what you want here), you can then use .unique(), e.g.,

idx4 = pd.MultiIndex.from_tuples(idx3.unique())

This is slightly awkward, but pandas provides all the tools to do it efficiently, and I do believe this sort of use is pretty niche -- we have lots of shortcuts that make this easier when working with dataframes or series.
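
Put together as a self-contained sketch, the two steps above might look like the following. The last two lines are assumptions that do not come from this thread: MultiIndex.append followed by unique(), and the sort=False option of union, which exists only in later pandas releases. The names appended and unioned are hypothetical.

```python
import numpy as np
import pandas as pd

idx1 = pd.MultiIndex.from_product([[1], [3], ['c', 'b', 'a']])
idx2 = pd.MultiIndex.from_product([[2], [4], ['f', 'e', 'd']])

# Concatenate the raw tuples of both indexes; nothing is sorted here.
idx3 = pd.MultiIndex.from_tuples(np.concatenate([idx1.values, idx2.values]))

# Drop repeated tuples, keeping the first occurrence of each.
idx4 = pd.MultiIndex.from_tuples(idx3.unique())

# Assumed alternatives (not mentioned in the thread):
appended = idx1.append(idx2).unique()   # pandas-only concatenation
unioned = idx1.union(idx2, sort=False)  # set union without sorting

print(list(idx4))
```

With these inputs all three results should keep c, b, a and f, e, d in their original order rather than sorting the third level.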

@Aurthes
Author

Aurthes commented Nov 21, 2014

Problem solved, very efficient. Thanks! Pandas is great :)

@Aurthes
Author

Aurthes commented Nov 21, 2014

Thx all !
