MultiIndex.is_monotonic_increasing #32179

xin-jin opened this issue Feb 22, 2020 · 9 comments

xin-jin commented Feb 22, 2020

This does not seem to be documented anywhere. For MultiIndex, how is .is_monotonic_increasing different from .is_lexsorted?

@charlesdong1991
Member

.is_lexsorted is used for checking sortedness of MI, while is_monotonic_increasing/decreasing for a regular Index

The DOC could be improved and better clarified

@xin-jin
Author

xin-jin commented Feb 23, 2020

> .is_lexsorted is used for checking sortedness of MI, while is_monotonic_increasing/decreasing for a regular Index
>
> The DOC could be improved and better clarified

Thanks for the reply.

MultiIndex also has the attribute .is_monotonic_increasing. Does it mean anything or just not accurate?

@MarcoGorelli
Member

MarcoGorelli commented Feb 27, 2020

> Does it mean anything or just not accurate?

Given the bug you uncovered in #32259, at the moment I'd say it's more accurate :)

@MarcoGorelli
Member

> > Does it mean anything or just not accurate?
>
> Given the bug you uncovered in #32259, at the moment I'd say it's more accurate :)

Sorry, I think I was wrong. By looking at the docs, it says

> Return True if the codes are lexicographically sorted.

Notice how it explicitly says codes rather than values.

So, I think #32259 is actually fine as it is (not a bug), because there the values are sorted. This clarifies the difference here - is_monotonic_increasing/decreasing check that the values are increasing / decreasing, while is_lexsorted just checks the codes.

> The DOC could be improved and better clarified

Yes, IMO a note to clarify this difference would be good
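
To make the codes-versus-values distinction concrete, here is a small sketch, assuming pandas is installed. The index is built by hand with a deliberately unsorted first level, so the integer codes are in increasing order while the labels they point to are not. It does not call is_lexsorted itself, since whether that method is available depends on the pandas version; the codes are inspected directly instead. The variable names are illustrative only.

```python
import pandas as pd

# Hand-built MultiIndex whose first level is deliberately unsorted ("b" before "a").
mi = pd.MultiIndex(
    levels=[["b", "a"], [1, 2]],
    codes=[[0, 0, 1, 1], [0, 1, 0, 1]],
)

# One tuple per row, first as integer codes, then as the actual values.
code_rows = list(zip(*(c.tolist() for c in mi.codes)))
value_rows = list(mi)

# Lexicographic sortedness of the codes vs. of the values.
codes_sorted = code_rows == sorted(code_rows)
values_sorted = value_rows == sorted(value_rows)

print("codes: ", code_rows, "sorted:", codes_sorted)
print("values:", value_rows, "sorted:", values_sorted)
print("is_monotonic_increasing:", mi.is_monotonic_increasing)
```

Here the codes are sorted but the values are not, and is_monotonic_increasing follows the values.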

@xin-jin
Author

xin-jin commented Feb 28, 2020

> > > Does it mean anything or just not accurate?
>
> > Given the bug you uncovered in #32259, at the moment I'd say it's more accurate :)
>
> Sorry, I think I was wrong. By looking at the docs, it says
>
> > Return True if the codes are lexicographically sorted.
>
> Notice how it explicitly says codes rather than values.
>
> So, I think #32259 is actually fine as it is (not a bug), because there the values are sorted. This clarifies the difference here - is_monotonic_increasing/decreasing check that the values are increasing / decreasing, while is_lexsorted just checks the codes.
>
> > The DOC could be improved and better clarified
>
> Yes, IMO a note to clarify this difference would be good

Does that mean MultiIndex.sortlevel will also only be sorting the codes, rather than values? This would be rather confusing and will not be useful for users ...

I am wondering in what cases codes have different order from values? sortlevel is critical to a software I am writing (alternatives like MultiIndex.sort_values() are extremely slow for my cases).

@MarcoGorelli
Member

> Does that mean MultiIndex.sortlevel will also only be sorting the codes, rather than values?

By looking at the source code, that seems to be the case.

> This would be rather confusing and will not be useful for users ...

If I've understood is_lexsorted and sortlevel correctly, I agree.

Might want to keep an eye on the thread in #32312

@xin-jin
Author

xin-jin commented Feb 28, 2020

I guess MultiIndex.sortlevel could potentially check MultiIndex.levels[i].is_monotonic_increasing on each of its levels, and re-sort those MultiIndex.levels[i] that are not sorted (as well as resetting the corresponding codes).
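
The re-sort-and-remap idea can be sketched in plain Python for a single level. This is only an illustration of the permutation involved, not pandas' implementation; `resort_level` is a hypothetical helper name, and plain lists stand in for a level and its codes.

```python
def resort_level(level, codes):
    """Return (sorted_level, remapped_codes) that describe the same values."""
    # order[k] is the old position of the label that lands at new position k.
    order = sorted(range(len(level)), key=lambda i: level[i])
    sorted_level = [level[i] for i in order]
    # Invert the permutation: old position -> new position.
    new_pos = {old: new for new, old in enumerate(order)}
    # Leave -1 untouched, since pandas uses it as the missing-value code.
    remapped = [new_pos[c] if c != -1 else -1 for c in codes]
    return sorted_level, remapped


level = ["b", "a", "c"]          # unsorted level
codes = [0, 0, 1, 2, 1]          # rows: b, b, a, c, a

new_level, new_codes = resort_level(level, codes)

# The decoded values are unchanged; only the encoding differs.
old_values = [level[c] for c in codes]
new_values = [new_level[c] for c in new_codes]
print(new_level, new_codes, old_values == new_values)
```

Once every level is sorted this way, ordering by codes and ordering by values coincide.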

@MarcoGorelli
Member

MarcoGorelli commented Feb 28, 2020

> sortlevel is critical to a software I am writing (alternatives like MultiIndex.sort_values() are extremely slow for my cases).

Would resetting the index (.reset_index) and then sorting the levels with .sort_values and then setting the index back again (.set_index) work as a temporary solution until this is sorted out?
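
A minimal sketch of that round trip, assuming pandas is installed and that the index levels are named (the names "key" and "num" and the toy data are made up for illustration):

```python
import pandas as pd

# Toy frame whose MultiIndex has an unsorted first level and unsorted rows.
df = pd.DataFrame(
    {"value": [10, 20, 30, 40]},
    index=pd.MultiIndex(
        levels=[["b", "a"], [1, 2]],
        codes=[[0, 0, 1, 1], [1, 0, 1, 0]],
        names=["key", "num"],
    ),
)

names = list(df.index.names)

# Move the index into columns, sort by the actual values, then rebuild the index.
resorted = df.reset_index().sort_values(by=names).set_index(names)

print(resorted)
print(resorted.index.is_monotonic_increasing)
```

This sorts on the values themselves, at the cost of materialising every level as a column, which may matter for very large frames.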

@xin-jin
Author

xin-jin commented Feb 29, 2020

> > sortlevel is critical to a software I am writing (alternatives like MultiIndex.sort_values() are extremely slow for my cases).
>
> Would resetting the index (.reset_index) and then sorting the levels with .sort_values and then setting the index back again (.set_index) work as a temporary solution until this is sorted out?

Thanks. That would be a bit slow for our usage (we typically have 4 ~ 5 levels and 10m ~ 100m rows). What I ended up doing is to always check that every MultiIndex.levels[i].is_monotonic_increasing is True before calling sortlevel. It seems that unless the MultiIndex is intentionally built with unsorted levels (as happens with groupby with sort=False), the levels are indeed sorted in all other cases.
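
The guard described above might look like the following sketch, assuming pandas is installed. `levels_are_sorted` is a hypothetical helper name, not a pandas API, and the two example indexes are made up.

```python
import pandas as pd


def levels_are_sorted(mi: pd.MultiIndex) -> bool:
    """True if every level of the MultiIndex is itself in increasing order."""
    return all(level.is_monotonic_increasing for level in mi.levels)


# Built through a regular constructor: levels come out sorted.
regular = pd.MultiIndex.from_product([["a", "b"], [1, 2]])

# Built by hand with an unsorted first level.
hand_built = pd.MultiIndex(
    levels=[["b", "a"], [1, 2]],
    codes=[[0, 1], [0, 1]],
)

print(levels_are_sorted(regular))
print(levels_are_sorted(hand_built))
```

The check only touches the levels, which are usually far smaller than the number of rows, so it stays cheap even for large indexes.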
