BUG: fix sort_index AssertionError with RangeIndex and level parameter #64386

Open
repeating wants to merge 2 commits into pandas-dev:main from repeating:fix-sort-index-rangeindex-level

Conversation

@repeating

closes #64383

RangeIndex.sort_values(return_indexer=True) returned a RangeIndex as the indexer, but the block manager (managers.py) asserts isinstance(indexer, np.ndarray) before calling .take(). As a result, sort_index(level=...) crashed with an AssertionError whenever the index stayed a RangeIndex (e.g. an already-sorted range index, or a default index with .names set).

The base class Index.sort_values always returns an np.ndarray as the indexer (for example np.arange(..., dtype=np.intp)). RangeIndex.sort_values overrides it for performance but broke that contract. The fix is to return np.arange(len(self), dtype=np.intp) (or the reversed equivalent) instead of RangeIndex(rng).
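To make the contract concrete, here is a minimal sketch of what "the indexer is always a positional np.intp array" means from the caller's side. The helper name sort_values_with_ndarray_indexer is hypothetical and is not part of pandas or of this PR. It only uses the public Index.sort_values and np.asarray, and it coerces the indexer so the result has the same shape regardless of the installed pandas version.

```python
import numpy as np
import pandas as pd


def sort_values_with_ndarray_indexer(index, ascending=True):
    """Hypothetical helper (not a pandas API): sort an Index and return
    the indexer as a plain np.intp array."""
    sorted_index, indexer = index.sort_values(
        return_indexer=True, ascending=ascending
    )
    # np.asarray leaves an ndarray essentially unchanged and converts an
    # Index (including a RangeIndex) into a plain NumPy array.
    return sorted_index, np.asarray(indexer, dtype=np.intp)


checks = []
for idx in (pd.RangeIndex(5), pd.Index([3, 1, 2])):
    for ascending in (True, False):
        sorted_index, indexer = sort_values_with_ndarray_indexer(idx, ascending)
        # Taking the original index at the indexer positions should
        # reproduce the sorted index.
        roundtrip_ok = np.array_equal(
            idx.take(indexer).to_numpy(), sorted_index.to_numpy()
        )
        checks.append((type(indexer).__name__, str(indexer.dtype), roundtrip_ok))

for row in checks:
    print(row)
```

The coercion is only there to illustrate the expected return type. The PR itself changes what RangeIndex.sort_values returns rather than wrapping it.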

Added a regression test covering both reported cases, and a whatsnew entry under v3.0.2.

- rng = range(len(self))
- return sorted_index, RangeIndex(rng)
+ indexer = np.arange(len(self), dtype=np.intp)
+ return sorted_index, indexer
Member

what are the consequences of this? knowing you have a RangeIndex can be good for performance downstream

Author

The indexer here is only consumed by self._mgr.take(indexer, ...), which asserts isinstance(indexer, np.ndarray). Nothing downstream of sort_values(return_indexer=True) uses the indexer as a RangeIndex; it is purely a positional array for the block manager's take path. The base class Index.sort_values already returns an np.ndarray for this, so the change just brings the RangeIndex override in line with that contract.
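As a rough illustration of the "positional array for take" point, the sketch below uses the public DataFrame.take instead of the internal self._mgr.take, which is an assumption made for readability. The frame and column name are invented for the example. It shows that an np.arange indexer, or its reverse, is all a take-style call needs to produce the ascending or descending order of a range-indexed frame.

```python
import numpy as np
import pandas as pd

# Illustrative frame with the default RangeIndex(3); the data is made up.
df = pd.DataFrame({"value": [10, 20, 30]})

# Positional indexers of dtype np.intp, as described in the comment above.
ascending_indexer = np.arange(len(df), dtype=np.intp)
descending_indexer = ascending_indexer[::-1]

# DataFrame.take selects rows by position, so these reproduce the
# ascending and descending orderings of an already-sorted range index.
same_order = df.take(ascending_indexer)
reversed_order = df.take(descending_indexer)

print(same_order.index.tolist())
print(reversed_order.index.tolist())
```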

Development

Successfully merging this pull request may close these issues.

BUG: DataFrame.sort_index does fails with AssertionError
