
Support pad/backfill/nearest reindexing even for unsorted indexes by storing a sorted index? #9510


Closed
shoyer opened this issue Feb 18, 2015 · 2 comments
Labels: Enhancement, Indexing (Related to indexing on series/frames, not to indexes themselves), Performance (Memory or execution speed performance)

Comments

@shoyer
Member

shoyer commented Feb 18, 2015

Recently, I've been working on adding a 'nearest' method to reindexing: #9258

It occurs to me that we could easily extend the reindexing/get_indexer methods to work with unsorted (non-monotonic) indexes if we were willing to sort the index when necessary. This would probably entail saving the sorted result on the parent index, similar to how get_indexer is currently supported on MultiIndex by creating a tuple index internally.
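A minimal pure-Python sketch of the idea, assuming a hypothetical `SortedLookup` helper (not a pandas API and not how pandas implements it): the argsort of the index is computed once and cached, `pad`/`backfill`/`nearest` lookups run against the sorted copy with `bisect`, and each hit is mapped back to a position in the original, unsorted index, with -1 meaning "no match" as in `get_indexer`.

```python
import bisect
from typing import List, Sequence


class SortedLookup:
    """Hypothetical helper: caches a sorted view of an unsorted index so that
    pad / backfill / nearest lookups work without requiring monotonicity."""

    def __init__(self, values: Sequence[float]):
        self.values = list(values)
        # argsort, computed once: original positions listed in sorted-value order
        self._order = sorted(range(len(self.values)), key=self.values.__getitem__)
        self._sorted = [self.values[i] for i in self._order]

    def get_indexer(self, targets: Sequence[float], method: str) -> List[int]:
        """Return positions into the ORIGINAL (unsorted) values, or -1 if no match."""
        n = len(self._sorted)
        result = []
        for t in targets:
            if method == "pad":
                # last sorted value <= t (-1 if t is below every value)
                pos = bisect.bisect_right(self._sorted, t) - 1
            elif method == "backfill":
                # first sorted value >= t (-1 if t is above every value)
                pos = bisect.bisect_left(self._sorted, t)
                if pos == n:
                    pos = -1
            elif method == "nearest":
                right = bisect.bisect_left(self._sorted, t)
                left = right - 1
                if right == n:
                    pos = left  # -1 when the index is empty
                elif left < 0:
                    pos = right
                else:
                    # ties go to the larger value (an arbitrary choice in this sketch)
                    closer_left = t - self._sorted[left] < self._sorted[right] - t
                    pos = left if closer_left else right
            else:
                raise ValueError(f"unknown method: {method!r}")
            # translate the position in the sorted copy back to the original index
            result.append(self._order[pos] if pos >= 0 else -1)
        return result
```

For example, with `SortedLookup([30, 10, 20])`, `get_indexer([15, 25, 5, 35], "pad")` returns `[1, 2, -1, 0]`: the target 15 pads to the value 10, which sits at position 1 of the unsorted index. The sort cost is paid once in the constructor; every later lookup is a binary search.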

I think this would be a nice usability gain over the current implementation, and would not be too surprising. Sorting an index (once) is pretty fast, even for millions of rows.

Thoughts?

@shoyer shoyer added the Indexing (Related to indexing on series/frames, not to indexes themselves) and API Design labels Feb 18, 2015
@jreback
Contributor

jreback commented Feb 18, 2015

xref #3539

You could do this on an index, since we can easily store metadata there, but not currently for a column (that would require a metadata dict in the BlockManager).

Note that we would have to be sure to invalidate these caches on any setting operation.

Note that, IIRC, the Cython index object already caches some of these things, e.g. is_monotonic, so I'm not sure how much this would actually change. (I mean that if the index IS sorted, you don't have to recompute that again.) This issue is different, though.
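To illustrate the caching and invalidation concern, here is a sketch assuming a hypothetical mutable `CachedColumn` class (pandas `Index` objects are immutable, so invalidation matters mainly for column-level metadata). It lazily caches monotonicity and the sort permutation, skips the sort entirely when the data is already monotonic, and clears both caches on every write.

```python
from typing import List, Optional, Sequence


class CachedColumn:
    """Hypothetical mutable column that lazily caches its sort permutation and
    monotonicity, and drops both caches on any setting operation."""

    def __init__(self, values: Sequence[float]):
        self._values = list(values)
        self._order: Optional[List[int]] = None
        self._is_monotonic: Optional[bool] = None
        self.sort_count = 0  # how many times a real sort was performed

    @property
    def is_monotonic_increasing(self) -> bool:
        if self._is_monotonic is None:
            v = self._values
            self._is_monotonic = all(a <= b for a, b in zip(v, v[1:]))
        return self._is_monotonic

    @property
    def sort_order(self) -> List[int]:
        if self._order is None:
            if self.is_monotonic_increasing:
                # already sorted: the identity permutation, no sort needed
                self._order = list(range(len(self._values)))
            else:
                self._order = sorted(range(len(self._values)), key=self._values.__getitem__)
                self.sort_count += 1
        return self._order

    def __setitem__(self, i: int, value: float) -> None:
        self._values[i] = value
        # any write can change the ordering, so invalidate derived metadata
        self._order = None
        self._is_monotonic = None
```

Repeated lookups reuse the cached permutation; after `col[0] = ...` the next access recomputes monotonicity and, only if the data is no longer sorted, sorts again.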

@jreback jreback added the Performance (Memory or execution speed performance) label Feb 18, 2015
@TomAugspurger TomAugspurger added this to the Next Major Release milestone Jul 8, 2017
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@mroeschke
Member

Looks like this never took off, so closing. Can reopen if there's interest.


5 participants