Introduction of RangeIndex #9961

ARF1 · 2015-04-21T14:05:03Z

RangeIndex(1, 10, 2) is a memory-saving alternative to Index(np.arange(1, 10, 2)); cf. #939.

This reimplementation is compatible with the current Index() API and is a drop-in replacement for Int64Index(). It automatically converts to Int64Index() when an operation requires it.
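To illustrate the memory argument, here is a small sketch assuming CPython and NumPy (the variable names are illustrative only). A builtin range stores just its start, stop and step, whereas np.arange allocates one int64 per element; materializing the values, as a fallback to Int64Index would have to, can be deferred until an operation actually needs them.

```python
import sys

import numpy as np

n = 10**6
r = range(0, n, 1)                        # stores only start, stop and step
arr = np.arange(0, n, 1, dtype=np.int64)  # allocates n int64 values up front

range_bytes = sys.getsizeof(r)            # a few dozen bytes, independent of n
array_bytes = arr.nbytes                  # 8 * n bytes of element data

# Deferred materialization: build the array only when an operation needs it
values = np.arange(r.start, r.stop, r.step, dtype=np.int64)

print(range_bytes, array_bytes)
```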

At present, the type is preserved only for a minimal set of operations (e.g. slicing and inner, left and right joins). Most other operations create an equivalent Int64Index (or at least an equivalent numpy array) and fall back to its implementation.

This PR also extends the Index() constructor to allow creating a RangeIndex with

Index(20)
Index(2, 20)
Index(0, 20, 2)

in analogy to

range(20)
range(2, 20)
range(0, 20, 2)

jreback · 2015-04-21T14:13:16Z

Almost all testing is now done in test_index.py. That doesn't preclude another file, but most or all of the tests should go there (in a new section), inheriting from the base tester.
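The pattern described here, shared tests in a base class that concrete index tests inherit from, can be sketched with plain unittest. The class and method names below (BaseIndexTests, create_index, TestRangeLike, TestArrayLike) are hypothetical and not the actual names used in test_index.py; range and np.arange merely stand in for RangeIndex and Int64Index. The base class is deliberately not a TestCase, so it is never run on its own.

```python
import unittest

import numpy as np


class BaseIndexTests:
    """Hypothetical shared checks; each concrete test class supplies create_index()."""

    def create_index(self):
        raise NotImplementedError

    def test_len_matches_materialized_values(self):
        idx = self.create_index()
        self.assertEqual(len(idx), len(np.asarray(idx)))

    def test_slice_length(self):
        idx = self.create_index()
        self.assertEqual(len(idx[1:3]), 2)


class TestRangeLike(BaseIndexTests, unittest.TestCase):
    def create_index(self):
        return range(0, 20, 2)        # stand-in for a RangeIndex


class TestArrayLike(BaseIndexTests, unittest.TestCase):
    def create_index(self):
        return np.arange(0, 20, 2)    # stand-in for an Int64Index
```

Each subclass inherits every test_* method, so a new index type only has to implement its own constructor hook.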

ARF1 · 2015-04-21T16:45:06Z

@jreback Thanks. I am currently rewriting the test suite of Int64Index for RangeIndex.

Clearly instantiation will be different. Which of the following cases should be possible?

1. RangeIndex(0.5, 10, 0.1)
2. RangeIndex()  --> empty, might be useful to allow cheap resizing by setting start/stop property of index
3. RangeIndex(1, 1)
4. RangeIndex(np.nan)
5. RangeIndex([1, 2, 3]) --> i.e. check whether the list is monotonically increasing and represent it compactly if possible, falling back to `Int64Index` otherwise?
6. RangeIndex(1, dtype=np.int4) --> relevant for `index.values` behaviour. Necessary? Can we restrict to int64?
7. Index(1) --> RangeIndex(1) ?
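One possible answer to question 5, sketched under the assumption that a compact representation is only attempted for 1-D integer input whose values fit in int64: check whether the values are evenly spaced and, if so, return an equivalent builtin range; otherwise return None so the caller can fall back to Int64Index. The helper name try_as_range is hypothetical, and it also accepts evenly spaced decreasing sequences, since range supports negative steps.

```python
import numpy as np


def try_as_range(values):
    """Return an equivalent range if `values` is an evenly spaced 1-D integer
    sequence, otherwise None (the caller falls back to an array-backed index)."""
    arr = np.asarray(values)
    if arr.ndim != 1:
        return None
    if arr.size == 0:
        return range(0)
    if arr.dtype.kind not in "iu":       # signed or unsigned integers only
        return None
    arr = arr.astype(np.int64)           # assumption: values fit in int64
    start = int(arr[0])
    if arr.size == 1:
        return range(start, start + 1)
    step = int(arr[1]) - start
    if step == 0 or not np.all(np.diff(arr) == step):
        return None                      # repeated or unevenly spaced values
    return range(start, int(arr[-1]) + step, step)
```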

Should RangeIndex(5).values return np.arange(5) or something memory saving?

shoyer · 2015-04-21T18:11:06Z

I would not trouble yourself with allowing for non-integer or missing start, step or stop. Also, IMO arguments to RangeIndex should be parsed exactly like the built-in range (except for the optional name keyword argument).

shoyer · 2015-04-21T18:17:29Z

pandas/core/range.py

+    #   could *replace* itself on its parent. (tradeoff between instantiation time and
+    #   ability to gc values when they aren't needed anymore)
+    # TODO: Block setting of start and end
+    def __new__(cls, left, right=None, step=1, name=None):


I would follow the signature of the builtin range here:

class range(stop, *, name=None)
class range(start, stop, step=1, *, name=None)

You'll need to parse args and kwargs manually to make this work.
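Assuming Python 3, keyword-only arguments make the suggested signature straightforward, and the builtin range can do the argument validation itself (under Python 2, which lacks keyword-only arguments, the manual args/kwargs parsing mentioned here is needed). parse_range_args is a hypothetical helper, not part of the pandas API.

```python
def parse_range_args(*args, name=None):
    """Hypothetical helper: accept exactly the positional arguments of the
    builtin range, plus a keyword-only `name`."""
    # range() raises TypeError for 0 or more than 3 arguments and for
    # non-integer values (e.g. 0.5 or nan), and ValueError for step == 0.
    r = range(*args)
    return r.start, r.stop, r.step, name
```

Delegating to range this way also rejects floats, NaN and an empty call, which covers several of the constructor cases asked about earlier in the thread.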

shoyer · 2015-04-21T18:21:35Z

For the implementation, I would strongly recommend keeping track of a builtin range object (from Python 3 or a backport) under the hood. range already supports all the bounds-checking and slicing logic.
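A minimal sketch of what delegating to a builtin range could look like, assuming Python 3 and NumPy; the class name RangeBackedIndex and its methods are hypothetical, not pandas code. Slices stay range-backed, scalar positions use range's own bounds checking, and anything else (e.g. a boolean mask) falls back to a materialized int64 array.

```python
import numpy as np


class RangeBackedIndex:
    """Hypothetical minimal index that keeps a builtin range under the hood."""

    def __init__(self, *args, name=None):
        self._range = range(*args)   # range validates the arguments
        self.name = name

    @classmethod
    def _from_range(cls, r, name=None):
        obj = cls.__new__(cls)
        obj._range = r
        obj.name = name
        return obj

    def __len__(self):
        return len(self._range)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # range slicing handles negative, omitted and reversed bounds
            return type(self)._from_range(self._range[key], self.name)
        if isinstance(key, (int, np.integer)):
            # range indexing raises IndexError when out of bounds
            return self._range[key]
        # anything else (boolean masks, integer arrays): fall back to an array
        return self.values[key]

    @property
    def values(self):
        r = self._range
        return np.arange(r.start, r.stop, r.step, dtype=np.int64)
```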

ARF1 · 2015-04-22T10:46:56Z

I find that I frequently have to check whether the supplied dtype parameter is equivalent to np.int64. Since dtype can be a string, a type object, etc., my current solution is this:

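import numpy as np

try:
    # np.dtype() normalizes strings, type objects and dtypes alike
    got_int64_dtype = np.dtype(dtype) == np.int64
except TypeError:
    # dtype could not be interpreted as a numpy dtype at all
    got_int64_dtype = False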

Is there already a converter function available for use in an idiom such as some_conv_fn(dtype) == np.int64?

@shoyer Thanks for the helpful advice and references.

jreback · 2015-04-22T11:10:19Z

this should use virtually the same __new__ as Int64Index. Use com.is_integer_dtype.
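For illustration, a rough NumPy-only stand-in for such an integer-dtype check (this is not pandas' com.is_integer_dtype, whose exact rules may differ; the helper name is hypothetical): normalize with np.dtype() and inspect the dtype's kind code, where 'i' means signed and 'u' unsigned integers.

```python
import numpy as np


def is_integer_dtype_like(dtype):
    """Hypothetical helper: True for any signed or unsigned integer dtype."""
    try:
        return np.dtype(dtype).kind in "iu"
    except TypeError:
        return False  # not interpretable as a numpy dtype
```

Checking the kind code rather than np.issubdtype(..., np.integer) keeps timedelta64 out, since NumPy's scalar hierarchy places timedelta64 under signedinteger.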

ARF1 · 2015-04-22T16:55:55Z

Ok, I am slowly making progress: I finally figured out how to do an "analytic" intersection (inner join) between two general, dissimilar RangeIndex objects of the form range(start, stop, step). Of course, this simply results in a new RangeIndex with a new set of parameters.
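One way to do such an analytic intersection, sketched here for positive steps only (an assumption; negative steps would need normalizing first) and not necessarily the approach taken in this PR: a common element x must satisfy x ≡ start1 (mod step1) and x ≡ start2 (mod step2). A solution exists exactly when gcd(step1, step2) divides start2 - start1, in which case the common elements form a progression with step lcm(step1, step2). The extended Euclidean algorithm yields one solution, which is then shifted up to the first value not below either start. The function names are hypothetical.

```python
def _extended_gcd(a, b):
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def range_intersection(r1, r2):
    """Intersect two ranges with positive steps without materializing them."""
    if r1.step <= 0 or r2.step <= 0:
        raise ValueError("this sketch only handles positive steps")
    if not r1 or not r2:
        return range(0)
    g, p, _ = _extended_gcd(r1.step, r2.step)
    diff = r2.start - r1.start
    if diff % g:
        # the two congruences have no common solution
        return range(0)
    lcm = r1.step // g * r2.step
    # one common solution (possibly far below both starts) ...
    x = r1.start + r1.step * p * (diff // g)
    # ... shifted to the smallest solution >= max(start1, start2)
    lo = max(r1.start, r2.start)
    start = lo + (x - lo) % lcm
    stop = min(r1.stop, r2.stop)
    if start >= stop:
        return range(0)
    return range(start, stop, lcm)
```

For example, range_intersection(range(1, 30, 4), range(3, 30, 6)) returns range(9, 30, 12), i.e. the common elements 9 and 21.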

Now I am a bit unsure how to proceed with the union (outer join) of two RangeIndex objects. In general this cannot be represented as a single range(start, stop, step); in the worst case it requires two ranges with different parameters.

One could imagine a CompoundRangeIndex that keeps a list of RangeIndex objects.

I was planning to avoid converting RangeIndex to Int64Index as much as possible, but at some point operations on a numpy array will probably become faster than iterating through a list of RangeIndex objects contained in a CompoundRangeIndex...

What does everybody think? Is it worth the effort? Or should I just convert to an Int64Index when a union is required and be done with it?

shoyer · 2015-04-22T16:58:14Z

I would just convert to Int64Index when necessary, possibly even for all union operations on RangeIndex objects (for the sake of consistency).
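Following this suggestion, a union can simply materialize both ranges and let NumPy compute the sorted, de-duplicated result, which is what an Int64Index would hold. A minimal sketch assuming NumPy; range_union is a hypothetical name.

```python
import numpy as np


def range_union(r1, r2):
    """Union of two ranges as a sorted, de-duplicated int64 array (the
    Int64Index-style fallback), even when a single range would suffice."""
    a = np.arange(r1.start, r1.stop, r1.step, dtype=np.int64)
    b = np.arange(r2.start, r2.stop, r2.step, dtype=np.int64)
    return np.union1d(a, b)
```

Always converting keeps the result type predictable; a fast path that returns a range when the union happens to be evenly spaced could be layered on top later.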

ARF1 · 2015-04-24T11:22:10Z

I am closing this in favor of #9977 to deal with merging issues.

jtratner added 8 commits November 7, 2013 22:46

Start of RangeIndex implementation

9739b2f

More work on RangeIndex (with cleaned up tests)

26b4e5c

add note on slice locs for non-monotonically increasing Index objects

c93f263

Flesh out RangeIndex test cases (including translating many of the Int64Index tests to be RangeIndex tests instead)

9f34a97

Add specialized arrmap and groupby for RangeIndex to algos

5d3bb32

More work on the RangeIndex implementation

5e8dff4

tweaks to test code to get everything working right + block some of the ndarray interface

5ba899d

More fixes to backfill (you know, using something that can actually be filled...shocker)

a2c6ea3

ARF1 mentioned this pull request Apr 21, 2015

Add a more memory-efficient RangeIndex-sort of thing to avoid large arange(N) indexes in some cases #939

Closed

shoyer reviewed Apr 21, 2015

ARF added 5 commits April 23, 2015 20:14

Reset to master @ 76571d0

b4a80d1

fix python3: use floor division

4dab44c

protect properties, fastpath-ing, specialised slicing, size, __len__()

3d2e48d

fix rebasing issues

73871dc

ARF1 closed this Apr 24, 2015

ARF1 deleted the range_index branch April 24, 2015 11:22
