Skip to content

REF: Refactor signature of RangeIndex._simple_new #26722

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 14, 2019

Conversation

topper-123
Copy link
Contributor

@topper-123 topper-123 commented Jun 7, 2019

This PR refactors RangeIndex._simple_new, so its signature is the same as Index._simple_new. This will help typing later on, as currently mypy complains about the different signatures. In short a _simple_new now expects a range as its input. As a range is immutable, the code is easy to reason about.

Additionally, the signature of RangeIndex.from_range has dropped the **kwargs part of its signature. This makes the class method a bit easier to reason about. Other classes with a from_* class method don't have a **kwargs part.

After this PR, RangeIndex will be ready for adding type hints.

EDIT; Performance improvements:

>>> rng = pd.RangeIndex(1_000_000)
>>> sl = slice(0, 900_000)
>>> %timeit rng[sl]  # slicing
7.65 µs ± 8.11 ns per loop  # master
1.85 µs ± 8.11 ns per loop  # this PR

@pep8speaks
Copy link

pep8speaks commented Jun 7, 2019

Hello @topper-123! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-06-14 08:20:11 UTC

@topper-123 topper-123 force-pushed the align_simple_new_signatures branch from e2c1f3a to 65198ba Compare June 7, 2019 22:49
@topper-123 topper-123 added Index Related to the Index class or subclasses Refactor Internal refactoring of code labels Jun 7, 2019
@topper-123 topper-123 added this to the 0.25.0 milestone Jun 7, 2019
@topper-123 topper-123 force-pushed the align_simple_new_signatures branch 2 times, most recently from e6d9b4c to 1b34ac9 Compare June 8, 2019 07:18
@codecov
Copy link

codecov bot commented Jun 8, 2019

Codecov Report

Merging #26722 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26722      +/-   ##
==========================================
- Coverage   91.78%   91.77%   -0.01%     
==========================================
  Files         174      174              
  Lines       50703    50703              
==========================================
- Hits        46538    46534       -4     
- Misses       4165     4169       +4
Flag Coverage Δ
#multiple 90.37% <100%> (ø) ⬆️
#single 41.8% <55.55%> (-0.1%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/base.py 96.71% <100%> (-0.01%) ⬇️
pandas/core/indexes/range.py 98.82% <100%> (ø) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 96.88% <0%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cf25c5c...1b34ac9. Read the comment docs.

@codecov
Copy link

codecov bot commented Jun 8, 2019

Codecov Report

Merging #26722 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26722      +/-   ##
==========================================
- Coverage   91.86%   91.85%   -0.01%     
==========================================
  Files         179      179              
  Lines       50700    50702       +2     
==========================================
- Hits        46576    46574       -2     
- Misses       4124     4128       +4
Flag Coverage Δ
#multiple 90.45% <100%> (ø) ⬆️
#single 41.11% <56.25%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/base.py 96.71% <100%> (ø) ⬆️
pandas/core/indexes/range.py 98.82% <100%> (ø) ⬆️
pandas/io/gbq.py 88.88% <0%> (-11.12%) ⬇️
pandas/core/frame.py 96.88% <0%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8374a1d...2bde25b. Read the comment docs.

@topper-123 topper-123 force-pushed the align_simple_new_signatures branch from 1b34ac9 to 40f566e Compare June 8, 2019 08:26
@jreback
Copy link
Contributor

jreback commented Jun 8, 2019

looks fine. any comments about this api change @jorisvandenbossche @TomAugspurger

@topper-123 topper-123 force-pushed the align_simple_new_signatures branch 3 times, most recently from 6f3471a to 0733747 Compare June 8, 2019 18:26
@topper-123
Copy link
Contributor Author

I've added a new commit that's improves the performance of slices RangeIndexes:

>>> rng = pd.RangeIndex(1_000_000)
>>> sl = slice(0, 900_000)
>>> %timeit rng[sl]  # slicing
7.65 µs ± 8.11 ns per loop  # master
1.85 µs ± 8.11 ns per loop  # this PR

>>> %timeit rng[sl.stop]  # scalar
640 ns ± 15.1 ns per loop  # master
652 ns ± 14.9 ns per loop  # this PR, the same speed

So a 4 x speed improvement for slicing.

@topper-123 topper-123 added the Performance Memory or execution speed performance label Jun 8, 2019
@topper-123 topper-123 force-pushed the align_simple_new_signatures branch from 0733747 to c6ce654 Compare June 8, 2019 19:12
@topper-123
Copy link
Contributor Author

Ping. Is this ok?

@jorisvandenbossche
Copy link
Member

Is the performance improvement linked to the change in behaviour?

@topper-123
Copy link
Contributor Author

topper-123 commented Jun 13, 2019

No, it is because of that second commit, where we postpone the more expensive is_scalar check. It could as such be in a different PR.

@jorisvandenbossche
Copy link
Member

Is there a good reason to change the init behaviour? (does it make the code less complex or does it make the life easier for users?)
If we don't have a compelling reason, I would rather leave that and just stick with the performance improvement.

@topper-123 topper-123 force-pushed the align_simple_new_signatures branch 3 times, most recently from 5359f6b to 6767ef0 Compare June 14, 2019 08:19
@topper-123 topper-123 force-pushed the align_simple_new_signatures branch from 6767ef0 to 2bde25b Compare June 14, 2019 08:20
@topper-123
Copy link
Contributor Author

I've reverted the possibility to use a range to initailze the RangeIndex.

Else the code is a simplification IMO by making the signature of _simple_new align with simple_new of the other the index types, which IMO help both on understanding and on typing.

@topper-123 topper-123 changed the title REF: Refactor signature of RangeIndex._simple_new, allow range in construction REF: Refactor signature of RangeIndex._simple_new Jun 14, 2019
@jreback jreback merged commit 137a886 into pandas-dev:master Jun 14, 2019
@jreback
Copy link
Contributor

jreback commented Jun 14, 2019

thanks @topper-123

@topper-123 topper-123 deleted the align_simple_new_signatures branch June 14, 2019 15:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Index Related to the Index class or subclasses Performance Memory or execution speed performance Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants