Skip to content

CLN/PERF: simplify tslib.get_time_micros #18389

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Nov 24, 2017

Conversation

jbrockmendel
Copy link
Member

tslib.get_time_micros implementation is way more complicated than it needs to be. Instead of calling dt64_to_dtstruct in a loop, this changes the function to one call to np.mod.

AFAIK the np.mod call is to the python function, not sure if numpy exposes a cython version of mod. Then again, the function is only used once in indexes.datetimes and isn't all that perf sensitive. It might not need to be in cython in the first place.

Re-wrote inaccurate docstring.

@gfyoung gfyoung added Clean Performance Memory or execution speed performance labels Nov 20, 2017
@gfyoung
Copy link
Member

gfyoung commented Nov 20, 2017

asv benchmarks as usual before we can merge this.

@jbrockmendel
Copy link
Member Author

asv benchmarks as usual before we can merge this.

Since its just one function, how about just a timeit result

0.21.0

In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: np.random.seed(5678)
In [4]: arr = (np.random.randn(10000) * 1e7).astype(np.int64)
In [5]: %timeit res = tslib.get_time_micros(arr)
1000 loops, best of 3: 322 µs per loop

PR:

In [5]: %timeit res = tslib.get_time_micros(arr)
1000 loops, best of 3: 238 µs per loop

@codecov
Copy link

codecov bot commented Nov 20, 2017

Codecov Report

Merging #18389 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18389      +/-   ##
==========================================
- Coverage   91.35%   91.34%   -0.02%     
==========================================
  Files         163      163              
  Lines       49695    49695              
==========================================
- Hits        45401    45392       -9     
- Misses       4294     4303       +9
Flag Coverage Δ
#multiple 89.14% <100%> (ø) ⬆️
#single 39.66% <0%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/datetimes.py 95.49% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e6eac0b...26ec1f3. Read the comment docs.

@jorisvandenbossche jorisvandenbossche changed the title [CLN/PERF] simplify tslib.get_time_micros CLN/PERF: simplify tslib.get_time_micros Nov 20, 2017
micros[i] = 1000000LL * (dts.hour * 60 * 60 +
60 * dts.min + dts.sec) + dts.us

micros = np.mod(dtindex, 86400000000000, dtype=np.int64) // 1000LL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pretty sure we have a constant somewhere rather than hard-coding 86400.....

also shouldn't this routine be in conversion?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conversion would be reasonable. fields might be a better fit. I'm actually leaning towards "this doesn't need to be a function" since its a one-linear and only used once.

The constant you're thinking of is defined in conversion.

@jreback
Copy link
Contributor

jreback commented Nov 23, 2017

needs an update according to comments

@jbrockmendel
Copy link
Member Author

No thoughts on the "just put this one-liner in the one place its used" idea?

cdef:
ndarray[int64_t] micros

micros = np.mod(dtindex, 86400000000000, dtype=np.int64) // 1000LL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we have this constant defined already somewhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, it’s defined in conversion. But it’s all the same to the compiler right? If it was something that was liable to change and needed to be kept in sync (eg nat_strings) it would make sense to cimport it. But for a constant like this it isn’t worth the hassle; it’d be like defining DAYSPERWEEK=7 and importing that all over the place.

@jreback
Copy link
Contributor

jreback commented Nov 23, 2017

No thoughts on the "just put this one-liner in the one place its used" idea?

yeah, but rather actually have these defined in a single place (rather than scattered about), so in same vein could move _to_time_micros here as well.

@jbrockmendel
Copy link
Member Author

yeah, but rather actually have these defined in a single place (rather than scattered about),

I'm with you on the non-scattering. Anyway, moved this to fields a few minutes ago.

@jreback jreback added this to the 0.22.0 milestone Nov 24, 2017
@jreback jreback merged commit aec3347 into pandas-dev:master Nov 24, 2017
@jreback
Copy link
Contributor

jreback commented Nov 24, 2017

ok, let's add to list to remove need for this

@jbrockmendel jbrockmendel deleted the tslibs-time_micros branch November 24, 2017 19:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants