PERF: parse and timedelta ops improvements, #6755 #10396

Merged
merged 2 commits into from
Jun 22, 2015

@jreback jreback commented Jun 20, 2015

closes #6755

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
dtype_infer_timedelta64_1                    |   7.0527 | 119.7317 |   0.0589 |
timedelta_convert_string                     |  13.3094 | 108.9200 |   0.1222 |
timedelta_convert_string_seconds             |  18.2626 |  85.0617 |   0.2147 |
dtype_infer_timedelta64_2                    |   9.4047 |   9.4063 |   0.9998 |
timedelta_convert_int                        |   0.1300 |   0.1260 |   1.0315 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

Ratio < 1.0 means the target commit is faster than the baseline.
Seed used: 1234

Target [da92dc0] : PERF: timedelta and datetime64 ops improvements
Base   [d8a2f30] : Check for size=0 before setting item
Fixes #10193
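
For readers checking the table, the ratio column is consistent with head time divided by base time. The sketch below recomputes it for the three benchmarks that improved; the `ratio` helper and the `rows` dict are illustrative names, not part of the benchmark tooling, and the inputs are the rounded millisecond values shown in the table.

```python
# Recompute the "ratio" column from the head/base timings in the table.
# `ratio` and `rows` are hypothetical names used only for this illustration.

def ratio(head_ms, base_ms):
    """Return head/base; values below 1.0 mean the target commit is faster."""
    return head_ms / base_ms

rows = {
    "dtype_infer_timedelta64_1": (7.0527, 119.7317),
    "timedelta_convert_string": (13.3094, 108.9200),
    "timedelta_convert_string_seconds": (18.2626, 85.0617),
}

for name, (head, base) in rows.items():
    r = ratio(head, base)
    # 1 / r is the speedup factor of the target commit over the baseline.
    print(f"{name:<34} ratio={r:.4f} speedup={1 / r:.1f}x")
```

Because the table prints rounded timings, a recomputed ratio can differ from the printed one in the last digit for very small timings.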

@jreback jreback added Performance Memory or execution speed performance Timedelta Timedelta data type labels Jun 20, 2015
@jreback jreback added this to the 0.17.0 milestone Jun 20, 2015

jreback commented Jun 20, 2015

This is a Cython conversion of the timedelta parsing routines. It was much more involved than it looked: the original used regexes, whereas this is a string-parsing state machine in Cython.

However, I think this could actually be done in C (or maybe using char * pointers in Cython), rather than with the list build-up. That could give another 10x perf improvement (my benchmark is the C parser for ISO datetimes).

If any of the C gurus are interested, pls comment!
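
To illustrate the general idea of a character-by-character state machine that builds up lists instead of matching regexes, here is a minimal pure-Python sketch. It is not the Cython code from this PR: the function name `parse_timedelta_ns`, the `UNIT_NS` table, and the accepted grammar (number/unit pairs such as `1 days 2 hours` or `90s`) are all assumptions made for the example.

```python
# A minimal sketch of a state-machine parser for "<number><unit> ..." strings.
# Hypothetical: neither the names nor the grammar come from pandas.

NS = 1
UNIT_NS = {  # assumed unit table, nanoseconds per unit
    "d": 86_400 * 10**9, "day": 86_400 * 10**9, "days": 86_400 * 10**9,
    "h": 3_600 * 10**9, "hour": 3_600 * 10**9, "hours": 3_600 * 10**9,
    "m": 60 * 10**9, "min": 60 * 10**9, "minutes": 60 * 10**9,
    "s": 10**9, "sec": 10**9, "seconds": 10**9,
    "ms": 10**6, "us": 10**3, "ns": NS,
}


def parse_timedelta_ns(text):
    """Parse number/unit pairs into integer nanoseconds, one char at a time."""
    total = 0
    pairs = 0      # how many number/unit pairs have been completed
    neg = False
    number = []    # characters of the number currently being read
    unit = []      # characters of the unit currently being read

    def flush():
        # Close the current number/unit pair and add it to the running total.
        nonlocal total, pairs
        if not number and not unit:
            return
        if not unit:
            raise ValueError("number without a unit")
        name = "".join(unit).lower()
        if name not in UNIT_NS:
            raise ValueError(f"unknown unit: {name!r}")
        total += round(float("".join(number)) * UNIT_NS[name])
        pairs += 1
        number.clear()
        unit.clear()

    for ch in text.strip():
        if ch in "0123456789.":
            if unit:           # a digit after a unit starts the next pair
                flush()
            number.append(ch)
        elif ch.isalpha():
            if not number:
                raise ValueError("unit without a number")
            unit.append(ch)
        elif ch.isspace():
            continue           # whitespace is skipped in this sketch
        elif ch == "-" and pairs == 0 and not number and not neg:
            neg = True         # a single leading minus sign
        else:
            raise ValueError(f"unexpected character: {ch!r}")

    flush()
    if pairs == 0:
        raise ValueError("no number/unit pair found")
    return -total if neg else total
```

The `number` and `unit` lists mirror the "list build-up" mentioned above; a C or `char *` version would instead walk the buffer and accumulate the value directly. This sketch is deliberately loose: for instance, it skips whitespace everywhere, so it does not reject a unit split by a space.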

@jreback jreback force-pushed the td branch 4 times, most recently from d7caf2c to 04bb24c Compare June 20, 2015 11:05
@jreback jreback force-pushed the td branch 2 times, most recently from d245111 to 5546947 Compare June 20, 2015 12:39
jreback added a commit that referenced this pull request Jun 22, 2015
PERF: parse and timedelta ops improvements, #6755
@jreback jreback merged commit af8eb59 into pandas-dev:master Jun 22, 2015
@@ -54,6 +54,9 @@ Removal of prior version deprecations/changes
Performance Improvements
~~~~~~~~~~~~~~~~~~~~~~~~

- 4x improvement in ``timedelta`` string parsing (:issue:`6755`)
Member

to_timedelta ?

Successfully merging this pull request may close these issues.

PERF: improve to_timedelta perf for string-like