-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Fix Timestamp type checks to work with subclassed datetime (#25851) #25853
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fix Timestamp type checks to work with subclassed datetime (#25851) #25853
Conversation
Hello @ArtificialQualia! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-04-05 15:33:46 UTC |
Codecov Report
@@ Coverage Diff @@
## master #25853 +/- ##
===========================================
- Coverage 91.45% 41.81% -49.64%
===========================================
Files 172 172
Lines 52892 52892
===========================================
- Hits 48373 22119 -26254
- Misses 4519 30773 +26254
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #25853 +/- ##
==========================================
- Coverage 40.85% 40.72% -0.13%
==========================================
Files 175 175
Lines 52552 52551 -1
==========================================
- Hits 21468 21401 -67
- Misses 31084 31150 +66
Continue to review full report at Codecov.
|
below comments identify areas where _Timestamp has changed to prevent having to have circular imports. |
@jbrockmendel separated out _Timestamp as you suggested. Obviously had to make a number of other changes to support that, let me know if they are acceptable, or what other changes you'd like. Didn't mean to make most of those comments 'reviews', but they do need to be reviewed anyways I suppose! |
pandas/_libs/tslibs/timestamps.pyx
Outdated
Helper to create a Timedelta so that the | ||
Timedelta class doesn't have to be imported elsewhere | ||
""" | ||
return Timedelta(*args, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather just have the runtime import in TImedelta.resolution
(I think the only place this is used). Or for that matter just move resolution
from _Timestamp
to Timestamp
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used in two places in _Timestamp
, _Timestamp.__sub__
and _Timestamp.resolution
. If we want to go down the runtime import path, we could either add the Timedelta
import in both functions in _Timestamp
, or add a _Timestamp
import into timedeltas._binary_op_method_timedeltalike
As for moving the _Timestamp
functions to Timestamp
, _Timestamp.resolution
can be moved with no issues. However, _Timestamp.__sub__
can not. Not sure where the problem is there, but if you want I can try to figure that one out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right that's what I mean, it doens't need to be in pandas/_libs/tslibs/__init__.py
, rather just import it directly where needed. its ok where its living.
pandas/_libs/tslibs/_timestamp.pyx
Outdated
elif PyDelta_Check(other): | ||
nanos = (other.days * 24 * 60 * 60 * 1000000 + | ||
other.seconds * 1000000 + | ||
other.microseconds) * 1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, this comment got dropped again due to 'review'. I've ensured this was the last missing comment I had.
As comment above implies, this logic was copied from delta_to_nanoseconds
, has also been cleaned up of a few unnecessary if
s.
delta_to_nanoseconds
is a very independent function though, and if we want to move that to a separate file that would work too.
@ArtificialQualia can you merge master and rename _timestamp.pyx -> c_timestamp.pyx, otherwise lgtm. |
Master merged and file renamed. Build seems to be failing because of google's cloud API? Not sure what that is about. I think this merge might want to wait on #25938 so that function can be imported and used rather than duplicating logic in |
@ArtificialQualia yeah don't worry about the gbq failure. ok will have you rebase after # 25938 |
@ArtificialQualia if you'd merge master; ping on green. |
@jreback merged master and green |
pandas/_libs/tslibs/__init__.py
Outdated
@@ -8,3 +8,4 @@ | |||
from .timestamps import Timestamp | |||
from .timedeltas import delta_to_nanoseconds, ints_to_pytimedelta, Timedelta | |||
from .tzconversion import tz_convert_single | |||
from .c_timestamp import maybe_integer_op_deprecated # isort:skip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need this function here? this should not be exposed to the rest of the codebase (or directly imported if it actually is needed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just expose it in timestamp.pyx itself to make it easier to import
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was exposed previously, and is only used in pandas/core/arrays/datetimelike.py
. _Timestamp
itself makes use of it, so it has to be in c_timestamp.pyx
otherwise we would have a circular import.
I can change the import in datetimelike.py
to directly import it if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry made the comment in the wrong place. yes just import it directly in dateimelike.py
and don't put it in tslibs/__init__.py
int64_t value, nanosecond | ||
object freq | ||
list _date_attributes | ||
cpdef bint _get_start_end_field(self, str field) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
future PR we should rename the 'private' functions to public if they are actually exposed (the _ leading ones)
lgtm. ping on green! |
@jreback green |
thanks @ArtificialQualia nice patch! keep em coming! |
Still a -1 to backport this to 0.24.x @jreback? My coworkers could use this patch to upgrade to 0.24 as the regression was introduced recently. |
git diff upstream/master -u -- "*.py" | flake8 --diff
PyDateTime_CheckExact is being used as a proxy for isinstance(obj, Timestamp) checks for performance reasons. However, if a subclass of the std datetime lib is being used, these checks are not sufficient to determine if an object is a Timestamp or not.
As discussed in #25746, any solution will be less performant than PyDateTime_CheckExact. The best solution found was to use isinstance(obj, _Timestamp), and that is what I implemented here.
Of course, a few additional changes were necessary to be able to use that check.
Since asv isn't very good for cython code, I've done performance testing on all the modified functions by running them 10,000 times in a function, then timing that function in
%timeit
. The results are below:As you can see, some of the functions used with a
Timestamp
are affected, with a few unaffected. Where there is an impact, it appears to be 15-20% less performant.Note this fix also fixes (potentially) #25734, but if we want a separate test case for that issue I could modify my existing PR for that.