Regr/period range large value/issue 36430 #36535
591f209 to 2c3c15d
Thanks for root causing - pretty tricky
doc/source/whatsnew/v1.1.3.rst
Outdated
@@ -33,9 +33,10 @@ Fixed regressions
- Fixed regression in :class:`IntegerArray` unary plus and minus operations raising a ``TypeError`` (:issue:`36063`)
- Fixed regression in :meth:`Series.__getitem__` incorrectly raising when the input was a tuple (:issue:`35534`)
- Fixed regression in :meth:`Series.__getitem__` incorrectly raising when the input was a frozenset (:issue:`35747`)
- Fixed regression in :meth:`read_excel` with ``engine="odf"`` caused ``UnboundLocalError`` in some cases where cells had nested child nodes (:issue:`36122`,:issue:`35802`)
- Fixed regression in :class:`DataFrame` and :class:`Series` comparisons between numeric arrays and strings (:issue:`35700`,:issue:`36377`)
- Fixed regression in :meth:`read_excel` with ``engine="odf"`` caused ``UnboundLocalError`` in some cases where cells had nested child nodes (:issue:`36122`, :issue:`35802`)
Can you revert this? Minor nit but it's confusing to include here; can just do a separate PR to clean whatsnew
Sure thing.
pandas/_libs/tslibs/period.pyx
Outdated
@@ -886,7 +886,10 @@ cdef int64_t get_time_nanos(int freq, int64_t unix_date, int64_t ordinal) nogil:
# We must have freq == FR_HR
factor = 10**9 * 3600

sub = ordinal - unix_date * 24 * 3600 * 10**9 / factor
# Fix issue #36430
nanos_in_day = 24 * 3600 * 10**9
Can you add the appropriate suffixes to the constants here? This seems suspect that it would make a difference at all; wonder if the suffixes alone would fix
Do you mean declaring it this way: `cdef const nanos_in_day = 24 * 3600 * 10**9`? This does not compile.
looks good, ping when @WillAyd's comments are addressed and CI is green.
@@ -886,7 +886,10 @@ cdef int64_t get_time_nanos(int freq, int64_t unix_date, int64_t ordinal) nogil:
# We must have freq == FR_HR
factor = 10**9 * 3600

sub = ordinal - unix_date * 24 * 3600 * 10**9 / factor
so the trouble is that there is an overflow going on somewhere in this expression?
Yea I am fairly certain that `24 * 3600 * 10**9` will overflow - these are likely interpreted by the compiler to just be of type int, but that multiplication could very well exceed the limits of an int type. Adding the ULL suffix I think would be ideal
More details on how decimal literals are assigned types here:
https://stackoverflow.com/a/41407498/621736
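The concern about literal width can be checked with plain arithmetic. The snippet below is only a sketch: it assumes a 32-bit C `int`, which is common but platform-dependent, and the constant names are made up for illustration.

```python
# Assumption: C `int` is 32 bits wide (typical, but not guaranteed).
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1

nanos_in_day = 24 * 3600 * 10**9  # Python ints never overflow

fits_int32 = nanos_in_day <= INT32_MAX
fits_int64 = nanos_in_day <= INT64_MAX

print(nanos_in_day, fits_int32, fits_int64)
```

Under that assumption the product does not fit in an `int`, but it does fit comfortably in a signed 64-bit integer.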
Using uint64_t instead of int64_t will work for the given example, but will then fail with an integer overflow for date ranges earlier than the epoch, so we must stick with a signed integer here.
As for how this change works, it just changes the order of the operations so that unix_date is not multiplied by `24 * 3600 * 10**9`, but by `24 * 3600 * 10**9 / factor`, which is smaller and does not result in an integer overflow (except for values in the really far future for the use case described in the issue, after the year `2*10**15`).
So the real fix here is maybe just to add parentheses in the right place, see new commit shortly.
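The effect of reordering can be sketched in pure Python by simulating signed 64-bit arithmetic. This is an illustration, not the Cython code from the PR: the helper names are hypothetical, and the two's-complement wraparound is an assumption, since signed overflow is undefined behaviour in C.

```python
# Hypothetical helpers; wraparound on overflow is an assumption
# (signed overflow is undefined behaviour in C).
INT64_MAX = 2**63 - 1
NANOS_IN_DAY = 24 * 3600 * 10**9   # 86_400_000_000_000
FACTOR_HOUR = 10**9 * 3600         # nanoseconds per hour (the FR_HR case)


def wrap_int64(value: int) -> int:
    """Wrap an arbitrary Python int into the signed 64-bit range."""
    return (value + 2**63) % 2**64 - 2**63


def c_div(a: int, b: int) -> int:
    """Integer division truncating toward zero, as C does."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def hours_multiply_first(unix_date: int) -> int:
    """Original order: unix_date * 24 * 3600 * 10**9 / factor."""
    product = wrap_int64(unix_date * NANOS_IN_DAY)  # may wrap around
    return c_div(product, FACTOR_HOUR)


def hours_divide_first(unix_date: int) -> int:
    """Reordered: unix_date * (24 * 3600 * 10**9 / factor)."""
    per_day = c_div(NANOS_IN_DAY, FACTOR_HOUR)      # 24 hours per day
    return wrap_int64(unix_date * per_day)


# Largest day count whose nanosecond value still fits in int64.
last_safe_day = INT64_MAX // NANOS_IN_DAY

for day in (last_safe_day, last_safe_day + 1):
    print(day, hours_multiply_first(day), hours_divide_first(day))
```

For `last_safe_day` both orderings agree; one day later the multiply-first form wraps and the two results diverge, while the divide-first form keeps returning `day * 24`.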
In [64]: 24 * 3600 * 10**9
Out[64]: 86400000000000
In [65]: np.iinfo(np.int64).max
Out[65]: 9223372036854775807
this is a fairly standard number, i agree if multiplied by a large number this could overflow, but ok here.
2c3c15d to f0e1e27
@WillAyd anything further?
Great thanks @nrebena - nice PR!
@meeseeksdev backport 1.1.x
Co-authored-by: nrebena <[email protected]>
Checklist
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Solution
The culprit was the multiplication `unix_date * 24 * 3600 * 10**9 / factor`. That probably led to an integer overflow somewhere and to the observed behaviours.
Splitting the multiplication did the trick.