
Adding big offset to timedelta generates a python crash #14080


Closed
geoffroy-destaintot opened this issue Aug 24, 2016 · 10 comments · Fixed by #23762
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions Timedelta Timedelta data type

@geoffroy-destaintot

Code Sample, a copy-pastable example if possible

In:
import pandas as pd
from pandas.tseries.frequencies import to_offset

d = pd.Timestamp("2000/1/1")
d + to_offset("D")*100**25
Out:

=> python crash

Fatal Python error: Cannot recover from stack overflow.

Current thread 0x00002b00 (most recent call first):
File "C:\Users\geoffroy.destaintot\Miniconda3\envs\pd-0.18\lib\site-packages\pandas\tseries\offsets.py", line 2526 in delta
File "C:\Users\geoffroy.destaintot\Miniconda3\envs\pd-0.18\lib\site-packages\pandas\tseries\offsets.py", line 2535 in apply
File "C:\Users\geoffroy.destaintot\Miniconda3\envs\pd-0.18\lib\site-packages\pandas\tseries\offsets.py", line 2493 in __add__
File "C:\Users\geoffroy.destaintot\Miniconda3\envs\pd-0.18\lib\site-packages\pandas\tseries\offsets.py", line 390 in __radd__
File "C:\Users\geoffroy.destaintot\Miniconda3\envs\pd-0.18\lib\site-packages\pandas\tseries\offsets.py", line 2535 in apply
File "C:\Users\geoffroy.destaintot\Miniconda3\envs\pd-0.18\lib\site-packages\pandas\tseries\offsets.py", line 2493 in __add__
File "C:\Users\geoffroy.destaintot\Miniconda3\envs\pd-0.18\lib\site-packages\pandas\tseries\offsets.py", line 390 in __radd__
...

Expected Output

Satisfactory behaviour when using python timedeltas:

In:
import datetime as dt
import pandas as pd
from pandas.tseries.frequencies import to_offset

d = pd.Timestamp("2000/1/1")
d + dt.timedelta(days=1)*100**25
Out:

=> python error

Traceback (most recent call last):
File "C:/Users/geoffroy.destaintot/Documents/Local/Informatique/Projets/2016-08-django-debug/to_offset_bug.py", line 11, in <module>
d + dt.timedelta(days=1)*100**25
OverflowError: Python int too large to convert to C long

output of pd.show_versions()

(same behaviour with pandas 0.17.1, 0.16.2, 0.15.2)

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 25.1.6
Cython: None
numpy: 1.11.1
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

@jreback
Contributor

jreback commented Aug 24, 2016

Thought we had an issue for this...

It's a wraparound thing, I think.

PRs are welcome.

@jreback jreback added this to the Next Major Release milestone Aug 24, 2016
@bhaprayan

Any pointers on how to fix this?

@jreback
Contributor

jreback commented Aug 25, 2016

Step through the code: this hits Cython at some point (for the add), then again for the construction of a new Timestamp. I think it's crashing there.

@bhaprayan

bhaprayan commented Aug 26, 2016

I generated the stack trace and stepped through the code. I've isolated the problem to the subset of the trace attached below.
It crashes at the point where it tries to multiply self.n by self._inc, within the delta property of the Tick class. Any suggestions on fixing this?

`> /home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(393)__radd__()
-> def __radd__(self, other):
(Pdb) s

/home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(394)__radd__()
-> return self.__add__(other)
(Pdb) s
--Call--
/home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2698)__add__()
-> def __add__(self, other):
(Pdb) s
/home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2699)__add__()
-> if isinstance(other, Tick):
(Pdb) s
/home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2704)__add__()
-> elif isinstance(other, ABCPeriod):
(Pdb) s
--Call--
/home/bhaprayan/Workspace/pandas/pandas/types/generic.py(7)_check()
-> @classmethod
(Pdb) s
/home/bhaprayan/Workspace/pandas/pandas/types/generic.py(9)_check()
-> return getattr(inst, attr, '_typ') in comp
(Pdb) s
--Return--
/home/bhaprayan/Workspace/pandas/pandas/types/generic.py(9)_check()->False
-> return getattr(inst, attr, '_typ') in comp
(Pdb) s
/home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2706)__add__()
-> try:
(Pdb) s
/home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2707)__add__()
-> return self.apply(other)
(Pdb) s
--Call--
/home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2746)apply()
-> def apply(self, other):
(Pdb) s
/home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2748)apply()
-> if isinstance(other, (datetime, np.datetime64, date)):
(Pdb) s
/home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2749)apply()
-> return as_timestamp(other) + self
(Pdb) s
--Call--
/home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(35)as_timestamp()
-> def as_timestamp(obj):
(Pdb) s
/home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(36)as_timestamp()
-> if isinstance(obj, Timestamp):
(Pdb) s
/home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(37)as_timestamp()
-> return obj
(Pdb) s
--Return--
/home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(37)as_timestamp()->Timestam...0:00:00')
-> return obj
(Pdb) s
--Call--
/home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2738)delta()
-> @property
(Pdb) s
/home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2740)delta()
-> return self.n * self._inc
(Pdb) s
OverflowError: 'Python int too large to convert to C long'
/home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2740)delta()
-> return self.n * self._inc
(Pdb) s
--Return--
/home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2740)delta()->None
-> return self.n * self._inc
(Pdb) s
--Call--
/home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(393)__radd__()
-> def __radd__(self, other):
(Pdb)
`

@jreback
Contributor

jreback commented Aug 26, 2016

So I think that multiplication needs a guard against overflow:

In [2]: np.iinfo(np.int64).max
Out[2]: 9223372036854775807

In [3]: np.int64(1000000)*np.int64(86400*1e9)
/Users/jreback/miniconda/bin/ipython:1: RuntimeWarning: overflow encountered in long_scalars
  #!/bin/bash /Users/jreback/miniconda/bin/python.app
Out[3]: -5833720368547758080
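
A guard of the kind suggested above could look roughly like the following. This is only a sketch using plain Python integers, not the code that ended up in pandas; the names `checked_mul_int64`, `wrap_int64` and `NS_PER_DAY` are made up for illustration. It relies on Python ints being arbitrary precision, so the product can be computed exactly and range-checked before it is handed to anything that stores it as int64.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
NS_PER_DAY = 86_400 * 10**9  # nanoseconds in one day


def wrap_int64(value):
    """Show what two's-complement int64 wraparound would produce."""
    return (value + 2**63) % 2**64 - 2**63


def checked_mul_int64(n, inc):
    """Multiply exactly, then refuse results that do not fit in int64.

    Hypothetical helper: raises OverflowError instead of wrapping.
    """
    total = int(n) * int(inc)
    if not INT64_MIN <= total <= INT64_MAX:
        raise OverflowError(
            "{} * {} does not fit in a signed 64-bit integer".format(n, inc)
        )
    return total


# One day fits comfortably.
print(checked_mul_int64(1, NS_PER_DAY))

# A million days does not; unguarded int64 arithmetic would wrap to the
# negative value shown in the NumPy session above.
print(wrap_int64(1_000_000 * NS_PER_DAY))
try:
    checked_mul_int64(1_000_000, NS_PER_DAY)
except OverflowError as exc:
    print("guarded:", exc)
```

Raising from the guard only helps if the caller lets the exception propagate; if it is swallowed further up, the addition can still fall through to the reflected operator.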

@bhaprayan

bhaprayan commented Aug 27, 2016

First, I set a guard on the multiplication overflow. However, it's still stuck in a recursive loop: after the OverflowError is raised, __radd__ still gets called.

`ipdb> s

/home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2741)delta()
2739 def delta(self):
2740 try:
-> 2741 self.n * self._inc
2742 except OverflowError:
2743 raise

ipdb> s
OverflowError: 'Python int too large to convert to C long'

/home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2741)delta()
2739 def delta(self):
2740 try:
-> 2741 self.n * self._inc
2742 except OverflowError:
2743 raise

ipdb> s

/home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2742)delta()
2740 try:
2741 self.n * self._inc
-> 2742 except OverflowError:
2743 raise
2744

ipdb> s

/home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2743)delta()
2741 self.n * self._inc
2742 except OverflowError:
-> 2743 raise
2744
2745 @property

ipdb> s
--Return--
None

/home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(2743)delta()
2741 self.n * self._inc
2742 except OverflowError:
-> 2743 raise
2744
2745 @property

ipdb> s
--Call--

/home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(393)__radd__()
391 return NotImplemented
392
--> 393 def __radd__(self, other):
394 return self.__add__(other)
395

ipdb> s

/home/bhaprayan/Workspace/pandas/pandas/tseries/offsets.py(394)__radd__()
392
393 def __radd__(self, other):
--> 394 return self.__add__(other)
395
396 def __sub__(self, other):
`
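
The loop in the traces can be reproduced in miniature with plain Python classes. The sketch below is a toy model, not pandas code: `ToyStamp` and `ToyTick` are hypothetical stand-ins, and the assumption that the left operand swallows the OverflowError and returns NotImplemented is a guess at the mechanism, made only because the trace shows `__radd__` being called right after `delta` fails. Once the left operand gives up, Python tries the right operand's `__radd__`, which calls `apply`, which adds the same two objects again.

```python
class ToyStamp:
    """Hypothetical stand-in for a timestamp (not pandas.Timestamp)."""

    def __add__(self, other):
        try:
            delta = other.delta  # raises OverflowError for huge n
        except OverflowError:
            # Assumption: the error is swallowed and the operation is
            # deferred to the right operand. This is what sets up the loop.
            return NotImplemented
        return ("shifted by", delta)


class ToyTick:
    """Hypothetical stand-in for a Tick-like offset."""

    INT64_MAX = 2**63 - 1

    def __init__(self, n, inc=86_400 * 10**9):
        self.n = n
        self._inc = inc

    @property
    def delta(self):
        total = self.n * self._inc
        if abs(total) > self.INT64_MAX:
            raise OverflowError("delta does not fit in int64")
        return total

    def __radd__(self, other):
        return self.__add__(other)

    def __add__(self, other):
        return self.apply(other)

    def apply(self, other):
        # Mirrors `as_timestamp(other) + self` from the trace: this
        # re-enters ToyStamp.__add__, which defers back to __radd__.
        return other + self


print(ToyStamp() + ToyTick(1))  # small offsets work

try:
    ToyStamp() + ToyTick(100**25)
except RecursionError:
    print("mutual recursion between __add__ and __radd__")
```

In this toy version the interpreter stops the loop with a RecursionError. Under this model, a guard inside `delta` alone does not break the cycle; the exception also has to propagate out of the left operand's `__add__` instead of being converted into NotImplemented.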

@guygoldberg
Contributor

Looks like this issue has already been solved. Running the reproduction scenario now, I get a clear exception:
OverflowError: the add operation between <100000000000000000000000000000000000000000000000000 * Days> and 2000-01-01 00:00:00 will overflow

@jreback
Contributor

jreback commented May 26, 2017

Great.

Do you want to do a PR with some tests?

@dsm054
Contributor

dsm054 commented Jul 2, 2017

I put together a quick smoke test, and indeed it looks like things are generating exceptions like they should.

But two offsets, the FY5253Quarter and DateOffset cases, both take forever to fail (~20s in one case, ~10s in the other), so something's different about them. I haven't given them even a cursory glance.

@jreback jreback modified the milestones: Next Major Release, 0.23.0 Feb 19, 2018
@jreback
Contributor

jreback commented Feb 19, 2018

This is already fixed in master, if someone would like to add tests in a PR.

@jreback jreback modified the milestones: 0.23.0, Next Major Release Apr 14, 2018
@jbrockmendel jbrockmendel added the Needs Tests Unit test(s) needed to prevent regressions label Nov 16, 2018
@gfyoung gfyoung removed the Bug label Nov 18, 2018
@jreback jreback modified the milestones: Contributions Welcome, 0.24.0 Nov 18, 2018
8 participants