DataFrame.rolling causes Kernel died, restart #22590

Closed
pasinu29 opened this issue Sep 4, 2018 · 18 comments · Fixed by #29392
Labels: good first issue; Needs Tests (unit test(s) needed to prevent regressions); Window (rolling, ewma, expanding); Windows (Windows OS)
Comments

@pasinu29

pasinu29 commented Sep 4, 2018

Description

I have a DataFrame with a DatetimeIndex. With pandas 0.23.4, calling DataFrame.rolling with the window given as an offset string (e.g. '2D') triggers a "Kernel died, restarting" notification. The same code worked in earlier versions of pandas (at least 0.20 and 0.21). Any advice on this bug would be great!

Code example

from datetime import datetime, timedelta

import numpy as np
import pandas as pd

date_today = datetime.now()
days = pd.date_range(date_today, date_today + timedelta(365), freq='D')

np.random.seed(seed=421)
data = np.random.randint(1, high=100, size=len(days))
df = pd.DataFrame({'DateCol': days, 'metric': data})

TIME_WINDOW = '21D'

# Minimum number of observations required in each window
MIN_VALUES = 2

# Use the datetime column as the index of the DataFrame
df.set_index('DateCol', inplace=True)

test = df.rolling(window=TIME_WINDOW, min_periods=MIN_VALUES, closed='left')['metric'].agg('max')

Result:

Kernel died, restarting

Pandas 0.23.4 documentation

The pandas 0.23.4 documentation shows the window can be a string:
df.rolling('2s').sum()
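
For readers unfamiliar with offset windows, here is a minimal sketch of the documented behaviour, modelled on the example in the pandas rolling documentation. The frame `df_time` and its values are illustrative assumptions, not data from this report.

```python
import numpy as np
import pandas as pd

# Illustrative data: irregularly spaced timestamps with one missing value.
index = pd.DatetimeIndex([
    "2013-01-01 09:00:00",
    "2013-01-01 09:00:02",
    "2013-01-01 09:00:03",
    "2013-01-01 09:00:05",
    "2013-01-01 09:00:06",
])
df_time = pd.DataFrame({"B": [0, 1, 2, np.nan, 4]}, index=index)

# An offset string such as "2s" makes each window span 2 seconds of time
# rather than a fixed number of rows. This needs a datetime-like index.
result = df_time.rolling("2s").sum()
print(result)
```

The third row sums the values at 09:00:02 and 09:00:03 because both fall within the 2-second window ending at 09:00:03.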

@WillAyd
Member

WillAyd commented Sep 4, 2018

Unfortunately this is lacking some crucial information to help troubleshoot. Please refer to the pandas contributing guide on bug reports:

https://pandas.pydata.org/pandas-docs/stable/contributing.html#bug-reports-and-enhancement-requests

Specifically, we need a minimal reproducible example that highlights the issue.

@WillAyd WillAyd added the Needs Info Clarification about behavior needed to assess issue label Sep 4, 2018
@pasinu29
Author

pasinu29 commented Sep 4, 2018

@WillAyd, I have updated the post with a reproducible example that causes the issue.

@WillAyd
Member

WillAyd commented Sep 4, 2018

Thanks for the code update. In the future please also use syntax highlighting (I've edited for you for now).

This ran fine for me on master. Can you try there and, if that doesn't work, update your original post with the output of pd.show_versions()?

@WillAyd WillAyd added the Window rolling, ewma, expanding label Sep 4, 2018
@TomAugspurger
Contributor

That works for me as well. Could you post pd.show_versions and how you installed pandas (pip, conda, source)?
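
For anyone following along, the version report requested here can be produced as sketched below. Both calls are part of the public pandas API; nothing else is assumed.

```python
import pandas as pd

# The installed pandas version on its own.
print(pd.__version__)

# The full environment report (Python, OS, and dependency versions)
# that maintainers ask for in bug reports.
pd.show_versions()
```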

@pasinu29
Author

pasinu29 commented Sep 5, 2018

Installed pandas through conda.

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.23.4
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@TomAugspurger
Contributor

cc @chris-b1 any chance this is related to the rolling changes?

@pasinu29 does this work for you with pandas 0.23.2 or 0.23.3?

@chris-b1
Contributor

chris-b1 commented Sep 5, 2018

This does reproduce for me on Windows - I'll take a look. Different path than that other rolling bug, but otherwise suspiciously similar.

@WillAyd WillAyd added Windows Windows OS and removed Needs Info Clarification about behavior needed to assess issue labels Sep 5, 2018
@pasinu29
Author

pasinu29 commented Sep 5, 2018

@WillAyd, the same kernel crash occurs with pandas 0.23.2 and 0.23.3.

The code runs successfully using pandas 0.20.3.

@chris-b1
Contributor

chris-b1 commented Sep 5, 2018

This is (somehow) fixed on master for Windows. I built both it and 0.23.4 from source with the same Windows toolchain: master works and 0.23.4 doesn't. We will at least want a confirming test.
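
A confirming test could look roughly like the sketch below. It is not the test that was eventually merged: the function name `test_rolling_max_offset_window_closed_left`, the fixed start date, and the brute-force expected values are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import pandas.testing as tm


def test_rolling_max_offset_window_closed_left():
    # Fixed start date (an assumption) so the test is deterministic;
    # the original report used datetime.now().
    days = pd.date_range("2018-09-04", periods=366, freq="D")
    rng = np.random.RandomState(421)
    data = rng.randint(1, high=100, size=len(days))
    df = pd.DataFrame({"DateCol": days, "metric": data}).set_index("DateCol")

    # The call from the report, which crashed on Windows with pandas 0.23.x.
    result = df.rolling(window="21D", min_periods=2, closed="left")["metric"].agg("max")

    # Brute-force expectation: with daily data and closed="left", the window
    # for row i covers up to 21 preceding rows and excludes row i itself.
    expected_values = []
    for i in range(len(data)):
        window = data[max(0, i - 21):i]
        expected_values.append(float(window.max()) if len(window) >= 2 else np.nan)
    expected = pd.Series(expected_values, index=df.index, name="metric")

    tm.assert_series_equal(result, expected)


test_rolling_max_offset_window_closed_left()
```

Comparing against a plain Python loop keeps the expected values independent of the rolling code path being tested.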

@WillAyd WillAyd added good first issue Needs Tests Unit test(s) needed to prevent regressions labels Sep 5, 2018
@wesleyParriott

Do you mean it doesn't work? I built from source and I can't confirm that the issue is fixed. Here are my versions:

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.23.4
pytest: 3.7.4
pip: 18.0
setuptools: 20.10.1
Cython: 0.28.5
numpy: 1.15.1
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

However, instead of getting "Kernel died, restarting", I get a segmentation fault.

@topper-123
Contributor

I tried this on Windows.

Using pandas 0.23.1, the kernel dies and I'm thrown out to the command line.

Using pandas master, everything functions as expected for me. Can anyone confirm that this is not present on master?

Details on my 0.23.1 run:

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 32
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.1
pytest: 3.4.0
pip: 18.0
setuptools: 38.4.1
Cython: 0.26.1
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@JayDSingh

JayDSingh commented Sep 16, 2018

I'm not sure I'm doing this exactly right, but I tried it on both 0.23.4 and a build from source on master, and it ran on both. These are my versions:

For 0.23.4:

INSTALLED VERSIONS

commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: None
pip: 18.0
setuptools: 40.2.0
Cython: None
numpy: 1.15.1
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: None
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.2.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.3
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

and from source:

INSTALLED VERSIONS

commit: 1c500fb
python: 3.7.0.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.0.dev0+585.g1c500fb7b
pytest: None
pip: 18.0
setuptools: 40.2.0
Cython: None
numpy: 1.15.1
scipy: None
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

The only difference I notice is that, in the build from source, test starts with these rows:

DateCol
2018-09-15 22:02:51.089709     NaN
2018-09-16 22:02:51.089709     NaN
2018-09-17 22:02:51.089709    88.0
2018-09-18 22:02:51.089709    88.0

whereas in 0.23.4, test starts with

DateCol
2018-09-15 22:02:26.212840     NaN
2018-09-16 22:02:26.212840    54.0
2018-09-17 22:02:26.212840    54.0
2018-09-18 22:02:26.212840    82.0

I'm not sure exactly why there's a difference, but could it be related to the issue? Either way, both run without crashing on my machine.
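
One way to read the leading NaN values in the build-from-source output: with closed='left' the current row is excluded from its own window, so with min_periods=2 the first two results are expected to be NaN. The sketch below shows that documented behaviour on a tiny series. The dates and values are assumptions for illustration, and the sketch does not by itself explain the 0.23.4 output or the crash.

```python
import pandas as pd

# Illustrative daily data; the dates and values are assumptions.
index = pd.date_range("2018-09-15", periods=5, freq="D")
s = pd.Series([54, 30, 82, 10, 95], index=index)

# closed="left": the window is [t - 21D, t), so the current row is excluded.
left = s.rolling("21D", min_periods=2, closed="left").max()

# closed="right" (the default for offset windows): the window is (t - 21D, t].
right = s.rolling("21D", min_periods=2, closed="right").max()

print(pd.DataFrame({"left": left, "right": right}))
```

Here `left` has two leading NaN values and `right` has one, because the left-closed window only reaches two observations at the third row.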

@pasinu29
Author

pasinu29 commented Oct 2, 2018

Was there any consensus on this? I'm not sure how to run it on master (not fully knowledgeable on git yet). I've only run it from conda.

@pasinu29
Author

pasinu29 commented Dec 6, 2018

Tried this yet again and it broke (kernel died) when using pandas 0.23.4 on conda.

@jpedrick

jpedrick commented Dec 6, 2018

I'm running into this issue as well. Does anyone know a workaround?

@Dear-Mar-garet

> Tried this yet again and it broke (kernel died) when using pandas 0.23.4 on conda.

I had the same problem. I tried the pandas.core.window.Rolling.apply function instead, as follows:
test=df.rolling(window=TIME_WINDOW,min_periods=MIN_VALUES,closed='left')['metric'].apply(min)

and it worked well.
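
The workaround above can be sketched as a self-contained comparison. The fixed start date, the use of `np.max` with `raw=True` (rather than the built-in `min` shown above), and the variable names are assumptions made for illustration. On an affected 0.23.x install on Windows, the `.agg("max")` line is the one reported to crash, so only the `apply` line should be run there.

```python
import numpy as np
import pandas as pd

# Fixed dates (an assumption) instead of datetime.now(), for repeatability.
days = pd.date_range("2018-09-04", periods=366, freq="D")
rng = np.random.RandomState(421)
df = pd.DataFrame({"metric": rng.randint(1, high=100, size=len(days))}, index=days)

rolling = df.rolling(window="21D", min_periods=2, closed="left")["metric"]

# Workaround: route the reduction through Rolling.apply with a NumPy function.
# raw=True passes each window to the function as a plain ndarray.
via_apply = rolling.apply(np.max, raw=True)

# Built-in aggregation, i.e. the code path from the original report.
via_agg = rolling.agg("max")

print(via_apply.equals(via_agg))
```

Going through `apply` calls a Python-level function once per window, which is consistent with it being slower than the built-in aggregation.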

@pasinu29
Author

> > Tried this yet again and it broke (kernel died) when using pandas 0.23.4 on conda.
>
> I had the same problem as you. I tried to use the pandas.core.window.Rolling.apply function as follows.
> test=df.rolling(window=TIME_WINDOW,min_periods=MIN_VALUES,closed='left')['metric'].apply(min)
>
> and it worked well.

Thanks for the workaround! Although it is a little slower than .agg('func'), it works for me as well.

@Dear-Mar-garet

> > > Tried this yet again and it broke (kernel died) when using pandas 0.23.4 on conda.
> >
> > I had the same problem as you. I tried to use the pandas.core.window.Rolling.apply function as follows.
> > test=df.rolling(window=TIME_WINDOW,min_periods=MIN_VALUES,closed='left')['metric'].apply(min)
> > and it worked well.
>
> Thanks for the workaround! Though it is a little slower than the .agg('func') this is working for me as well.

: )

gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 4, 2019
@gfyoung gfyoung added this to the 1.0 milestone Nov 4, 2019
gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 4, 2019
10 participants