pandas.DataFrame.sum() skipna behavior does not match documentation #15542

Closed
jglicoes opened this issue Mar 1, 2017 · 1 comment
Labels: Duplicate Report, Missing-data, Numeric Operations

jglicoes commented Mar 1, 2017

Code Sample, a copy-pastable example if possible

```
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 0], [1, 1], [np.nan, 0], [np.nan, 1], [np.nan, np.nan]], columns=['a', 'b'])

df
Out[3]:
     a    b
0  0.0  0.0
1  1.0  1.0
2  NaN  0.0
3  NaN  1.0
4  NaN  NaN

df.sum(axis=1, skipna=True)
Out[4]:
0    0.0
1    2.0
2    0.0
3    1.0
4    0.0
dtype: float64
```

Problem description

The documentation for pandas.DataFrame.sum indicates that:

skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA

However, in this particular environment, `skipna=True` makes the sum of a row that is entirely NA evaluate to 0 rather than the documented NA.

Currently, this behavior occurs only in my local Windows environment, and NOT in the Linux environment my organization's grid runs on. I have provided the output of pd.show_versions() for both environments below to help localize the issue.

Expected Output

df.sum(axis=1, skipna=True)
Out[6]:
0 0.0
1 2.0
2 0.0
3 1.0
4 NaN
dtype: float64
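
For readers on later pandas releases, the NaN result for an all-NA row can be requested explicitly. The sketch below assumes pandas 0.22 or newer, where `sum` accepts a `min_count` argument; that argument does not exist in the 0.19.2 installs listed in this issue, so this is an illustration for a newer environment and not a fix for the versions reported here.

```python
import numpy as np
import pandas as pd

# Same frame as in the report, written with np.nan.
df = pd.DataFrame(
    [[0, 0], [1, 1], [np.nan, 0], [np.nan, 1], [np.nan, np.nan]],
    columns=["a", "b"],
)

# Assumption: pandas >= 0.22. min_count=1 requires at least one
# non-NA value per row; rows with none come back as NaN.
row_sums = df.sum(axis=1, skipna=True, min_count=1)
print(row_sums)
```

With `min_count=1`, the last row (all NaN) is reported as NaN while the other rows are summed with NaN values skipped.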

Output of pd.show_versions()

Windows version in which issue is present

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 45 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 8.1.1
setuptools: 20.3
Cython: 0.23.4
numpy: 1.11.0
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.1.2
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None

Linux version in which issue is NOT present

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-642.15.1.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 8.1.2
setuptools: 27.2.0
Cython: None
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.4.8
patsy: 0.4.1
dateutil: 2.4.1
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: 2.4.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.2
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: 0.2.1

@jreback added the Duplicate Report, Missing-data, and Numeric Operations labels Mar 1, 2017
@jreback added this to the No action milestone Mar 1, 2017

jreback (Contributor) commented Mar 1, 2017

This is a duplicate of #9422.

You have bottleneck installed in one environment and not the other. Unfortunately, the pandas behavior is correct and numpy/bottleneck are wrong (but they matched in bottleneck < 1).
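
The environment comparison described above can be sketched with a small stdlib-only script. The helper names `parse_versions` and `installed_only_in` are hypothetical, introduced only for this illustration, and the two listings are abbreviated excerpts of the `pd.show_versions()` output posted in this issue.

```python
def parse_versions(text):
    """Parse 'name: version' lines from pd.show_versions() output into a dict."""
    versions = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip():
            versions[name.strip()] = value.strip()
    return versions


def installed_only_in(first, second):
    """Return packages with a version in `first` but reported as None in `second`."""
    return sorted(
        name
        for name, version in first.items()
        if version != "None" and second.get(name, "None") == "None"
    )


# Abbreviated excerpts of the two listings from this issue.
windows = parse_versions("""
pandas: 0.19.2
numpy: 1.11.0
bottleneck: 1.0.0
numexpr: 2.5.2
""")
linux = parse_versions("""
pandas: 0.19.2
numpy: 1.11.2
bottleneck: None
numexpr: None
""")

only_on_windows = installed_only_in(windows, linux)
print(only_on_windows)
```

On these excerpts the script lists bottleneck and numexpr as present only on Windows, which narrows the search to the optional acceleration packages.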
