pandas.DataFrame.sum() skipna behavior does not match documentation #15542
Labels
Duplicate Report
Duplicate issue or pull request
Missing-data
np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Numeric Operations
Arithmetic, Comparison, and Logical operations
Code Sample, a copy-pastable example if possible
`
import pandas as pd
df = pd.DataFrame([[0,0],[1,1],[np.NaN,0],[np.NaN,1],[np.NaN,np.NaN]], columns = ['a','b'])
df
Out[3]:
a b
0 0.0 0.0
1 1.0 1.0
2 NaN 0.0
3 NaN 1.0
4 NaN NaN
df.sum(axis=1, skipna=True)
Out[4]:
0 0.0
1 2.0
2 0.0
3 1.0
4 0.0
dtype: float64
`
Problem description
The documentation for pandas.DataFrame.sum indicates that:
However, it appears that the actual behavior under the skipna option within this particular environment is for the summation routine to evaluate full rows of na values to 0 rather than the documented value of NA.
Currently this behavior only occurs on within my local windows environment, and NOT within the linux environment my organization's grid runs on. I have provided output for pd.versions for both environments below to help localize this issue.
Expected Output
df.sum(axis=1, skipna=True)
Out[6]:
0 0.0
1 2.0
2 0.0
3 1.0
4 NaN
dtype: float64
Output of
pd.show_versions()
Windows version in which issue is present
pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 45 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US
LOCALE: None.None
pandas: 0.19.2
nose: 1.3.7
pip: 8.1.1
setuptools: 20.3
Cython: 0.23.4
numpy: 1.11.0
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.1.2
sphinx: 1.3.5
patsy: 0.4.0
dateutil: 2.5.1
pytz: 2016.2
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.6.0
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None
Linux version in which issue is NOT present
pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-642.15.1.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.19.2
nose: 1.3.7
pip: 8.1.2
setuptools: 27.2.0
Cython: None
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.4.8
patsy: 0.4.1
dateutil: 2.4.1
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: 2.4.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.2
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: 0.2.1
The text was updated successfully, but these errors were encountered: