Summing with missing data appears to be broken in 0.16.2 when all elements are NA #10688

Closed
numpand opened this issue Jul 28, 2015 · 3 comments
Labels: Build, Missing-data, Usage Question

Comments

@numpand

numpand commented Jul 28, 2015

Summing with missing data appears to be broken in 0.16.2. According to http://pandas.pydata.org/pandas-docs/version/0.16.2/missing_data.html#calculations-with-missing-data:
- When summing data, NA (missing) values will be treated as zero
- If the data are all NA, the result will be NA <== This returns zero in 0.16.2 (it returns NA in 0.15.2).
Old behavior, as specified in the documentation:
[image: pandas0 15 2feature]
New behavior, which differs from the documentation:
[image: pandas0 16 2bug]
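
The two behaviors being compared can be sketched in plain Python, without pandas. This is only an illustration of the rule quoted from the documentation, not pandas' implementation; the function `row_sum` and its `all_na_is_na` flag are hypothetical names introduced here.

```python
import math

def row_sum(values, all_na_is_na=True):
    """Sum a row while skipping NaN (i.e. treating NaN as zero).

    all_na_is_na=True  -> an all-NaN row yields NaN (the documented rule).
    all_na_is_na=False -> an all-NaN row yields 0.0 (the reported behavior).
    Hypothetical helper for illustration only.
    """
    present = [v for v in values if not math.isnan(v)]
    if not present and all_na_is_na:
        return float("nan")
    return float(sum(present))  # the sum of an empty list is 0

nan = float("nan")
rows = [[nan, nan], [nan, 1.0], [2.0, 3.0]]

documented = [row_sum(r) for r in rows]
reported = [row_sum(r, all_na_is_na=False) for r in rows]

print(documented)
print(reported)
```

Only the first row differs between the two lists: it is NaN under the documented rule and 0.0 under the reported behavior.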

@jreback
Contributor

jreback commented Jul 28, 2015

You must have a confused environment. Both 0.15.2 & 0.16.2 have the same results. Note you may be using a different numpy.

FYI, pls don't paste images, they are impossible to copy-paste.

In [1]: df = DataFrame([[np.nan,np.nan],[np.nan,1],[2,3]],columns=list('AB'))

In [2]: df
Out[2]: 
    A   B
0 NaN NaN
1 NaN   1
2   2   3

In [3]: df.sum(axis=1)
Out[3]: 
0   NaN
1     1
2     5
dtype: float64

In [4]: np.__version__
Out[4]: '1.9.2'

In [5]: pd.__version__
Out[5]: '0.15.2-319-g434828a'

In [1]: df = DataFrame([[np.nan,np.nan],[np.nan,1],[2,3]],columns=list('AB'))

In [2]: df.sum(axis=1)
Out[2]: 
0   NaN
1     1
2     5
dtype: float64

In [3]: pd.__version__
Out[3]: '0.16.2'

jreback added the Build, Missing-data, and Usage Question labels Jul 28, 2015
jreback closed this as completed Jul 28, 2015
@numpand
Author

numpand commented Jul 29, 2015

Sorry about the images. I just checked the versions and we are using numpy 1.9.2 with pandas 0.16.2 and yet still getting a zero when summing all NAs. Any idea why?

Python 3.3.5 |Continuum Analytics, Inc.| (default, Jan 9 2015, 10:46:22) [MSC v.1600 32 bit (Intel)]
Type "copyright", "credits" or "license" for more information.

IPython 3.2.1 -- An enhanced Interactive Python.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.3.5.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.16.2
nose: 1.3.4
Cython: 0.22.1
numpy: 1.9.2
scipy: 0.15.1
statsmodels: 0.6.1
IPython: 3.2.1
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: 1.0.0
tables: 3.1.1
numexpr: 2.3.1
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 1.0.0
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.6
pymysql: None
psycopg2: 2.5.5 (dt dec pq3 ext)

In [4]: df = pd.DataFrame([[np.nan,np.nan],[np.nan,1],[2,3]],columns=list('AB'))

In [5]: df.sum(axis=1)
Out[5]:
0    0
1    1
2    5
dtype: float64
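
Since the same pandas and numpy versions give different results on two machines, one way to narrow this down is to compare the remaining installed packages side by side. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+, so newer than the Python 3.3.5 shown above). The helper name `installed_versions` and the package list are assumptions for illustration; the thread does not confirm that any of these packages explains the difference.

```python
from importlib import metadata

def installed_versions(packages):
    """Map each package name to its installed version, or None if absent.

    Hypothetical helper for comparing two environments.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Package names taken from the show_versions() listing above;
# which of them matters here, if any, is an assumption.
report = installed_versions(["pandas", "numpy", "bottleneck", "numexpr"])
for name, version in sorted(report.items()):
    print(name, version)
```

Running this in both environments and comparing the printed lines shows which packages are present in one but missing or at a different version in the other.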

@shoyer
Member

shoyer commented Jul 29, 2015

This is a dup of #9422
