Skip to content

BUG: Fix nansum overflow on Windows with bottleneck #15507

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

JaviSorribes
Copy link

@JaviSorribes JaviSorribes commented Feb 25, 2017

Bottleneck used in nansum produces overflow error on Windows. Temporary fix that could be undone once the error in bottleneck is fixed and released, to avoid platform specificity.

@codecov-io
Copy link

codecov-io commented Feb 26, 2017

Codecov Report

Merging #15507 into master will decrease coverage by -0.07%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #15507      +/-   ##
==========================================
- Coverage    91.1%   91.03%   -0.07%     
==========================================
  Files         136      136              
  Lines       49102    49103       +1     
==========================================
- Hits        44736    44703      -33     
- Misses       4366     4400      +34
Impacted Files Coverage Δ
pandas/core/nanops.py 97.96% <100%> (-0.21%)
pandas/io/gbq.py 40% <0%> (-46.67%)
pandas/util/decorators.py 67.92% <0%> (-1.31%)
pandas/io/pytables.py 93.05% <0%> (-0.86%)
pandas/tools/merge.py 91.78% <0%> (-0.35%)
pandas/core/frame.py 97.74% <0%> (-0.1%)
pandas/core/series.py 94.93% <0%> (-0.04%)
pandas/core/algorithms.py 94.48% <0%> (ø)
pandas/compat/pickle_compat.py 68.29% <0%> (ø)
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 23889d3...7b95a88. Read the comment docs.

@cpcloud cpcloud self-assigned this Feb 26, 2017
@cpcloud cpcloud requested a review from jreback February 26, 2017 04:00
@cpcloud cpcloud added Bug Numeric Operations Arithmetic, Comparison, and Logical operations Windows Windows OS labels Feb 26, 2017
@cpcloud cpcloud added this to the 0.20.0 milestone Feb 26, 2017
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The biggest issue here is this semantically changes things. IOW, now since we are always using pandas nansum (rather than bottleneck nansum if installed), we will always yield Series([np.nan]).sum() -> np.nan (rather than 0 if bottleneck is installed). So I think we actually have to change this for all platforms or then this becomes even more confusing.

@@ -130,6 +131,8 @@ def _bn_ok_dtype(dt, name):
if name == 'nansum':
if dt.itemsize < 8:
return False
elif os.name == 'nt': # bottleneck has overflow issue in Windows
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use pandas.compat.is_platform_windows()

@@ -327,6 +327,12 @@ def test_nansum(self):
self.check_funs(nanops.nansum, np.sum, allow_str=False,
allow_date=False, allow_tdelta=True, check_dtype=False)

def test_nansum_overflow(self):
s = Series([2**31])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs testing both with and w/o bottleneck (to yield the same results), see other tests for how to do this.

@jreback
Copy link
Contributor

jreback commented Feb 26, 2017

@jorisvandenbossche @shoyer

w.r.t. #9422 I think we need to simply change this (in effect what this PR does), but for all platforms (and not just windows).

@cpcloud cpcloud changed the title Fix nansum overflow on Windows with bottleneck BUG: Fix nansum overflow on Windows with bottleneck Feb 27, 2017
@cpcloud
Copy link
Member

cpcloud commented Feb 27, 2017

@JaviSorribes Looking at @jreback's comments, it looks like we should just disable nansum bottleneck calls altogether.

@JaviSorribes
Copy link
Author

So I understand @jreback and @cpcloud that we should not use bottleneck at all for nansum? Checkout the image attached for what I think this would look like.

Also, what about all the other nan operations? Should we consider possible overflow in any of them?

image

@jreback
Copy link
Contributor

jreback commented Mar 1, 2017

xref #15542

@jorisvandenbossche
Copy link
Member

We indeed just need to take a decision on #9422, as we have left this lingering too long.
But, I am not necessarily convinced that we should change everything to the native pandas way (returning NaN), instead of changing pandas to return 0 to align with numpy/bottleneck (and also R, matlab, ..).
I don't think there was really agreement on this in #9422 (several have argued for both positions), but I also don't know how to best proceed on taking a decision.

@jreback jreback modified the milestone: 0.20.0 Mar 23, 2017
@jreback
Copy link
Contributor

jreback commented May 7, 2017

The fixes are related to fixing #9422 are a bit bigger in scope than this. That said If you want to rebase / add some additional tests could take this as a precurser.

@jreback
Copy link
Contributor

jreback commented Aug 20, 2017

closing this. as #9422 needs resolution.

jreback added a commit to jreback/pandas that referenced this pull request Sep 22, 2017
jreback added a commit to jreback/pandas that referenced this pull request Sep 23, 2017
jreback added a commit to jreback/pandas that referenced this pull request Sep 25, 2017
jreback added a commit to jreback/pandas that referenced this pull request Sep 28, 2017
jreback added a commit to jreback/pandas that referenced this pull request Oct 2, 2017
jreback added a commit to jreback/pandas that referenced this pull request Oct 6, 2017
jreback added a commit to jreback/pandas that referenced this pull request Oct 6, 2017
jreback added a commit to jreback/pandas that referenced this pull request Oct 9, 2017
jreback added a commit to jreback/pandas that referenced this pull request Oct 10, 2017
jreback added a commit that referenced this pull request Oct 10, 2017
ghost pushed a commit to reef-technologies/pandas that referenced this pull request Oct 16, 2017
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Numeric Operations Arithmetic, Comparison, and Logical operations Windows Windows OS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: int64 overflow/wrap around with sum()
5 participants