stat_summary_bin bug? #1739

Closed
eipi10 opened this issue Sep 4, 2016 · 4 comments

eipi10 commented Sep 4, 2016

In the plot below, you can see that an "extra" bin value is plotted to the right of the data values.

library(ggplot2)

# simulate an example of linear data 
set.seed(1)
N <- 10^4
x <- runif(N)
y <- x + rnorm(N)
dt <- data.frame(x=x, y=y)

ggplot(dt, aes(x, y)) + 
  geom_point(alpha = 0.1, size = 0.01) +
  stat_summary_bin(fun.y=mean, bins=10, size=5, geom="point", colour="red") 

This looks like a possible bug in stat_summary_bin. As the code below shows, two y values (from the two rows with the highest x values) are excluded from the regular bins and end up in an NA bin, which is the one plotted to the right of the data bins in the plot above. I would have expected ggplot to include all the data values in the binning procedure by default.

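# Layer 2 summarises the mean of y per bin, layer 3 the number of points per bin
p1 <- ggplot(dt, aes(x, y)) + 
  geom_point(alpha = 0.1, size = 0.01) +
  stat_summary_bin(fun.y=mean, bins=10, size=5, geom="point") +
  stat_summary_bin(fun.y=length, bins=10, size=5, geom="point")

# Build the plot to inspect the data computed for each layer
p1b <- ggplot_build(p1)
p1b$data[[2]][9:11, c(1,3,5,6)]  # last rows of the mean layer
p1b$data[[3]][9:11, c(1,3,5,6)]  # last rows of the length (count) layer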

The issue raised above is based on this SO question.
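
As a cross-check of the expectation that every value should land in one of the ten bins, here is a minimal base R sketch that bins the same simulated data by hand. It is an illustration only and does not reproduce ggplot2's internal binning; the break positions (ten equal-width bins on [0, 1]) and the variable names `breaks`, `bin`, `bin_counts` and `bin_means` are assumptions made for this example.

```r
# Same simulated data as in the example above
set.seed(1)
N <- 10^4
x <- runif(N)
y <- x + rnorm(N)

# Assumed bin edges: ten equal-width bins spanning [0, 1]
breaks <- seq(0, 1, length.out = 11)

# include.lowest = TRUE closes the first interval on the left, so a value
# exactly at the lowest edge is not turned into NA
bin <- cut(x, breaks = breaks, include.lowest = TRUE)

# Number of points and mean of y per bin; an NA row would show up here
bin_counts <- table(bin, useNA = "ifany")
bin_means <- tapply(y, bin, mean)

print(bin_counts)
print(bin_means)
```

If the counts add up to N and no NA level appears, every observation was assigned to a regular bin under these assumed breaks.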

thomasp85 (Member) commented

Possible duplicate of #1651

hadley (Member) commented Sep 15, 2016

Simpler reprex:

library(ggplot2)

x <- seq(0, 1, length = 1e2)
y <- x + rnorm(length(x))
dt <- data.frame(x, y)

# NOT OK
ggplot(dt, aes(x, y)) + 
  geom_point(colour = "grey80") +
  stat_summary_bin(fun.y=mean, bins=10, geom = "point", colour = "red") 

# OK
ggplot(dt, aes(x)) + 
  geom_histogram(bins = 10, boundary = 0)

hadley (Member) commented Sep 15, 2016

That's not an NA bin, it's the bin [1.0, 1.1). This is just an unfortunate consequence of the fact that you can't exactly represent 1 / 10 = 0.1 as a floating point number.
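
The floating point point can be illustrated with a small base R sketch. The helper `naive_bin()` is hypothetical and is not how ggplot2 assigns bins; it only shows how a bin width of 0.1 and left-closed intervals can put boundary values in a bin other than the one you might expect.

```r
# 0.1 has no exact binary representation; the stored value is slightly off
sprintf("%.20f", 0.1)

# Classic symptom: exact comparison fails, tolerance-based comparison passes
0.1 + 0.2 == 0.3                    # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE

# Hypothetical helper: zero-based index of the left-closed bin
# [k * width, (k + 1) * width) that contains x
naive_bin <- function(x, width) floor(x / width)

# 0.3 / 0.1 evaluates to just under 3, so 0.3 is placed in bin 2, not bin 3
naive_bin(0.3, 0.1)

# 1 / 0.1 evaluates to 10, so x = 1 is placed in an eleventh bin, [1.0, 1.1)
naive_bin(1, 0.1)
```

Comparing with a tolerance (as `all.equal()` does) or closing the last interval on the right are common ways to keep boundary values inside the intended bins.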

hadley closed this as completed Sep 15, 2016
eipi10 (Author) commented Sep 15, 2016

Thanks Hadley.

lock bot locked as resolved and limited conversation to collaborators Jun 19, 2018