Stacking for geom_area doesn't properly handle missing entries #280

Closed
wch opened this issue Dec 7, 2011 · 4 comments

@wch
Member

wch commented Dec 7, 2011

A stacked area graph requires an entry at every x for every group. If, for a given x value, one group doesn't have an entry in the data frame, the plot behaves as though that group's y value at that x is zero.

The pictures will illustrate better:

library(ggplot2)

dat <- data.frame(
  g = rep(LETTERS[1:3], each = 4),
  x = rep(1:4, 3),
  y = 3:14)

# Remove row with g=B, x=3
dat <- dat[-7, ]
dat

# Lines all look straight
ggplot(dat, aes(x = x, y = y, colour = g)) + geom_line()

# With a stacked area graph, there's a dip at x=3
ggplot(dat, aes(x = x, y = y, fill = g)) + geom_area()
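
The size of the dip can be checked without plotting. The sketch below only mirrors the "missing entry counts as zero" behaviour described above by summing y per x in base R; it is an illustration of the arithmetic, not a reproduction of ggplot2's stacking code.

```r
# Same example data as above, with the g = "B", x = 3 row removed
dat <- data.frame(
  g = rep(LETTERS[1:3], each = 4),
  x = rep(1:4, 3),
  y = 3:14
)
dat <- dat[-7, ]

# Height of the top of the stack at each x if a missing group contributes nothing
totals <- tapply(dat$y, dat$x, sum)
totals
# x = 1: 21, x = 2: 24, x = 3: 18, x = 4: 30
```

With the g=B row present, the total at x=3 would be 5 + 9 + 13 = 27, which is the value the test below expects.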

Test code:

test_that("Stacked area graph interpolates missing values", {
  dat <- data.frame(
    g = rep(LETTERS[1:3], each = 4),
    x = rep(1:4, 3),
    y = 3:14)

  # Remove row with g=B, x=3
  dat <- dat[-7, ]

  p <- ggplot_build(ggplot(dat, aes(x = x, y = y, fill = g)) + geom_area())

  # Top of the stack at x=3: A (5) + interpolated B (9) + C (13) = 27
  topgroup_y <- with(p$data[[1]], y[x == 3 & group == 3])
  expect_equal(topgroup_y, 27)
})

I think fixing this one would require doing some interpolation. Perhaps solving this one is better left to the large changes to stacking code in the future?

@kohske
Collaborator

kohske commented Dec 7, 2011

Just a note: although this is useful in some cases, I don't think this kind of automatic interpolation is a good idea.
The purpose of visualization is to visually inspect what the data look like.
Automatic interpolation will make users overlook the missing values.
Furthermore, there is no particular reason to apply linear interpolation. Why not smoothing, or some other filtering?
So, in my view, interpolation should be done by the user, by hand.
Or, at least, an explicit way to do it, such as a stat_interpolate, should be provided. But maybe this is beyond the scope of "plotting."
Another way is to simply raise an error or a warning when missing values are detected.
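
As a rough illustration of doing the interpolation by hand before plotting, here is a minimal sketch using base R's `approx()`. The helper name `fill_gaps` is hypothetical (it is not part of ggplot2), and linear interpolation is just one possible choice, as noted above.

```r
# Same example data as in the report, with the g = "B", x = 3 row removed
dat <- data.frame(
  g = rep(LETTERS[1:3], each = 4),
  x = rep(1:4, 3),
  y = 3:14
)
dat <- dat[-7, ]

# Hypothetical helper: linearly interpolate each group onto the full set of x values
fill_gaps <- function(df) {
  xs <- sort(unique(df$x))
  pieces <- lapply(split(df, df$g), function(d) {
    data.frame(
      g = d$g[1],
      x = xs,
      # approx() returns NA for x values outside the group's observed range
      y = approx(d$x, d$y, xout = xs)$y
    )
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

filled <- fill_gaps(dat)
filled[filled$g == "B", ]
```

The `filled` data frame has one row per group and x value, so it can be passed to the same `ggplot(...) + geom_area()` call as in the report. Because the interpolation is an explicit step, the user decides whether and how the gap is filled.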

@hadley
Member

hadley commented Dec 7, 2011

Yeah, I'm totally with @kohske on this one. It gets even more complicated if you consider longitudinal data where possibly none of the time points align.

But I think it's worth having some tool that will do this, just not automatically. Something to consider for 1.0

@wch
Member Author

wch commented Dec 7, 2011

That makes sense. I think it would be a good idea to have an informative warning message so that people know how to deal with the issue if they encounter it.
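
One possible shape for such a warning, sketched as a standalone check on the data frame. The function `warn_incomplete_groups` and its message wording are assumptions for illustration only; nothing like it is confirmed to exist in ggplot2.

```r
# Hypothetical check, not part of ggplot2: warn when a group has no entry at some x
warn_incomplete_groups <- function(df, x = "x", group = "g") {
  # Count rows for every group/x combination, including combinations that never occur
  counts <- as.data.frame(
    table(group = df[[group]], x = df[[x]]),
    stringsAsFactors = FALSE
  )
  gaps <- counts[counts$Freq == 0, ]
  if (nrow(gaps) > 0) {
    warning(
      "Some groups have no entry at every x (",
      paste0(gaps$group, " at x=", gaps$x, collapse = ", "),
      "); consider interpolating or filling these values before stacking.",
      call. = FALSE
    )
  }
  invisible(nrow(gaps) == 0)
}

# Example data from the report, with the g = "B", x = 3 row removed
dat <- data.frame(
  g = rep(LETTERS[1:3], each = 4),
  x = rep(1:4, 3),
  y = 3:14
)
dat <- dat[-7, ]

# Capture the warning text instead of printing it
msg <- tryCatch(
  warn_incomplete_groups(dat),
  warning = function(w) conditionMessage(w)
)
msg
```

The check returns `TRUE` invisibly when every group has an entry at every x, so it stays silent for complete data.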

@hadley
Member

hadley commented Feb 24, 2014

This sounds like a great feature, but unfortunately we don't currently have the development bandwidth to support it. If you'd like to submit a pull request that implements this feature, please follow the instructions in the development vignette.

@hadley hadley closed this as completed Feb 24, 2014
@lock lock bot locked as resolved and limited conversation to collaborators Jun 20, 2018
3 participants