
geom_bar inconsistently handling date values #2047


Closed
nteetor opened this issue Feb 15, 2017 · 13 comments · Fixed by #4416
Labels
bug an unexpected problem or unintended behavior layers 📈

Comments

@nteetor

nteetor commented Feb 15, 2017

Description

There are unexpected plot results when the fill aesthetic is specified in geom_bar and the x aesthetic is a Date or POSIXct value. Below I have listed examples using Date and POSIXct objects, respectively. Similar data represented with these two classes produces rather different plots; see below.

Date Examples

The following examples use Date objects produced with make_date().

1 month with 2 fill values

In this example January fill values are TRUE and FALSE, February and March fill values are only FALSE.

library(ggplot2)
library(lubridate)

d1 <- data.frame(
  dates = rep(make_date(year = 2017, month = 1:3), each = 2),
  highlight = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
)

# this unexpectedly changes the width of one of January's bars
ggplot(d1, aes(x = dates)) +
  geom_bar(aes(fill = highlight))
## Warning: position_stack requires non-overlapping x intervals


1 month with distinct fill value

In this example January fill values are only TRUE, both February and March are only FALSE.

library(ggplot2)
library(lubridate)

d2 <- data.frame(
  dates = make_date(year = 2017, month = 1:3),
  highlight = c(TRUE, FALSE, FALSE)
)

# only one bar for January, but as above it is thin
ggplot(d2, aes(x = dates)) +
  geom_bar(aes(fill = highlight))


3+ fill values

This example introduces a third fill value to the highlight column.

library(ggplot2)
library(lubridate)

d3 <- data.frame(
  dates = make_date(year = 2017, month = 1:3),
  highlight = c(1, 2, 3)
)

# all bars are now equally thin, additional breaks added to x-axis
ggplot(d3, aes(x = dates)) +
  geom_bar(aes(fill = factor(highlight)))


POSIXct Examples

The following examples use POSIXct objects produced with make_datetime().

1 month with distinct fill value

In this example January's fill value is only TRUE, and February and March are only FALSE. The January bar does not show; it may be too thin to see.

library(ggplot2)
library(lubridate)

d4 <- data.frame(
  datetimes = make_datetime(year = 2017, month = 1:3),
  highlight = c(TRUE, FALSE, FALSE)
)

# January bar is no longer thin; instead it is missing
ggplot(d4, aes(x = datetimes)) +
  geom_bar(aes(fill = highlight))


3+ fill values

This is similar to the 3+ fill values example above; however, here the bars are not merely equally thin, they are all missing.

library(ggplot2)
library(lubridate)

d5 <- data.frame(
  datetimes = make_datetime(year = 2017, month = 1:3),
  highlight = c(1, 2, 3)
)

# no bars
ggplot(d5, aes(x = datetimes)) +
  geom_bar(aes(fill = factor(highlight)))


Conclusion

The geom_bar fill aesthetic is not properly handling Date and POSIXct objects. I am not weighing in yet on whether bars ought to be thin or wide; rather, I am hoping to iron out geom_bar so that Date and POSIXct values are handled consistently. I hope I have not misunderstood how geom_bar is intended to handle date values.

I will look under the hood to try to identify the problem. For now I am not aware of a workaround.

@hadley

This comment has been minimized.

@hadley hadley added bug an unexpected problem or unintended behavior layers 📈 labels Feb 15, 2017
@nteetor
Author

nteetor commented Feb 17, 2017

I came across a case which touches upon the first plot's inconsistent gaps you pointed out.

library(ggplot2)
library(lubridate)

d6 <- data.frame(
  dates = rep(make_date(year = 2017, month = 1:3), each = 2),
  highlight = c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

ggplot(d6, aes(x = dates)) +
  geom_bar(aes(fill = highlight))


I attempted to calculate the width ahead of time using resolution() and the full dates column. The inconsistent gaps issue cropped up again.

# using d6 data frame from above
ggplot(d6, aes(x = dates)) +
  geom_bar(aes(fill = highlight), width = resolution(as.numeric(d6$dates)))

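One plausible reading of the inconsistent gaps, assuming every bar is drawn with the same fixed width centred on the first day of its month: resolution() returns the smallest spacing between the dates (it also considers the distance from 0, which is far larger here), and that is the 28 days from February to March. A 28-day bar therefore fills the February to March interval completely but leaves a 3-day gap inside the 31-day January to February interval. The arithmetic in base R:

```r
# Bar centres in d6: the first day of January, February and March 2017
centres <- as.Date(c("2017-01-01", "2017-02-01", "2017-03-01"))

# Days between neighbouring centres: 31 (Jan to Feb), 28 (Feb to Mar)
spacing <- diff(as.numeric(centres))

# One fixed width equal to the smallest spacing, like the
# width = resolution(as.numeric(d6$dates)) call above (28 days here)
width <- min(spacing)

# Empty days left between neighbouring bars of that width
gaps <- spacing - width
print(gaps)
```

Because calendar months differ in length, no single width in days gives equal gaps between monthly bars.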


@hadley
Member

hadley commented May 9, 2018

Minimal reprex:

library(ggplot2)

df <- data.frame(
  x = c(0, 0, 2, 1), 
  fill = c(TRUE, TRUE, TRUE, FALSE)
)

ggplot(df, aes(x, fill = fill)) + geom_bar()
#> Warning: position_stack requires non-overlapping x intervals

ggplot(df, aes(x, fill = fill)) + geom_bar(width = 0.9)

I think the root cause is that stat_count() is computing the width based on the resolution of an individual group rather than the full dataset.
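To illustrate that diagnosis, here is a simplified stand-in for ggplot2's resolution(). The helper approx_resolution() is hypothetical and is not the package source: it assumes resolution() returns the smallest gap between distinct values (with 0 included by default) and returns 1 when there is only one distinct value. Applying it per fill group, and then to the whole x column of the reprex, suggests why the TRUE bars overlap:

```r
# Hypothetical stand-in for ggplot2::resolution(); an approximation only.
# Smallest gap between distinct values (0 included when zero = TRUE),
# or 1 when there is at most one distinct value.
approx_resolution <- function(x, zero = TRUE) {
  x <- unique(as.numeric(x))
  if (length(x) <= 1) return(1)
  if (zero) x <- unique(c(0, x))
  min(diff(sort(x)))
}

df <- data.frame(
  x = c(0, 0, 2, 1),
  fill = c(TRUE, TRUE, TRUE, FALSE)
)

# Default width (0.9 * resolution) computed separately for each fill group
per_group <- tapply(df$x, df$fill, function(v) approx_resolution(v) * 0.9)
print(per_group)

# Default width computed once from the full x column
global <- approx_resolution(df$x) * 0.9
print(global)
```

The TRUE group (x values 0 and 2) gets a width of 1.8, so its bars at 0 and 2 overlap the FALSE bar at 1; the full column gives 0.9, which matches the explicit width = 0.9 call above.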

@thibautjombart

For what it is worth, this bug causes problems in the RECON package incidence:

library(incidence)

set.seed(1)

dates <- as.Date("2018-01-01") + sample(1:20, 100, replace = TRUE)
dates_posix <- as.POSIXct(dates)

plot(incidence(dates))

plot(incidence(dates_posix))

@hadley
Member

hadley commented May 22, 2018

To be clear, are you stating that this is not a problem with the released version of ggplot2?

@peterfine

@hadley, this problem, as per your example in #2047 (comment) and my own difficulty plotting similar data, still exists in ggplot2 3.2.1.

@avallecam

This issue has returned in the newly released version of ggplot2, 3.3.0, available on CRAN.


@hadley
Member

hadley commented Mar 16, 2020

@avallecam that's a different problem. Please open a new issue with a reprex.

@thibautjombart

I think it is indeed another issue. I have posted a reprex already in this issue.

@stragu
Contributor

stragu commented Sep 26, 2020

The report above, which Kara closed, is a duplicate.

So, moving forward: how could the compute_group method of the stat_count ggproto object use the whole original x vector, rather than just the values of the group currently being handled, to compute the bar's width?

width <- width %||% (resolution(x) * 0.9)
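A sketch of what that line works out to for the Date and POSIXct reports d2 and d4 (same highlight vector, only January TRUE). It uses a hypothetical approx_resolution() helper that approximates ggplot2's resolution() (smallest gap between distinct values, 0 included; 1 for a single value) rather than calling the package. Dates are counted in days and POSIXct values in seconds, so a group holding only January would get a width of 0.9 days or 0.9 seconds, which would account for the thin and the missing January bars respectively:

```r
# Hypothetical stand-in for ggplot2::resolution(); an approximation only.
approx_resolution <- function(x, zero = TRUE) {
  x <- unique(as.numeric(x))
  if (length(x) <= 1) return(1)
  if (zero) x <- unique(c(0, x))
  min(diff(sort(x)))
}

# x values from d2 (Date) and d4 (POSIXct, UTC); only January is highlighted
dates <- as.Date(c("2017-01-01", "2017-02-01", "2017-03-01"))
datetimes <- as.POSIXct(c("2017-01-01", "2017-02-01", "2017-03-01"), tz = "UTC")
highlight <- c(TRUE, FALSE, FALSE)

# Width as computed per group (days for Date, seconds for POSIXct)
per_group_dates <- tapply(dates, highlight, function(v) approx_resolution(v) * 0.9)
per_group_secs <- tapply(datetimes, highlight, function(v) approx_resolution(v) * 0.9)

# Width computed from the whole original vector instead
whole_dates <- approx_resolution(dates) * 0.9
whole_secs <- approx_resolution(datetimes) * 0.9
print(c(days = whole_dates, seconds = whole_secs))
```

Computed from the whole vector, both classes give the same physical width, 25.2 days (25.2 * 86400 seconds), so the Date and POSIXct plots would agree.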

@thomasp85
Member

The basic issue here is that the group aesthetic is derived from fill, which means the resolution of the second group is computed to be much higher... Setting group explicitly fixes this:

library(ggplot2)

df <- data.frame(
  x = c(0, 0, 2, 1), 
  fill = c(TRUE, TRUE, TRUE, FALSE)
)

ggplot(df, aes(x, fill = fill, group = x)) + geom_bar()

Created on 2021-04-13 by the reprex package (v2.0.0)

I think we can safely move the resolution(x) call into setup_params to fix this, so that default width is group-independent, but maybe someone is relying on this behaviour?
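A rough base-R sketch of that idea, outside ggproto: a hypothetical count_bars() helper (not ggplot2 code) computes the default width once from all x values in the layer, then counts each x within each group, so every bar shares one width. approx_resolution() is a simplified stand-in for resolution() (smallest gap between distinct values, 0 included; 1 for a single value), not the package function:

```r
# Hypothetical stand-in for ggplot2::resolution(); an approximation only.
approx_resolution <- function(x, zero = TRUE) {
  x <- unique(as.numeric(x))
  if (length(x) <= 1) return(1)
  if (zero) x <- unique(c(0, x))
  min(diff(sort(x)))
}

# Hypothetical helper: count bars per (x, group) with one shared width
count_bars <- function(x, group, width = NULL) {
  # Default width from every x in the layer, not from a single group
  if (is.null(width)) width <- approx_resolution(x) * 0.9
  bars <- aggregate(count ~ x + group,
                    data = data.frame(x = x, group = group, count = 1),
                    FUN = sum)
  bars$width <- width
  bars
}

df <- data.frame(x = c(0, 0, 2, 1), fill = c(TRUE, TRUE, TRUE, FALSE))
bars <- count_bars(df$x, df$fill)
print(bars)
```

An explicit width argument still overrides the default, mirroring the width %||% pattern quoted earlier in the thread.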

@thomasp85
Member

hmm, actually... a question then arises of whether the width should be calculated per panel or globally.
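To make the trade-off concrete, consider an invented two-panel layout (not data from the thread) where panel A has x spacing 1 and panel B has x spacing 10. Using a simplified, hypothetical approx_resolution() stand-in for resolution(), a per-panel default keeps bars proportional to the spacing within each panel, while a global default makes panel B's bars as narrow as panel A's:

```r
# Hypothetical stand-in for ggplot2::resolution(); an approximation only.
approx_resolution <- function(x, zero = TRUE) {
  x <- unique(as.numeric(x))
  if (length(x) <= 1) return(1)
  if (zero) x <- unique(c(0, x))
  min(diff(sort(x)))
}

# Invented example: two facet panels with very different x spacing
df <- data.frame(
  panel = rep(c("A", "B"), each = 3),
  x = c(1, 2, 3, 10, 20, 30)
)

# Default width computed separately for each panel
per_panel <- tapply(df$x, df$panel, function(v) approx_resolution(v) * 0.9)

# Default width computed once across all panels
global <- approx_resolution(df$x) * 0.9
print(per_panel)
print(global)
```

Per panel, A gets 0.9 and B gets 9; globally, both get 0.9, so B's bars would cover less than a tenth of their slot.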

7 participants