ggplot throws an error when a data is zero row and the lengths of aesthetics are not zero #2850

Closed
hisakatha opened this issue Aug 23, 2018 · 8 comments
Labels
bug an unexpected problem or unintended behavior internals 🔎

@hisakatha

If the data has zero rows and the aesthetics have non-zero length, ggplot throws an error that reports the wrong data length, as follows.

Error: Aesthetics must be either length 1 or the same as the data (1): x, y, colour

I think the number in the parentheses indicates the length of the data, but it is wrong, since the length is zero.

In addition, this behavior is incompatible with constant (length-one) aesthetics, even though they can be combined with non-zero-row data. I hope ggplot will tolerate the combination of zero-row data and constant aesthetics.

Example (using R 3.5.1 and ggplot2 3.0.0):

library(ggplot2)
d1 <- data.frame(xval = rep(1:5, 4), yval = 1:20)

# The following are OK
ggplot(d1, aes(xval, yval))
ggplot(d1, aes(xval, yval, colour = "Type 1"))
nrow(d1[0,])
#> [1] 0
ggplot(d1[0,], aes(xval, yval))

# This causes an error
ggplot(d1[0,], aes(xval, yval, colour = "Type 1"))
#> Error: Aesthetics must be either length 1 or the same as the data (1): x, y, colour
@batpigandme
Contributor

With reprex:

library(ggplot2)
d1 <- data.frame(xval = rep(1:5, 4), yval = 1:20)

# The following are OK
ggplot(d1, aes(xval, yval))

ggplot(d1, aes(xval, yval, colour = "Type 1"))

nrow(d1[0,])
#> [1] 0

ggplot(d1[0,], aes(xval, yval))

ggplot(d1[0,], aes(xval, yval, colour = "Type 1"))
#> Error: Aesthetics must be either length 1 or the same as the data (1): x, y, colour

Created on 2018-08-23 by the reprex package (v0.2.0.9000).

@hisakatha
Author

I found two workarounds, though I still hope ggplot will tolerate the combination of zero-row data and constant aesthetics.

  1. Create a new data frame that stores the constant as a column
library(ggplot2)
d2 <- data.frame(xval = rep(1:5, 4), yval = 1:20, col_type = "Type 1")
ggplot(d2, aes(xval, yval, colour = col_type))

ggplot(d2[0,], aes(xval, yval, colour = col_type))

Created on 2018-08-24 by the reprex package (v0.2.0).

  2. Repeat the constant
library(ggplot2)
d1 <- data.frame(xval = rep(1:5, 4), yval = 1:20)
ggplot(d1, aes(xval, yval, colour = rep("Type 1", nrow(d1))))

d1zero <- d1[0,]
ggplot(d1zero, aes(xval, yval, colour = rep("Type 1", nrow(d1zero))))

Created on 2018-08-24 by the reprex package (v0.2.0).

@karawoo
Member

karawoo commented Aug 23, 2018

If the data has zero rows then the length of the longest evaluated aesthetic gets used. Here colour has length 1 (length("Type 1")), which then causes a mismatch between n (now 1) and x and y, which still have length zero.

ggplot2/R/layer.r

Lines 220 to 229 in 01155ba

n <- nrow(data)
if (n == 0) {
  # No data, so look at longest evaluated aesthetic
  if (length(evaled) == 0) {
    n <- 0
  } else {
    n <- max(vapply(evaled, length, integer(1)))
  }
}
check_aesthetics(evaled, n)
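
The length computation above can be traced in base R without ggplot2. This is only a sketch: `evaled` is a hand-built stand-in for the evaluated aesthetics of a zero-row layer, and `lengths_ok` is a simplified assumption about what the length check tests, not the actual body of check_aesthetics().

```r
# Hypothetical stand-in for the evaluated aesthetics of a zero-row layer:
# x and y come from the empty data, colour is the constant "Type 1".
evaled <- list(x = integer(0), y = integer(0), colour = "Type 1")

n <- 0L  # what nrow(data) returns for zero-row data
if (n == 0) {
  # Same fallback as in the quoted snippet: longest evaluated aesthetic
  n <- if (length(evaled) == 0) 0L else max(vapply(evaled, length, integer(1)))
}

# Simplified check (an assumption, not ggplot2's code):
# each aesthetic must have length 1 or length n.
lengths_ok <- vapply(evaled, function(a) length(a) %in% c(1L, n), logical(1))
```

Here n ends up as 1 because of the constant colour, so the zero-length x and y no longer match it.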

Can you say a bit more about your use case? The combination of zero-row data and mapping an aesthetic to a string is a little unusual.

@hisakatha
Author

Thank you for your comment.
I realized that there are workarounds, but my first attempt was something like the following:

library(ggplot2)
set.seed(123)
d1_1 <- data.frame(yval = rnorm(20))
d1_2 <- data.frame(yval = rnorm(10))
d2 <- data.frame(yval = rnorm(15))
ggplot(mapping = aes(y = yval)) +
  geom_violin(data = d1_1, mapping = aes(x = 1, fill = "Type 1")) +
  geom_violin(data = d1_2, mapping = aes(x = 2, fill = "Type 1")) +
  geom_violin(data = d2, mapping = aes(x = 4, fill = "Type 2")) +
  scale_x_continuous(breaks = c(1,2,4), labels = c("Data 1_1", "Data 1_2", "Data 2"))

Created on 2018-08-24 by the reprex package (v0.2.0).

However, in the real use case, some of the data may have zero rows, in which case ggplot throws the error.
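
One possible guard for this layer-per-data-frame pattern is to skip any layer whose data has zero rows. The sketch below assumes ggplot2 3.0.0 or later, where adding NULL to a plot leaves it unchanged; `violin_or_null()` is a hypothetical helper, not part of ggplot2.

```r
library(ggplot2)

# Hypothetical helper: return a violin layer, or NULL when there is nothing to draw.
violin_or_null <- function(data, x_pos, fill_label) {
  if (nrow(data) == 0) {
    return(NULL)
  }
  geom_violin(data = data, mapping = aes(x = x_pos, fill = fill_label))
}

set.seed(123)
d1_1 <- data.frame(yval = rnorm(20))
# drop = FALSE keeps a one-column data frame from collapsing to a vector
d1_2 <- d1_1[0, , drop = FALSE]
d2 <- data.frame(yval = rnorm(15))

p <- ggplot(mapping = aes(y = yval)) +
  violin_or_null(d1_1, 1, "Type 1") +
  violin_or_null(d1_2, 2, "Type 1") +  # zero rows, so no layer is added
  violin_or_null(d2, 4, "Type 2") +
  scale_x_continuous(breaks = c(1, 2, 4), labels = c("Data 1_1", "Data 1_2", "Data 2"))
```

The empty data frame contributes no layer, so the constant fill is never compared against zero-length x and y.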

@karawoo
Member

karawoo commented Aug 24, 2018

Ah, I see. I think part of the issue here is that ggplot2 does have certain expectations about the format of data that is provided. You'll have the most success when mapping columns of data in a data frame to the visual variables you want to see in the plot, rather than manually creating the violins from separate data frames. Here is an example of what I mean using your data:

library("tidyverse")
set.seed(123)
d1_1 <- data.frame(yval = rnorm(20))
d1_2 <- data.frame(yval = rnorm(10))
d2 <- data.frame(yval = rnorm(15))

## Combine the data into one dataset
dat <- bind_rows(
  list(d1_1 = d1_1, d1_2 = d1_2, d2 = d2),
  .id = "id"
) %>%
  ## Extract the 1 or 2 from d1, d2 etc. to inform fill color
  mutate(fill = substr(id, 2, 2))

p <- ggplot(dat, aes(x = id, y = yval, fill = fill)) +
  geom_violin()

p

Then if you want to customize the labels etc. you can do so with scale_* functions:

p +
  scale_fill_discrete(labels = c("Type 1", "Type 2")) +
  scale_x_discrete(labels = c("Data 1_1", "Data 1_2", "Data 2"))

This isn't so much a workaround as it is taking full advantage of ggplot2's ability to understand all the data at once, and it should hopefully solve the original problem as a) there won't be any zero-row data once the data is combined (unless everything has zero rows), and b) since fill is mapped to a variable rather than a single value, if the data does have zero rows there won't be a mismatch between data and aesthetic length.
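
To illustrate point a), here is a sketch of the same combined-data approach when one of the inputs is empty. Converting id to a factor with explicit levels and passing drop = FALSE to scale_x_discrete() are additions not in the example above; they are one assumed way to keep an axis slot for the empty group.

```r
library(dplyr)
library(ggplot2)

set.seed(123)
d1_1 <- data.frame(yval = rnorm(20))
d1_2 <- data.frame(yval = numeric(0))  # zero-row input
d2 <- data.frame(yval = rnorm(15))

# The empty data frame simply contributes no rows to the combined data
dat <- bind_rows(list(d1_1 = d1_1, d1_2 = d1_2, d2 = d2), .id = "id") %>%
  mutate(
    # Explicit levels keep "d1_2" known to the scale although it has no rows
    id = factor(id, levels = c("d1_1", "d1_2", "d2")),
    fill = substr(as.character(id), 2, 2)
  )

p <- ggplot(dat, aes(x = id, y = yval, fill = fill)) +
  geom_violin() +
  scale_x_discrete(drop = FALSE)  # keep the unused level on the axis
```

Because fill is a column of dat, it always has the same length as the data, whether or not a group is empty.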

@hisakatha
Author

hisakatha commented Aug 26, 2018

Thank you so much!
If the behavior

If the data has zero rows then the length of the longest evaluated aesthetic gets used.

is intended, I think the issue has been solved.

However, I'd like to report some behavior related to zero-row data that looks odd to me. If some of the data have zero rows (or are dummies), the violins are wider than in other cases. I'm sorry if this is documented, a duplicate, or a matter of preference, but I'm reporting it in case it is undesirable.

library(tidyverse)
set.seed(123)
dat <- data.frame(x = rep(c("a","b","c"), 20), y = rnorm(60))
p <- ggplot(dat, aes(x,y)) + geom_violin()
p + scale_x_discrete(limits = c("c", "b", "a"))

# Wide!
p + scale_x_discrete(limits = c("c", "DUMMY", "a"))
#> Warning: Removed 20 rows containing non-finite values (stat_ydensity).

p + scale_x_discrete(limits = c("DUMMY1", "DUMMY2", "a"))
#> Warning: Removed 40 rows containing non-finite values (stat_ydensity).

p + scale_x_discrete(limits = c("DUMMY", "b", "a"))
#> Warning: Removed 20 rows containing non-finite values (stat_ydensity).

p + scale_x_discrete(limits = c("c", "DUMMY", "b", "a"))

# Wide!
p + scale_x_discrete(limits = c("c", "DUMMY1", "b", "DUMMY2"))
#> Warning: Removed 20 rows containing non-finite values (stat_ydensity).

Created on 2018-08-27 by the reprex package (v0.2.0).

@paleolimbot
Member

While there are usually better ways to solve the original poster's problem, I would argue that the layers' handling of zero-length values is inconsistent, as ggplot2:::data_frame() interprets length-1 values as scalars (as does tibble::tibble()). By extension, so does annotate().

This currently fails:

library(ggplot2)
df <- data.frame(x = numeric(0), y = numeric(0))
ggplot(df, aes(x, y, colour = "a value")) + geom_point()
#> Error: Aesthetics must be either length 1 or the same as the data (1): x, y

But these do not:

library(ggplot2)
df <- data.frame(x = numeric(0), y = numeric(0))

ggplot() + annotate("point", x = df$x, y = df$y, colour = "a value")
tibble::tibble(x = df$x, y = df$y, colour = "a value")
#> # A tibble: 0 x 3
#> # … with 3 variables: x <dbl>, y <dbl>, colour <chr>
ggplot2:::data_frame(x = df$x, y = df$y, colour = "a value")
#> [1] x      y      colour
#> <0 rows> (or 0-length row.names)
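
The scalar rule these calls follow can be sketched in base R. `recycle_scalars()` is a hypothetical helper written for illustration only; it is not how ggplot2 or tibble implement recycling.

```r
# Hypothetical helper: treat length-1 values as scalars and recycle them
# to n, including n = 0; other lengths must already equal n.
recycle_scalars <- function(aesthetics, n) {
  lapply(aesthetics, function(a) {
    if (length(a) == n) {
      a
    } else if (length(a) == 1) {
      rep(a, n)
    } else {
      stop("Aesthetics must be either length 1 or the same as the data (", n, ")")
    }
  })
}

# Zero-row case: the constant colour is recycled down to length zero
out0 <- recycle_scalars(list(x = numeric(0), y = numeric(0), colour = "a value"), n = 0)

# Non-zero case: the constant colour is repeated for every row
out3 <- recycle_scalars(list(x = 1:3, y = 4:6, colour = "a value"), n = 3)
```

Under this rule a constant mapped against zero-row data shrinks to length zero instead of forcing n to 1.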

@paleolimbot paleolimbot added bug an unexpected problem or unintended behavior internals 🔎 labels Jun 7, 2019
@thomasp85 thomasp85 added this to the ggplot2 3.3.4 milestone Mar 25, 2021
@thomasp85
Member

@paleolimbot while I get your point, there is a real difference between what happens in an aes() call and what happens in annotate(), where nothing has to be matched up with the data. I'm closing this, but feel free to reopen if you feel strongly about it.
