stat_density2d computes bins on entire data, not on groups #4113

Closed
tjebo opened this issue Jul 2, 2020 · 1 comment

tjebo commented Jul 2, 2020

This is a follow-up to issue #4003.

When the bins argument is used, the bins appear to be calculated on the entire data rather than on the groups defined via aes(). I am not sure this is what one expects from this argument.

The example below shows this on the iris dataset with bins = 5: the plot does not show 5 bins for each group. My guess is that 5 bins are calculated from the absolute density values of the whole data. I am not sure this is very helpful, because one gets no real idea of which bin is shown, and the groups are difficult to compare.

I believe it would make more sense to calculate the bins by group, because I guess one wants to compare probability distributions rather than absolute densities (?). The second plot shows what I believe one would rather expect when using bins. The function used to calculate it is given further down (modified from this SO answer).

library(dplyr)
library(ggplot2)

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_density2d(bins = 5) +
  geom_point()
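
As a rough check of the guess above, the per-group density surfaces can be pulled out of the plot and their peaks compared. This is only a sketch: it assumes a ggplot2 version in which stat_density_2d() accepts contour = FALSE and returns density and ndensity columns (3.3.0 or later), and the names grid and peak are made up for illustration.

```r
library(ggplot2)

# Evaluate the 2D density on a grid for each species, without contouring
p <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  stat_density_2d(geom = "point", contour = FALSE)

# Computed stat values: one row per grid point and group
grid <- layer_data(p)

# Peak absolute density per group; if these differ, breaks shared across
# all groups cut each group's surface at different relative heights
peak <- tapply(grid$density, grid$group, max)
peak
```

If the peaks differ noticeably between species, a common set of density breaks would produce a different number of visible contours per group.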

The following is closer to what one would expect (at least what I would expect). For the generation of the man_calc_dens data frame, see below.

ggplot(mapping = aes(color = Species)) +
  geom_point(data = iris, aes(Sepal.Length, Sepal.Width)) +
  geom_polygon(data = man_calc_dens, aes(x = x, y = y, group = interaction(Species, prob)), fill = NA)

Functions and manually calculated density:

probc <- function(data, x = NULL, y = NULL, n = 50, prob = 0.95, ...) {

  # 2D kernel density estimate on an n x n grid
  post1 <- MASS::kde2d(data[[x]], data[[y]], n = n, ...)

  # Grid spacing, used to convert density heights into probability mass
  dx <- diff(post1$x[1:2])
  dy <- diff(post1$y[1:2])
  sz <- sort(post1$z)
  c1 <- cumsum(sz) * dx * dy

  # Density level whose contour encloses a share `p` of the mass
  levels <- sapply(prob, function(p) {
    approx(c1, sz, xout = 1 - p)$y
  })

  # Contour lines at those levels, as a data frame
  df <- as.data.frame(grDevices::contourLines(post1$x, post1$y, post1$z, levels = levels))
  df$x <- round(as.numeric(df$x), 3)
  df$y <- round(as.numeric(df$y), 3)
  df$level <- round(as.numeric(df$level), 2)
  df$prob <- rep(as.character(prob), nrow(df))

  df
}

man_calc_dens <-
  iris %>%
  split(., .$Species) %>%
  purrr::map(function(u) {
    # One contour per probability level (0.2, 0.4, 0.6, 0.8) for this species
    dplyr::bind_rows(
      purrr::map(seq(0.2, 0.8, 0.2), function(p) {
        probc(data = u, x = "Sepal.Length", y = "Sepal.Width", prob = p)
      })
    )
  }) %>%
  bind_rows(.id = "Species")
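
The level calculation inside probc() can be checked in isolation: sort the grid densities, accumulate their mass, and read off the height at which a share 1 - prob lies below. The sketch below does this for one species and then sums the mass at or above that height. The widened lims (one unit beyond the data range on each side) and n = 100 are assumptions added here so that the grid covers nearly all of the estimated mass; they are not part of the code above.

```r
library(MASS)

# One group, same variables as in the example above
d <- iris[iris$Species == "setosa", ]

# Assumption: extend the grid beyond the data range so little mass is cut off
lims <- c(range(d$Sepal.Length) + c(-1, 1), range(d$Sepal.Width) + c(-1, 1))
dens <- MASS::kde2d(d$Sepal.Length, d$Sepal.Width, n = 100, lims = lims)

# Area of one grid cell, to turn density heights into probability mass
cell <- diff(dens$x[1:2]) * diff(dens$y[1:2])

# Ascending densities and their cumulative mass
sz <- sort(dens$z)
c1 <- cumsum(sz) * cell

# Height below which a share (1 - prob) of the mass lies
prob <- 0.5
lvl <- approx(c1, sz, xout = 1 - prob)$y

# Mass of all grid cells at or above that height; should be close to prob
mass_inside <- sum(dens$z[dens$z >= lvl]) * cell
mass_inside
```

With the default lims (the data range), the grid may hold noticeably less than the full mass, in which case the enclosed share would fall short of prob.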
Session info
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.0 (2020-04-24)
#>  os       macOS Catalina 10.15.5      
#>  system   x86_64, darwin17.0          
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_GB.UTF-8                 
#>  ctype    en_GB.UTF-8                 
#>  tz       Europe/London               
#>  date     2020-07-02                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date       lib source                          
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.0.0)                  
#>  backports     1.1.8      2020-06-17 [1] CRAN (R 4.0.0)                  
#>  callr         3.4.3      2020-03-28 [1] CRAN (R 4.0.0)                  
#>  cli           2.0.2      2020-02-28 [1] CRAN (R 4.0.0)                  
#>  colorspace    1.4-1      2019-03-18 [1] CRAN (R 4.0.0)                  
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 4.0.0)                  
#>  curl          4.3        2019-12-02 [1] CRAN (R 4.0.0)                  
#>  desc          1.2.0      2018-05-01 [1] CRAN (R 4.0.0)                  
#>  devtools      2.3.0      2020-04-10 [1] CRAN (R 4.0.0)                  
#>  digest        0.6.25     2020-02-23 [1] CRAN (R 4.0.0)                  
#>  dplyr       * 1.0.0.9000 2020-06-30 [1] Github (tidyverse/dplyr@7221da8)
#>  ellipsis      0.3.1      2020-05-15 [1] CRAN (R 4.0.0)                  
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 4.0.0)                  
#>  fansi         0.4.1      2020-01-08 [1] CRAN (R 4.0.0)                  
#>  farver        2.0.3      2020-01-16 [1] CRAN (R 4.0.0)                  
#>  fs            1.4.1      2020-04-04 [1] CRAN (R 4.0.0)                  
#>  generics      0.0.2      2018-11-29 [1] CRAN (R 4.0.0)                  
#>  ggplot2     * 3.3.2      2020-06-19 [1] CRAN (R 4.0.0)                  
#>  glue          1.4.1      2020-05-13 [1] CRAN (R 4.0.0)                  
#>  gtable        0.3.0      2019-03-25 [1] CRAN (R 4.0.0)                  
#>  highr         0.8        2019-03-20 [1] CRAN (R 4.0.0)                  
#>  htmltools     0.4.0      2019-10-04 [1] CRAN (R 4.0.0)                  
#>  httr          1.4.1      2019-08-05 [1] CRAN (R 4.0.0)                  
#>  isoband       0.2.2      2020-06-20 [1] CRAN (R 4.0.0)                  
#>  knitr         1.28       2020-02-06 [1] CRAN (R 4.0.0)                  
#>  labeling      0.3        2014-08-23 [1] CRAN (R 4.0.0)                  
#>  lifecycle     0.2.0      2020-03-06 [1] CRAN (R 4.0.0)                  
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 4.0.0)                  
#>  MASS          7.3-51.6   2020-04-26 [1] CRAN (R 4.0.0)                  
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 4.0.0)                  
#>  mime          0.9        2020-02-04 [1] CRAN (R 4.0.0)                  
#>  munsell       0.5.0      2018-06-12 [1] CRAN (R 4.0.0)                  
#>  pillar        1.4.4      2020-05-05 [1] CRAN (R 4.0.0)                  
#>  pkgbuild      1.0.8      2020-05-07 [1] CRAN (R 4.0.0)                  
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.0.0)                  
#>  pkgload       1.1.0      2020-05-29 [1] CRAN (R 4.0.0)                  
#>  prettyunits   1.1.1      2020-01-24 [1] CRAN (R 4.0.0)                  
#>  processx      3.4.2      2020-02-09 [1] CRAN (R 4.0.0)                  
#>  ps            1.3.3      2020-05-08 [1] CRAN (R 4.0.0)                  
#>  purrr         0.3.4      2020-04-17 [1] CRAN (R 4.0.0)                  
#>  R6            2.4.1      2019-11-12 [1] CRAN (R 4.0.0)                  
#>  Rcpp          1.0.4.6    2020-04-09 [1] CRAN (R 4.0.0)                  
#>  remotes       2.1.1      2020-02-15 [1] CRAN (R 4.0.0)                  
#>  rlang         0.4.6      2020-05-02 [1] CRAN (R 4.0.0)                  
#>  rmarkdown     2.2        2020-05-31 [1] CRAN (R 4.0.0)                  
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 4.0.0)                  
#>  scales        1.1.1      2020-05-11 [1] CRAN (R 4.0.0)                  
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 4.0.0)                  
#>  stringi       1.4.6      2020-02-17 [1] CRAN (R 4.0.0)                  
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 4.0.0)                  
#>  testthat      2.3.2      2020-03-02 [1] CRAN (R 4.0.0)                  
#>  tibble        3.0.1      2020-04-20 [1] CRAN (R 4.0.0)                  
#>  tidyselect    1.1.0      2020-05-11 [1] CRAN (R 4.0.0)                  
#>  usethis       1.6.1      2020-04-29 [1] CRAN (R 4.0.0)                  
#>  vctrs         0.3.1      2020-06-05 [1] CRAN (R 4.0.0)                  
#>  withr         2.2.0      2020-04-20 [1] CRAN (R 4.0.0)                  
#>  xfun          0.14       2020-05-20 [1] CRAN (R 4.0.0)                  
#>  xml2          1.3.2      2020-04-23 [1] CRAN (R 4.0.0)                  
#>  yaml          2.2.1      2020-02-01 [1] CRAN (R 4.0.0)                  
#> 
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

clauswilke (Member) commented

You can contour on the normalized density to obtain your desired result.

library(ggplot2)

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_density2d(bins = 5, contour_var = "ndensity") +
  geom_point()

Created on 2020-07-02 by the reprex package (v0.3.0)
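
To see what contouring on the normalized density does to the levels, the contour data can be inspected per group. This is a sketch that assumes ggplot2 3.3.0 or later (where contour_var is available); the names ld and levels_by_group are illustrative only.

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_density2d(bins = 5, contour_var = "ndensity")

# Contour paths as computed by the stat, one row per path vertex
ld <- layer_data(p)

# Distinct contour levels drawn for each group; with "ndensity" each
# group's density is scaled to a maximum of 1 before contouring
levels_by_group <- tapply(ld$level, ld$group, function(l) sort(unique(l)))
levels_by_group
```

Because every group's surface is rescaled to peak at 1, the same breaks sit at the same relative height in each group, which makes the groups easier to compare.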
