geom_sf is appallingly slow in some cases #2655

Closed
thk686 opened this issue May 24, 2018 · 24 comments

Comments

@thk686

thk686 commented May 24, 2018

Literally getting questions like "why should I use this software" from students when it takes 5 minutes to draw a map. Here's a specific example from a lesson. While it is always a bit slow, some plots take forever. Could it be because of the 'fill' argument?

con <- url("https://biogeo.ucdavis.edu/data/gadm3.6/Rsf/gadm36_PAK_3_sf.rds")
pakistan.gadm <- readRDS(con) %>%
  st_transform("+proj=laea +lat_0=31 +lon_0=69 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs")
ggplot() +
  geom_sf(data = st_simplify(pakistan.gadm, 1e4), aes(fill = NAME_3)) +
  guides(fill = FALSE)
@batpigandme
Contributor

Could you please turn this into a self-contained reprex (short for minimal reproducible example)?

If you've never heard of a reprex before, you might want to start by reading the tidyverse.org help page.

Thanks

@thk686
Author

thk686 commented May 24, 2018

devtools::install_github("hadley/ggplot2")
#> Using GitHub PAT from envvar GITHUB_PAT
#> Warning in strptime(x, fmt, tz = "GMT"): unknown timezone 'zone/tz/2018c.
#> 1.0/zoneinfo/America/Chicago'
#> Downloading GitHub repo hadley/ggplot2@master
#> from URL https://api.github.com/repos/hadley/ggplot2/zipball/master
#> Installing ggplot2
#> '/Library/Frameworks/R.framework/Resources/bin/R' --no-site-file  \
#>   --no-environ --no-save --no-restore --quiet CMD INSTALL  \
#>   '/private/var/folders/j5/977488_x28x3hcjmb6p5nmgw0000gn/T/RtmphU41Uj/devtools4c3513daba4c/tidyverse-ggplot2-69dfc4b'  \
#>   --library='/Library/Frameworks/R.framework/Versions/3.4/Resources/library'  \
#>   --install-tests
#> 
library(ggplot2)
library(sf)
#> Warning: package 'sf' was built under R version 3.4.4
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
con <- url("https://biogeo.ucdavis.edu/data/gadm3.6/Rsf/gadm36_PAK_3_sf.rds")
pakistan.gadm <- readRDS(con) %>% st_transform("+proj=laea +lat_0=31 +lon_0=69 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs")
ggplot() + geom_sf(data = st_simplify(pakistan.gadm, 10000), aes(fill = NAME_3)) + 
  guides(fill = FALSE)

@batpigandme
Contributor

Hmm – I just took the elapsed time from before and after actually calling ggplot and it was < 4 seconds…

library(ggplot2)
library(sf)
#> Linking to GEOS 3.5.0, GDAL 2.1.0, proj.4 4.8.0
con <- url("https://biogeo.ucdavis.edu/data/gadm3.6/Rsf/gadm36_PAK_3_sf.rds")
pakistan.gadm <- readRDS(con) %>% st_transform("+proj=laea +lat_0=31 +lon_0=69 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs")
x <- Sys.time()
ggplot() + geom_sf(data = st_simplify(pakistan.gadm, 10000), aes(fill = NAME_3)) + 
  guides(fill = FALSE)

(Sys.time() - x)
#> Time difference of 3.330275 secs

@clauswilke
Member

It takes about 90 seconds to render this on my system. fill is not at fault. And when I try to display the result in the RStudio plot window things seem even slower. In fact, this crashed my RStudio session when I first tried to run it.

library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
con <- url("https://biogeo.ucdavis.edu/data/gadm3.6/Rsf/gadm36_PAK_3_sf.rds")
pakistan.gadm <- readRDS(con) %>%
  st_transform("+proj=laea +lat_0=31 +lon_0=69 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs")
pakistan.simple <- st_simplify(pakistan.gadm, 1e4)

library(ggplot2)
library(microbenchmark)

p1 <- ggplot() +
  geom_sf(data = pakistan.simple, aes(fill = NAME_3)) +
  guides(fill = FALSE)

p2 <- ggplot() +
  geom_sf(data = pakistan.simple)

microbenchmark(print(p1), print(p2), times = 1)

#> Unit: seconds
#>       expr      min       lq     mean   median       uq      max neval
#>  print(p1) 91.13243 91.13243 91.13243 91.13243 91.13243 91.13243     1
#>  print(p2) 90.76306 90.76306 90.76306 90.76306 90.76306 90.76306     1

Created on 2018-05-24 by the reprex package (v0.2.0).

@thomasp85
Member

While your issue may be valid, may I suggest you don’t word your next issue in ways that seem to insult the developers.

@hadley
Member

hadley commented May 24, 2018

sum(sapply(pakistan.simple$geometry[[122]], function(x) nrow(x[[1]])))
[1] 67556

I don't know of a good way to figure out the total number of points, but I think there are a lot.

@hadley
Member

hadley commented May 24, 2018

@edzer is there a simple way to get the total number of vertices in a multipolygon geometry?

@tungttnguyen

I tried @clauswilke's code and it took ~2s on my Win 10 laptop (Core i5, 16GB RAM). So the problem might be OS dependent?

R> microbenchmark(print(p1), print(p2), times = 1)
Unit: seconds
      expr    min     lq   mean median     uq    max neval
 print(p1) 2.3797 2.3797 2.3797 2.3797 2.3797 2.3797     1
 print(p2) 1.8401 1.8401 1.8401 1.8401 1.8401 1.8401     1

@clauswilke
Member

I guess we can blame the OS X Quartz device. Unfortunately it's used by RStudio and hence is what people experience in interactive work. Rendering to PDF and opening in Preview takes ~2s total on the same machine.

library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
con <- url("https://biogeo.ucdavis.edu/data/gadm3.6/Rsf/gadm36_PAK_3_sf.rds")
pakistan.gadm <- readRDS(con) %>%
  st_transform("+proj=laea +lat_0=31 +lon_0=69 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs")
pakistan.simple <- st_simplify(pakistan.gadm, 1e4)

library(ggplot2)
library(microbenchmark)

p1 <- ggplot() +
  geom_sf(data = pakistan.simple, aes(fill = NAME_3)) +
  guides(fill = FALSE)

microbenchmark(ggsave("test1.pdf", p1), ggsave("test1.png", p1), times = 1)
#> Saving 7 x 5 in image
#> Saving 7 x 5 in image
#> Unit: seconds
#>                     expr        min         lq       mean     median
#>  ggsave("test1.pdf", p1)   1.357542   1.357542   1.357542   1.357542
#>  ggsave("test1.png", p1) 119.305828 119.305828 119.305828 119.305828
#>          uq        max neval
#>    1.357542   1.357542     1
#>  119.305828 119.305828     1

Created on 2018-05-25 by the reprex package (v0.2.0).
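
The same device comparison can be run without ggplot2 or the GADM download. The following is only a rough sketch under stated assumptions: the noisy circle and its vertex count are arbitrary stand-ins for a detailed map, and `draw()` and `time_device()` are hypothetical helper names, not part of any package. It uses base graphics only, so the timings will not match the ggplot2 numbers above, but a large gap between devices should still show up.

```r
# Build one polygon with many vertices (a noisy circle); the size is arbitrary.
set.seed(1)
n <- 2e4
theta <- seq(0, 2 * pi, length.out = n)
r <- 1 + runif(n, -0.05, 0.05)
x <- r * cos(theta)
y <- r * sin(theta)

# Draw the polygon on whatever device is currently open.
draw <- function() {
  plot.new()
  plot.window(xlim = range(x), ylim = range(y), asp = 1)
  polygon(x, y, col = "grey80", border = "black")
}

# Time opening a device, drawing, and closing it again.
# `open_device` is a device function whose first argument is the file name.
time_device <- function(open_device, file) {
  on.exit(unlink(file))
  system.time({
    open_device(file)
    draw()
    dev.off()
  })[["elapsed"]]
}

timings <- c(pdf = time_device(pdf, tempfile(fileext = ".pdf")))

# png() is not available in every R build, so check before timing it.
if (capabilities("png")) {
  timings["png"] <- time_device(png, tempfile(fileext = ".png"))
}

print(timings)
```

On a machine where the default bitmap device is the slow one, the `png` entry should be much larger than the `pdf` entry; elsewhere the two may be close.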

@clauswilke
Member

The upside is there's nothing fundamentally wrong with ggplot's approach to rendering maps.

@thk686
Author

thk686 commented May 26, 2018 via email

@edzer
Contributor

edzer commented May 27, 2018

These are great insights!

@hadley to compute # vertices, the simplest I can think of is

sum(rapply(st_geometry(pakistan.gadm), nrow))
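
To see why the `rapply()` one-liner works, here is a minimal sketch on a toy structure, assuming only base R. The nested list below is a hand-built stand-in for a geometry column (features containing polygons containing rings, each ring a coordinate matrix); `ring()` and `toy_geometry` are hypothetical names used for illustration, not sf objects.

```r
# Helper that builds a ring with n vertices as a two-column x/y matrix.
ring <- function(n) {
  theta <- seq(0, 2 * pi, length.out = n)
  cbind(x = cos(theta), y = sin(theta))
}

# Toy stand-in for a multipolygon geometry column:
# feature -> list of polygons -> list of rings -> coordinate matrix.
toy_geometry <- list(
  list(list(ring(5)), list(ring(10), ring(4))),  # feature 1: two polygons, one with a hole
  list(list(ring(7)))                            # feature 2: one polygon
)

# rapply() descends through the nested lists and calls nrow() on every
# matrix it reaches, so the sum is the total vertex count.
total_vertices <- sum(rapply(toy_geometry, nrow))

# Per-feature counts show which features dominate the total.
per_feature <- vapply(
  toy_geometry,
  function(g) sum(rapply(g, nrow)),
  integer(1)
)

print(total_vertices)
print(per_feature)
```

In real sf polygons each ring is closed, so its first point is repeated as its last and is counted twice by this approach.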

@thk686
Author

thk686 commented May 27, 2018 via email

@hadley
Member

hadley commented May 28, 2018

This example has 127,680 points, so it's not surprising that it's slow to render. I think the solutions are outside the scope of ggplot2 — either someone needs to improve the performance of the quartz device, or sf needs to perform that optimisation suggested by @thk686

@hadley hadley closed this as completed May 28, 2018
@clauswilke
Member

@hadley I agree it’s out of scope for ggplot2. Do you know how to best report a bug for the quartz device? I think the performance is seriously off, if I can render to pdf and then pdf to screen in ~2s total but rendering to png via quartz takes ~100s.

@hadley
Member

hadley commented May 28, 2018

Probably best to email the maintainer directly (I'm pretty sure it's Simon Urbanek)

@ateucher

Just FYI, this was also discussed in the rstudio community forum

@clauswilke
Member

I have contacted the R Core team, which is listed as official maintainer of the quartz device.

@jeroen
Contributor

jeroen commented May 29, 2018

XQuartz is a mess, put this in your onload to default to cairo instead:

if(!identical(getOption("bitmapType"), "cairo") && isTRUE(capabilities()[["cairo"]])){
  options(bitmapType = "cairo")
}
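
As an alternative to changing the global option, the cairo backend can be requested for a single file. This is a small sketch assuming the standard grDevices `png()` device and an R build with cairo support; the output path and plot are placeholders.

```r
# Check whether this R build was compiled with cairo support.
has_cairo <- isTRUE(capabilities()[["cairo"]])

out <- tempfile(fileext = ".png")

if (has_cairo) {
  # Ask for the cairo backend explicitly for this one device only,
  # leaving options(bitmapType = ...) untouched.
  png(out, width = 7, height = 5, units = "in", res = 150, type = "cairo")
  plot(1:10, main = "Rendered with type = \"cairo\"")
  dev.off()
}

print(has_cairo)
print(file.exists(out))
```

Because `ggsave()` forwards extra arguments to the graphics device, something like `ggsave("test1.png", p1, type = "cairo")` should have the same effect when the underlying device is grDevices `png()`; that call is untested here.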

@thk686
Author

thk686 commented May 29, 2018 via email

@clauswilke
Member

Unfortunately, as far as I can tell, this doesn't fix interactive use in RStudio.

@jeroen
Contributor

jeroen commented May 30, 2018

@clauswilke I think it does? RStudio uses the default png device.

@clauswilke
Member

This is the behavior I see in RStudio:

> dev.off()
null device 
          1 
> dev.list()
NULL
> options(bitmapType = "cairo")
> library(ggplot2)
> ggplot(iris, aes(Sepal.Length, Sepal.Width)) + geom_point()
> dev.list()
        RStudioGD quartz_off_screen 
                2                 3 

@lock

lock bot commented Nov 26, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Nov 26, 2018