From af5a54f19a58a52cacc3d2f31917e9083df5f074 Mon Sep 17 00:00:00 2001 From: Sean Griffin Date: Wed, 29 May 2019 15:10:35 -0600 Subject: [PATCH] Optimize queries that need to sort on `recent_crate_downloads.downloads` I'm seeing two queries that are beginning to slow down beyond acceptable levels, both sort by `recent_crate_downloads.downloads`. The first one is the "Most Recent Downloads" section on the summary page, which is just solved by adding an index on the downloads column. The second case is the "recent downloads" sort on search, which is a bit trickier. When the column is being sorted from the right side of a left join, the index can no longer be used (since there might not be a row there at all). The only way to fix this is to switch to using an inner join, which would mean that crates that haven't been downloaded in the last 90 days won't show up in search. In practice this only occurs for very recently uploaded crates, since crawlers like libs.rs end up downloading every crate. However, it could still lead to confusion, so I've redefined the view to make sure there's a row present for every crate. The crate still won't show up in search until the view gets updated though, which occurs every 10 minutes. This did cause some problems in our tests though, since a bunch of tests don't have any recent downloads, but assume that the crate is present in search, etc. Initially I tried just always updating the view in `PublishBuilder`, but since refreshing the view grabs an exclusive lock, it meant our test suite was much less parallel, and it also exacerbated tests that can deadlock (pretty much every test in teams.rs has a deadlocking issue, but fixing it requires none of the tests using the same team name which is non-trivial). Ultimately I opted to only refresh the view on the endpoints that will be querying it, which means the tests don't need to care about whether they're depending on this or not, and tests which never hit those endpoints won't block waiting for other tests that do. The implementation was a tad tricky. We need to ignore errors that occur refreshing it since we might be in read only mode. I also had to put the generic constraints for `C` on the struct itself instead of just the `Handler` impl, otherwise the closures passed to it in the unit tests for the router didn't seem to infer the correct closure type. Two other options would be to add the index without the join change/view definition change, which would mean that the summary page gets fixed, but search is still slow. Summary is much more frequently hit than search ordered by recent downloads, so this wouldn't be the end of the world. The other option would be to run our test suite with `--test-threads=2` (interestingly, 2 test threads does not run tests in parallel, and our suite blows up with `--test-threads=1`). If we did this, we could refresh the view from `CrateBuilder` (and the handful of tests which *actually* hit the publish endpoint), so it wouldn't really remove any complexity, it'd just move it to the tests directory. Before: ``` QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=5497.84..5497.89 rows=100 width=882) (actual time=52.612..52.628 rows=100 loops=1) InitPlan 1 (returns $0) -> Limit (cost=1418.25..1418.26 rows=1 width=8) (actual time=8.106..8.107 rows=1 loops=1) -> Aggregate (cost=1418.25..1418.26 rows=1 width=8) (actual time=8.105..8.105 rows=1 loops=1) -> Index Only Scan using packages_pkey on crates crates_1 (cost=0.06..1405.16 rows=26180 width=0) (actual time=0.011..6.732 rows=26181 loops=1) Heap Fetches: 7526 -> Sort (cost=4079.58..4092.67 rows=26180 width=882) (actual time=52.610..52.619 rows=100 loops=1) Sort Key: recent_crate_downloads.downloads DESC NULLS LAST Sort Method: top-N heapsort Memory: 160kB -> Hash Left Join (cost=466.18..3879.46 rows=26180 width=882) (actual time=16.495..38.755 rows=26181 loops=1) Hash Cond: (crates.id = recent_crate_downloads.crate_id) -> Seq Scan on crates (cost=0.00..3399.54 rows=26180 width=865) (actual time=0.007..14.165 rows=26181 loops=1) -> Hash (cost=374.54..374.54 rows=26181 width=12) (actual time=8.338..8.338 rows=26181 loops=1) Buckets: 32768 Batches: 1 Memory Usage: 1381kB -> Seq Scan on recent_crate_downloads (cost=0.00..374.54 rows=26181 width=12) (actual time=0.004..4.592 rows=26181 loops=1) Planning Time: 0.341 ms Execution Time: 52.691 ms ``` After: ``` QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=5497.84..5497.89 rows=100 width=882) (actual time=57.899..57.917 rows=100 loops=1) InitPlan 1 (returns $0) -> Limit (cost=1418.25..1418.26 rows=1 width=8) (actual time=9.722..9.723 rows=1 loops=1) -> Aggregate (cost=1418.25..1418.26 rows=1 width=8) (actual time=9.721..9.721 rows=1 loops=1) -> Index Only Scan using packages_pkey on crates crates_1 (cost=0.06..1405.16 rows=26180 width=0) (actual time=0.034..8.335 rows=26181 loops=1) Heap Fetches: 7526 -> Sort (cost=4079.58..4092.67 rows=26180 width=882) (actual time=57.898..57.907 rows=100 loops=1) Sort Key: recent_crate_downloads.downloads DESC Sort Method: top-N heapsort Memory: 160kB -> Hash Left Join (cost=466.18..3879.46 rows=26180 width=882) (actual time=19.839..44.014 rows=26181 loops=1) Hash Cond: (crates.id = recent_crate_downloads.crate_id) -> Seq Scan on crates (cost=0.00..3399.54 rows=26180 width=865) (actual time=0.010..15.211 rows=26181 loops=1) -> Hash (cost=374.54..374.54 rows=26181 width=12) (actual time=9.941..9.941 rows=26181 loops=1) Buckets: 32768 Batches: 1 Memory Usage: 1381kB -> Seq Scan on recent_crate_downloads (cost=0.00..374.54 rows=26181 width=12) (actual time=0.008..5.128 rows=26181 loops=1) Planning Time: 1.287 ms Execution Time: 58.018 ms ``` --- .../down.sql | 8 ++++ .../up.sql | 10 +++++ src/controllers/krate/search.rs | 6 +-- src/router.rs | 42 +++++++++++++++---- src/tests/builders.rs | 5 +-- src/tests/krate.rs | 4 +- 6 files changed, 59 insertions(+), 16 deletions(-) create mode 100644 migrations/2019-05-29-192622_index_crate_downloads_by_downloads/down.sql create mode 100644 migrations/2019-05-29-192622_index_crate_downloads_by_downloads/up.sql diff --git a/migrations/2019-05-29-192622_index_crate_downloads_by_downloads/down.sql b/migrations/2019-05-29-192622_index_crate_downloads_by_downloads/down.sql new file mode 100644 index 00000000000..ba704a2ad05 --- /dev/null +++ b/migrations/2019-05-29-192622_index_crate_downloads_by_downloads/down.sql @@ -0,0 +1,8 @@ +DROP MATERIALIZED VIEW recent_crate_downloads; +CREATE MATERIALIZED VIEW recent_crate_downloads (crate_id, downloads) AS + SELECT crate_id, SUM(version_downloads.downloads) FROM version_downloads + INNER JOIN versions + ON version_downloads.version_id = versions.id + WHERE version_downloads.date > date(CURRENT_TIMESTAMP - INTERVAL '90 days') + GROUP BY crate_id; +CREATE UNIQUE INDEX recent_crate_downloads_crate_id ON recent_crate_downloads (crate_id); diff --git a/migrations/2019-05-29-192622_index_crate_downloads_by_downloads/up.sql b/migrations/2019-05-29-192622_index_crate_downloads_by_downloads/up.sql new file mode 100644 index 00000000000..ab26d57aa93 --- /dev/null +++ b/migrations/2019-05-29-192622_index_crate_downloads_by_downloads/up.sql @@ -0,0 +1,10 @@ +DROP MATERIALIZED VIEW recent_crate_downloads; +CREATE MATERIALIZED VIEW recent_crate_downloads (crate_id, downloads) AS + SELECT crate_id, COALESCE(SUM(version_downloads.downloads), 0) FROM versions + LEFT JOIN version_downloads + ON version_downloads.version_id = versions.id + WHERE version_downloads.date > date(CURRENT_TIMESTAMP - INTERVAL '90 days') + OR version_downloads.date IS NULL + GROUP BY crate_id; +CREATE UNIQUE INDEX recent_crate_downloads_crate_id ON recent_crate_downloads (crate_id); +CREATE INDEX ON recent_crate_downloads (downloads); diff --git a/src/controllers/krate/search.rs b/src/controllers/krate/search.rs index e61ba527646..b258b2cdfc9 100644 --- a/src/controllers/krate/search.rs +++ b/src/controllers/krate/search.rs @@ -51,7 +51,7 @@ pub fn search(req: &mut dyn Request) -> CargoResult { recent_crate_downloads::downloads.nullable(), ); let mut query = crates::table - .left_join(recent_crate_downloads::table) + .inner_join(recent_crate_downloads::table) .select(selection) .into_boxed(); @@ -157,7 +157,7 @@ pub fn search(req: &mut dyn Request) -> CargoResult { if sort == "downloads" { query = query.then_order_by(crates::downloads.desc()) } else if sort == "recent-downloads" { - query = query.then_order_by(recent_crate_downloads::downloads.desc().nulls_last()) + query = query.then_order_by(recent_crate_downloads::downloads.desc()) } else if sort == "recent-updates" { query = query.order(crates::updated_at.desc()); } else { @@ -177,7 +177,7 @@ pub fn search(req: &mut dyn Request) -> CargoResult { // FIXME: Use `query.selection()` if that feature ends up in // Diesel 2.0 selection, - coalesce(crates::table.count().single_value(), 0), + coalesce(recent_crate_downloads::table.count().single_value(), 0), )) .limit(limit) .offset(offset) diff --git a/src/router.rs b/src/router.rs index efea8b8f680..f5536264745 100644 --- a/src/router.rs +++ b/src/router.rs @@ -13,7 +13,7 @@ pub fn build_router(app: &App) -> R404 { let mut api_router = RouteBuilder::new(); // Route used by both `cargo search` and the frontend - api_router.get("/crates", C(krate::search::search)); + api_router.get("/crates", C(needs_recent_downloads(krate::search::search))); // Routes used by `cargo` api_router.put("/crates/new", C(krate::publish::publish)); @@ -35,7 +35,10 @@ pub fn build_router(app: &App) -> R404 { api_router.get("/versions/:version_id", C(version::deprecated::show_by_id)); // Routes used by the frontend - api_router.get("/crates/:crate_id", C(krate::metadata::show)); + api_router.get( + "/crates/:crate_id", + C(needs_recent_downloads(krate::metadata::show)), + ); api_router.get("/crates/:crate_id/:version", C(version::metadata::show)); api_router.get( "/crates/:crate_id/:version/readme", @@ -89,7 +92,10 @@ pub fn build_router(app: &App) -> R404 { "/me/crate_owner_invitations/:crate_id", C(crate_owner_invitation::handle_invite), ); - api_router.get("/summary", C(krate::metadata::summary)); + api_router.get( + "/summary", + C(needs_recent_downloads(krate::metadata::summary)), + ); api_router.put("/confirm/:email_token", C(user::me::confirm_user_email)); api_router.put( "/users/:user_id/resend", @@ -125,12 +131,14 @@ pub fn build_router(app: &App) -> R404 { R404(router) } -struct C(pub fn(&mut dyn Request) -> CargoResult); +struct C CargoResult + Send + Sync + 'static>(F); -impl Handler for C { +impl Handler for C +where + F: Fn(&mut dyn Request) -> CargoResult + Send + Sync + 'static, +{ fn call(&self, req: &mut dyn Request) -> Result> { - let C(f) = *self; - match f(req) { + match self.0(req) { Ok(resp) => Ok(resp), Err(e) => match e.response() { Some(response) => Ok(response), @@ -154,6 +162,26 @@ impl Handler for R { } } +fn needs_recent_downloads( + f: fn(&mut dyn Request) -> CargoResult, +) -> impl Fn(&mut dyn Request) -> CargoResult { + use crate::db::RequestTransaction; + use crate::middleware::app::RequestApp; + use diesel::prelude::*; + + move |req| { + let app = req.app(); + if app.config.env == Env::Test { + let conn = req.db_conn()?; + let _ = conn.transaction(|| { + no_arg_sql_function!(refresh_recent_crate_downloads, ()); + diesel::select(refresh_recent_crate_downloads).execute(&*conn) + }); + } + f(req) + } +} + // Can't derive Debug because of RouteBuilder. #[allow(missing_debug_implementations)] pub struct R404(pub RouteBuilder); diff --git a/src/tests/builders.rs b/src/tests/builders.rs index 31a51878933..c09c419e399 100644 --- a/src/tests/builders.rs +++ b/src/tests/builders.rs @@ -230,7 +230,7 @@ impl<'a> CrateBuilder<'a> { } fn build(mut self, connection: &PgConnection) -> CargoResult { - use diesel::{insert_into, select, update}; + use diesel::{insert_into, update}; let mut krate = self .krate @@ -264,9 +264,6 @@ impl<'a> CrateBuilder<'a> { version_downloads::downloads.eq(downloads), )) .execute(connection)?; - - no_arg_sql_function!(refresh_recent_crate_downloads, ()); - select(refresh_recent_crate_downloads).execute(connection)?; } if !self.keywords.is_empty() { diff --git a/src/tests/krate.rs b/src/tests/krate.rs index 554e2410a3c..d0db48b0fcb 100644 --- a/src/tests/krate.rs +++ b/src/tests/krate.rs @@ -248,7 +248,7 @@ fn index_sorting() { let krate2 = CrateBuilder::new("bar_sort", user.id) .description("foo_sort baz_sort foo_sort baz_sort const") .downloads(3333) - .recent_downloads(0) + .recent_downloads(1) .expect_build(conn); let krate3 = CrateBuilder::new("baz_sort", user.id) @@ -331,7 +331,7 @@ fn exact_match_on_queries_with_sort() { let krate2 = CrateBuilder::new("bar_sort", user.id) .description("foo_sort baz_sort foo_sort baz_sort const") .downloads(3333) - .recent_downloads(0) + .recent_downloads(1) .expect_build(conn); let krate3 = CrateBuilder::new("baz_sort", user.id)