From dd08998086bcbd4021b9c728b95c426ec3c5f52d Mon Sep 17 00:00:00 2001 From: rachlobay <42976509+rachlobay@users.noreply.github.com> Date: Mon, 8 Apr 2024 14:09:33 -0700 Subject: [PATCH 01/28] Added smooth qr vignette --- vignettes/smooth-qr.Rmd | 303 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 303 insertions(+) create mode 100644 vignettes/smooth-qr.Rmd diff --git a/vignettes/smooth-qr.Rmd b/vignettes/smooth-qr.Rmd new file mode 100644 index 000000000..777379120 --- /dev/null +++ b/vignettes/smooth-qr.Rmd @@ -0,0 +1,303 @@ +--- +title: "Smooth Quantile Regression" +--- + +```{r setup, include = FALSE} +knitr::opts_chunk$set( + collapse = FALSE, + comment = "#>", + warning = FALSE, + message = FALSE, + cache = TRUE +) +``` + +# Introducing smooth quantile regression + +Whereas the standard application of time-series forecasting techniques has been to forecast single horizons, in multi-period forecasting, the goal is to forecast several horizons simultaneously. The standard application of these techniques aims to predict the signal for a single forecast horizon (or "ahead"), most often one-step-ahead. This is useful in epidemiological applications where decisions are often based on the trend of a signal. + +The idea underlying smooth quantile regression that the future can be approximated by a smooth curve. So this approach enforces smoothness across the horizons for point estimation by regression or interval prediction by quantile regression. Our focus in this chapter is the latter. + +# Built-in function for smooth quantile regression and its parameters + +The built-in smooth quantile regression function, `smooth_quantile_reg()` provides a model specification for smooth quantile regression that works under the tidymodels framework. It has the following parameters and default values: + +```{r, eval = FALSE} +smooth_quantile_reg( + mode = "regression", + engine = "smoothqr", + outcome_locations = NULL, + quantile_levels = 0.5, + degree = 3L +) +``` + +For smooth quantile regression, the type of model or `mode` is regression. + +The only `engine` that is currently supported is `smooth_qr()` from the [`smooth_qr` package](https://dajmcdon.github.io/smoothqr/). + +The `outcome_locations` indicate the multiple horizon (ie. ahead) values. These should be specified by the user. + +The `quantile_levels` parameter is a vector of values that indicates the quantiles to be estimated. The default is the median (0.5 quantile). + +The `degree` parameter indicates the number of polynomials used for smoothing of the response. It should be no more than the number of responses. If the degree is precisely equal to the number of aheads, then there is no smoothing. To better understand this parameter and how it works, we should look to its origins and how it is used in the model. + +# Model form + +Smooth quantile regression is linear auto-regressive, with the key feature being a transformation that forces the coefficients to satisfy a constraint. The purpose if this is for each model coefficient to be a smooth function of ahead values, and so each such coefficient is set to be a linear combination of smooth basis functions (such as a spline or a polynomial). + +The `degree` parameter controls the number of these polynomials used. It should be no greater than the number of responses. This is a tuning parameter, and so it can be chosen by performing a grid search with cross-validation. 
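To make the role of `degree` concrete, here is a small illustrative sketch (not evaluated, and not part of the original fitting code) that uses base R's `poly()` to build a degree-3 orthogonal polynomial basis over 28 ahead values; each model coefficient is then constrained to be a linear combination of curves like these. The basis that `smoothqr` constructs internally may be parametrized differently, so treat this purely as intuition.

```{r, eval = FALSE}
# Illustration only: a degree-3 orthogonal polynomial basis over the aheads.
# The smoothing constraint forces each coefficient to vary across aheads as a
# linear combination of columns like these (smoothqr's internal basis may be
# parametrized differently).
aheads <- 1:28
H <- poly(aheads, degree = 3) # 28 x 3 matrix of basis functions
matplot(aheads, H,
  type = "l", lty = 1,
  xlab = "ahead (a)", ylab = "basis value"
)
```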
Intuitively, $d = 1$ corresponds to the constant model, $d = 2$ gives straight line forecasts, while $d = 3$ gives quadratic forecasts. Since a degree of 3 was found to work well in the tested applications (see Section 9 of Tuzhilina et al, 2022), it is the default value. + +# Demonstration of smooth quantile regression + +```{r, message = FALSE} +library(epipredict) +library(tidyverse) +``` + +We will now apply smooth quantile regression on the real data used for COVID-19 forecasting. The built-in dataset we will use is a subset of JHU daily data on state cases and deaths. This sample data ranges from Dec 31, 2020 to Dec 31, 2021. + +```{r} +edf <- case_death_rate_subset +``` + +We will set the forecast date to be November 30, 2021 so that we can produce forecasts for target dates of 1 to 28 days ahead. We construct our test data, `tedf` from the days beyond this. + +```{r} +fd <- as.Date("2021-11-30") + +tedf <- edf %>% filter(time_value >= fd) +``` + +We will use the most recent 3 months worth of data up to the forecast date for training. + +```{r} +edf <- edf %>% filter(time_value < fd, time_value >= fd - 90L) +``` + +And for plotting our focus will be on a subset of two states - California and Utah. + +```{r} +geos <- c("ut", "ca") +``` + +Suppose that our goal with this data is to predict COVID-19 death rates at several horizons for each state. On day $t$, we want to predict new deaths $y$ that are $a = 1,\dots, 28$ days ahead at locations $j$ using the death rates from today, 1 week ago, and 2 weeks ago. So for each location, we'll predict the median (0.5 quantile) for each of the target dates by using +$$ +\hat{y}_{j}(t+a) = \alpha(a) + \sum_{l = 0}^2 \beta_{l}(a) y_{j}(t - 7l) +$$ +where $\beta_{l}(a) = \sum_{i=1}^d \theta_{il} h_i(a)$ is the smoothing constraint where ${h_1(a), \dots, h_d(a)}$ are the set of smooth basis functions and $d$ is a hyperparameter that manages the flexibility of $\beta_{l}(a)$. Remember that the goal is to have each $\beta_{l}(a)$ to be a smooth function of the aheads and that is achieved through imposing the smoothing constraint. + +Note that this model is intended to be simple and straightforward. Our only modification to this model is to add case rates as another predictive feature (we will leave it to the reader to incorporate additional features beyond this and the historical response values). We can update the basic model incorporate the $k = 2$ predictive features of case and death rates for each location j, $x_j(t) = (x_{j1}(t), x_{j2}(t))$ as follows: + +$$ +\hat{y}_{j}(t+a) = \alpha(a) + \sum_{k = 1}^2 \sum_{l = 0}^2 \beta_{kl}(a) x_{jk}(t - 7l) +$$ +where $\beta_{kl}(a) = \sum_{i=1}^d \theta_{ikl} h_i(a)$. + +Now, we will create our own forecaster from scratch by building up an `epi_workflow` (there is no canned forecaster that is currently available). Building our own forecaster allows for customization and control over the pre-processing and post-processing actions we wish to take. + +The pre-processing steps we take in our `epi_recipe` are simply to lag the predictor (by 0, 7, and 14 days) and lead the response by the multiple aheads specified by the function user. + +The post-processing layers we add to our `frosting` are nearly as simple. We first predict, unnest the prediction list-cols, omit NAs from them, and enforce that they are greater than 0. + +The third component of an to an `epi_workflow`, the model, is smooth quantile regression, which has three main arguments - the quantiles, aheads, and degree. 
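Before wrapping everything into a function, here is a condensed, non-evaluated sketch of those three components assembled into an `epi_workflow`. It uses the same `epipredict` calls as the full forecaster below; the object names and the fixed choices of 28 aheads, the median, and degree 3 simply mirror that function's defaults, and the complete runnable version follows in the next chunk.

```{r, eval = FALSE}
# Condensed sketch only -- the next chunk wraps the same pieces into smooth_fc().
rec <- epi_recipe(edf) %>%
  step_epi_lag(case_rate, lag = c(0, 7, 14)) %>%
  step_epi_lag(death_rate, lag = c(0, 7, 14)) %>%
  step_epi_ahead(death_rate, ahead = 1:28)

f <- frosting() %>%
  layer_predict() %>%
  layer_unnest(.pred) %>%
  layer_naomit(distn) %>%
  layer_add_forecast_date() %>%
  layer_threshold(distn)

ee <- smooth_quantile_reg(
  quantile_levels = 0.5,
  outcome_locations = 1:28,
  degree = 3L
)

ewf <- epi_workflow(rec, ee, f)
```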
+ +After creating our `epi_workflow` with these components, we get our test data based on longest lag period and make the predictions. + +We input our forecaster into a function for ease of use. + +```{r} +smooth_fc <- function(x, aheads = 1:28, degree = 3L, quantiles = 0.5, fd){ +rec <- epi_recipe(x) %>% + step_epi_lag(case_rate, lag = c(0, 7, 14)) %>% + step_epi_lag(death_rate, lag = c(0, 7, 14)) %>% + step_epi_ahead(death_rate, ahead = aheads) + +f <- frosting() %>% + layer_predict() %>% + layer_unnest(.pred) %>% + layer_naomit(distn) %>% + layer_add_forecast_date() %>% + layer_threshold(distn) + +ee <- smooth_quantile_reg( + quantile_levels = quantiles, + outcome_locations = aheads, + degree = degree +) + +ewf <- epi_workflow(rec, ee, f) + +the_fit <- ewf %>% fit(x) + +latest <- get_test_data(rec, x, fill_locf = TRUE) + +preds <- predict(the_fit, new_data = latest) %>% + mutate(forecast_date = fd, target_date = fd + ahead) %>% + select(geo_value, target_date, distn, ahead) %>% + pivot_quantiles_wider(distn) + +preds +} +``` + +Notice that we allow the function user to specify the aheads, degree, and quantile as they may want to change these parameter values. We also allow for input of the forecast date as we fixed that at the onset of this demonstration. + +We now can produce smooth quantile regression predictions for our problem: + +```{r, warning = FALSE} +smooth_preds <- smooth_fc(edf, fd = fd) + +head(smooth_preds) +``` +Most often, we're not going to want to limit ourselves to just predicting the median value as there is uncertainty about the predictions, so let's try to predict several different quantiles in addition to the median: + +```{r, warning = FALSE} +smooth_preds <- smooth_fc(edf, quantiles = c(.1, .25, .5, .75, .9), fd = fd) + +head(smooth_preds) +``` + +We can see that we have different columns for the different quantile predictions. + +Let's visualize these results for the sample of two states. We will create a simple plotting function, under which the median predictions are an orange line and the surrounding quantiles are blue bands around this. For comparison, we will include the actual values over time as a black line. + +```{r} +plot_preds <- function(preds, geos_to_plot = NULL, train_test_dat, fd){ + + if (!is.null(geos_to_plot)) { + preds <- preds %>% filter(geo_value %in% geos_to_plot) + train_test_dat <- train_test_dat %>% filter(geo_value %in% geos_to_plot) + } + + ggplot(preds) + + geom_ribbon(aes(target_date, ymin = `0.1`, ymax = `0.9`), + fill = "cornflowerblue", alpha = .8) + + geom_ribbon(aes(target_date, ymin = `0.25`, ymax = `0.75`), + fill = "#00488E", alpha = .8) + + geom_line(data = train_test_dat, aes(time_value, death_rate), size = 1.5) + + geom_line(aes(target_date, `0.5`), color = "orange", size = 1.5) + + geom_vline(xintercept = fd) + + facet_wrap(~geo_value) + + theme_bw(16) + + scale_x_date(name = "", date_labels = "%b %Y") + + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + + ylab("Deaths per 100K inhabitants") +} +``` + +Since we would like to plot the actual death rates for these states over time, we bind the training and testing data together and input this into our plotting function as follows: + +```{r, warning = FALSE} +plot_preds(smooth_preds, geos, bind_rows(tedf, edf), fd) +``` + +We can see that the predictions are smooth curves for each state, as expected when using smooth quantile regression. In addition while the curvature of the forecasts matches that of the truth, the forecasts do not look remarkably accurate. 
+ +## Varying the degrees parameter + +We can test the impact of different degrees by using the `map()` function. Let's try out all degrees from 1 to 7: + +```{r, warning = FALSE} +smooth_preds_list <- map(1:7, ~ smooth_fc(edf, + degree = .x, + quantiles = c(.1, .25, .5, .75, .9), + fd = fd) %>% + mutate(degree = .x) + +) %>% list_rbind() +``` + +One way to quantify the impact of these on the forecasting is to look at the mean absolute error (MAE) or mean squared error (MSE) over the degrees. We can select the degree that results in the lowest MAE. + +Since the MAE compares the predicted values to the actual values, we will first join the test data to the predicted data for our comparisons: +```{r, message = FALSE} +tedf_sub <- tedf %>% + rename(target_date = time_value, actual = death_rate) %>% + select(geo_value, target_date, actual) + +smooth_preds_df_deg = smooth_preds_list %>% + left_join(tedf_sub) +``` + +And then compute the MAE for each of the degrees: +```{r, message = FALSE} +smooth_preds_df_deg = smooth_preds_list %>% + left_join(tedf_sub) %>% + group_by(degree) %>% + mutate(error = abs(`0.5` - actual)) %>% + summarise(mean = mean(error)) + +# Arrange the MAE from smallest to largest +head(smooth_preds_df_deg %>% arrange(mean)) +``` + +Instead of just looking at the raw numbers, let's create a simple line plot to visualize how the MAE changes over degrees for this data: + +```{r} +ggplot(smooth_preds_df_deg, aes(degree, mean)) + + geom_line() + + xlab("Degrees of freedom") + + ylab("Mean MAE") +``` + +We can see that the degree that results in the lowest MAE is 3. Hence, we could pick this degree for future forecasting work on this data. + +## A brief comparison between smoothing and no smoothing + +Now, we will briefly compare the results from using smooth quantile regression to those obtained without smoothing. The latter approach amounts to ordinary quantile regression to get predictions for the intended target date. The main drawback is that it ignores the fact that the responses all represent the same signal, just for different ahead values. In contrast, the smooth quantile regression approach utilizes this information about the data structure - the fact that the aheads in are not be independent of each other, but rather they are naturally related over time by a smooth curve. + +To get the basic quantile regression results we can utilize the forecaster that we've already built. We can simply set the degree to be the number of ahead values to re-run the code without smoothing. +```{r, warning = FALSE} +baseline_preds <- smooth_fc(edf, degree = 28L, quantiles = c(.1, .25, .5, .75, .9), fd = fd) +``` + +And we can produce the corresponding plot to inspect the predictions obtained under the baseline model: + +```{r, warning = FALSE} +plot_preds(baseline_preds, geos, bind_rows(tedf, edf), fd) +``` + +Unlike for smooth quantile regression, the resulting forecasts are not smooth curves, but rather jagged and irregular in shape. 
+ +For a more formal comparison between the two approaches, we could compare the test performance in terms of accuracy through calculating either the, MAE or MSE, where the performance measure of choice can be calculated over over all times and locations for each ahead value + +```{r, message = FALSE} +baseline_preds_df = baseline_preds %>% + left_join(tedf_sub) %>% + group_by(ahead) %>% + mutate(error = abs(`0.5` - actual)) %>% + summarise(mean = mean(error)) %>% + mutate(type = "baseline") + +smooth_preds_df = smooth_preds %>% + left_join(tedf_sub) %>% + group_by(ahead) %>% + mutate(error = abs(`0.5` - actual)) %>% + summarise(mean = mean(error)) %>% + mutate(type = "smooth") + +preds_df <- bind_rows(baseline_preds_df, smooth_preds_df) + +ggplot(preds_df, aes(ahead, mean, color = type)) + + geom_line() + + xlab("Ahead") + + ylab("Mean MAE") +``` + +or over all aheads, times, and locations for a single numerical summary. + +```{r} +mean(baseline_preds_df$mean) +mean(smooth_preds_df$mean) +``` + +The former shows that forecasts for the immediate future and for the distant future are more inaccurate for both models under consideration. This demonstrates the standard conclusion that the farther ahead in time, the more challenging the forecasting. The latter shows that the smooth quantile regression model and baseline models perform very similarly, with the smooth quantile regression model only slightly beating the baseline model in terms of overall MAE. + +# What we've learned in a nutshell + +Smooth quantile regression is used in multi-period forecasting for predicting several horizons simultaneously with a single smooth curve. It operates under the key assumption that the future of the response can be approximated well by a smooth curve. + +# Attribution + +The information presented on smooth quantile regression is from [Tuzhilina et al., 2022](https://arxiv.org/abs/2202.09723). \ No newline at end of file From 907aaa0a4ce285eb31149f8d3a239383fa1f7c72 Mon Sep 17 00:00:00 2001 From: rachlobay <42976509+rachlobay@users.noreply.github.com> Date: Mon, 8 Apr 2024 15:03:17 -0700 Subject: [PATCH 02/28] Add period --- vignettes/smooth-qr.Rmd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/vignettes/smooth-qr.Rmd b/vignettes/smooth-qr.Rmd index 777379120..e8f382f46 100644 --- a/vignettes/smooth-qr.Rmd +++ b/vignettes/smooth-qr.Rmd @@ -55,7 +55,7 @@ library(epipredict) library(tidyverse) ``` -We will now apply smooth quantile regression on the real data used for COVID-19 forecasting. The built-in dataset we will use is a subset of JHU daily data on state cases and deaths. This sample data ranges from Dec 31, 2020 to Dec 31, 2021. +We will now apply smooth quantile regression on the real data used for COVID-19 forecasting. The built-in dataset we will use is a subset of JHU daily data on state cases and deaths. This sample data ranges from Dec. 31, 2020 to Dec. 31, 2021. ```{r} edf <- case_death_rate_subset @@ -300,4 +300,4 @@ Smooth quantile regression is used in multi-period forecasting for predicting se # Attribution -The information presented on smooth quantile regression is from [Tuzhilina et al., 2022](https://arxiv.org/abs/2202.09723). \ No newline at end of file +The information presented on smooth quantile regression is from [Tuzhilina et al., 2022](https://arxiv.org/abs/2202.09723). 
From 6e8fac543151305d362ba46a496789d6ebebcf45 Mon Sep 17 00:00:00 2001 From: rachlobay <42976509+rachlobay@users.noreply.github.com> Date: Mon, 8 Apr 2024 15:14:56 -0700 Subject: [PATCH 03/28] Style file --- vignettes/smooth-qr.Rmd | 112 ++++++++++++++++++++-------------------- 1 file changed, 56 insertions(+), 56 deletions(-) diff --git a/vignettes/smooth-qr.Rmd b/vignettes/smooth-qr.Rmd index e8f382f46..6769a8539 100644 --- a/vignettes/smooth-qr.Rmd +++ b/vignettes/smooth-qr.Rmd @@ -58,7 +58,7 @@ library(tidyverse) We will now apply smooth quantile regression on the real data used for COVID-19 forecasting. The built-in dataset we will use is a subset of JHU daily data on state cases and deaths. This sample data ranges from Dec. 31, 2020 to Dec. 31, 2021. ```{r} -edf <- case_death_rate_subset +edf <- case_death_rate_subset ``` We will set the forecast date to be November 30, 2021 so that we can produce forecasts for target dates of 1 to 28 days ahead. We construct our test data, `tedf` from the days beyond this. @@ -107,37 +107,37 @@ After creating our `epi_workflow` with these components, we get our test data ba We input our forecaster into a function for ease of use. ```{r} -smooth_fc <- function(x, aheads = 1:28, degree = 3L, quantiles = 0.5, fd){ -rec <- epi_recipe(x) %>% - step_epi_lag(case_rate, lag = c(0, 7, 14)) %>% - step_epi_lag(death_rate, lag = c(0, 7, 14)) %>% - step_epi_ahead(death_rate, ahead = aheads) - -f <- frosting() %>% - layer_predict() %>% - layer_unnest(.pred) %>% - layer_naomit(distn) %>% - layer_add_forecast_date() %>% - layer_threshold(distn) - -ee <- smooth_quantile_reg( - quantile_levels = quantiles, - outcome_locations = aheads, - degree = degree -) +smooth_fc <- function(x, aheads = 1:28, degree = 3L, quantiles = 0.5, fd) { + rec <- epi_recipe(x) %>% + step_epi_lag(case_rate, lag = c(0, 7, 14)) %>% + step_epi_lag(death_rate, lag = c(0, 7, 14)) %>% + step_epi_ahead(death_rate, ahead = aheads) + + f <- frosting() %>% + layer_predict() %>% + layer_unnest(.pred) %>% + layer_naomit(distn) %>% + layer_add_forecast_date() %>% + layer_threshold(distn) + + ee <- smooth_quantile_reg( + quantile_levels = quantiles, + outcome_locations = aheads, + degree = degree + ) -ewf <- epi_workflow(rec, ee, f) + ewf <- epi_workflow(rec, ee, f) -the_fit <- ewf %>% fit(x) + the_fit <- ewf %>% fit(x) -latest <- get_test_data(rec, x, fill_locf = TRUE) + latest <- get_test_data(rec, x, fill_locf = TRUE) -preds <- predict(the_fit, new_data = latest) %>% - mutate(forecast_date = fd, target_date = fd + ahead) %>% - select(geo_value, target_date, distn, ahead) %>% - pivot_quantiles_wider(distn) + preds <- predict(the_fit, new_data = latest) %>% + mutate(forecast_date = fd, target_date = fd + ahead) %>% + select(geo_value, target_date, distn, ahead) %>% + pivot_quantiles_wider(distn) -preds + preds } ``` @@ -163,18 +163,19 @@ We can see that we have different columns for the different quantile predictions Let's visualize these results for the sample of two states. We will create a simple plotting function, under which the median predictions are an orange line and the surrounding quantiles are blue bands around this. For comparison, we will include the actual values over time as a black line. 
```{r} -plot_preds <- function(preds, geos_to_plot = NULL, train_test_dat, fd){ - +plot_preds <- function(preds, geos_to_plot = NULL, train_test_dat, fd) { if (!is.null(geos_to_plot)) { preds <- preds %>% filter(geo_value %in% geos_to_plot) train_test_dat <- train_test_dat %>% filter(geo_value %in% geos_to_plot) } - + ggplot(preds) + geom_ribbon(aes(target_date, ymin = `0.1`, ymax = `0.9`), - fill = "cornflowerblue", alpha = .8) + + fill = "cornflowerblue", alpha = .8 + ) + geom_ribbon(aes(target_date, ymin = `0.25`, ymax = `0.75`), - fill = "#00488E", alpha = .8) + + fill = "#00488E", alpha = .8 + ) + geom_line(data = train_test_dat, aes(time_value, death_rate), size = 1.5) + geom_line(aes(target_date, `0.5`), color = "orange", size = 1.5) + geom_vline(xintercept = fd) + @@ -200,33 +201,32 @@ We can test the impact of different degrees by using the `map()` function. Let's ```{r, warning = FALSE} smooth_preds_list <- map(1:7, ~ smooth_fc(edf, - degree = .x, - quantiles = c(.1, .25, .5, .75, .9), - fd = fd) %>% - mutate(degree = .x) - -) %>% list_rbind() + degree = .x, + quantiles = c(.1, .25, .5, .75, .9), + fd = fd +) %>% + mutate(degree = .x)) %>% list_rbind() ``` One way to quantify the impact of these on the forecasting is to look at the mean absolute error (MAE) or mean squared error (MSE) over the degrees. We can select the degree that results in the lowest MAE. Since the MAE compares the predicted values to the actual values, we will first join the test data to the predicted data for our comparisons: ```{r, message = FALSE} -tedf_sub <- tedf %>% +tedf_sub <- tedf %>% rename(target_date = time_value, actual = death_rate) %>% select(geo_value, target_date, actual) -smooth_preds_df_deg = smooth_preds_list %>% - left_join(tedf_sub) +smooth_preds_df_deg <- smooth_preds_list %>% + left_join(tedf_sub) ``` And then compute the MAE for each of the degrees: ```{r, message = FALSE} -smooth_preds_df_deg = smooth_preds_list %>% - left_join(tedf_sub) %>% - group_by(degree) %>% +smooth_preds_df_deg <- smooth_preds_list %>% + left_join(tedf_sub) %>% + group_by(degree) %>% mutate(error = abs(`0.5` - actual)) %>% - summarise(mean = mean(error)) + summarise(mean = mean(error)) # Arrange the MAE from smallest to largest head(smooth_preds_df_deg %>% arrange(mean)) @@ -235,7 +235,7 @@ head(smooth_preds_df_deg %>% arrange(mean)) Instead of just looking at the raw numbers, let's create a simple line plot to visualize how the MAE changes over degrees for this data: ```{r} -ggplot(smooth_preds_df_deg, aes(degree, mean)) + +ggplot(smooth_preds_df_deg, aes(degree, mean)) + geom_line() + xlab("Degrees of freedom") + ylab("Mean MAE") @@ -249,7 +249,7 @@ Now, we will briefly compare the results from using smooth quantile regression t To get the basic quantile regression results we can utilize the forecaster that we've already built. We can simply set the degree to be the number of ahead values to re-run the code without smoothing. 
```{r, warning = FALSE} -baseline_preds <- smooth_fc(edf, degree = 28L, quantiles = c(.1, .25, .5, .75, .9), fd = fd) +baseline_preds <- smooth_fc(edf, degree = 28L, quantiles = c(.1, .25, .5, .75, .9), fd = fd) ``` And we can produce the corresponding plot to inspect the predictions obtained under the baseline model: @@ -263,23 +263,23 @@ Unlike for smooth quantile regression, the resulting forecasts are not smooth cu For a more formal comparison between the two approaches, we could compare the test performance in terms of accuracy through calculating either the, MAE or MSE, where the performance measure of choice can be calculated over over all times and locations for each ahead value ```{r, message = FALSE} -baseline_preds_df = baseline_preds %>% - left_join(tedf_sub) %>% - group_by(ahead) %>% +baseline_preds_df <- baseline_preds %>% + left_join(tedf_sub) %>% + group_by(ahead) %>% mutate(error = abs(`0.5` - actual)) %>% - summarise(mean = mean(error)) %>% + summarise(mean = mean(error)) %>% mutate(type = "baseline") -smooth_preds_df = smooth_preds %>% - left_join(tedf_sub) %>% - group_by(ahead) %>% +smooth_preds_df <- smooth_preds %>% + left_join(tedf_sub) %>% + group_by(ahead) %>% mutate(error = abs(`0.5` - actual)) %>% - summarise(mean = mean(error)) %>% + summarise(mean = mean(error)) %>% mutate(type = "smooth") preds_df <- bind_rows(baseline_preds_df, smooth_preds_df) -ggplot(preds_df, aes(ahead, mean, color = type)) + +ggplot(preds_df, aes(ahead, mean, color = type)) + geom_line() + xlab("Ahead") + ylab("Mean MAE") From e94b581b3f4eb7d54a1607af3eeffbc28bfc9ee5 Mon Sep 17 00:00:00 2001 From: rachlobay <42976509+rachlobay@users.noreply.github.com> Date: Mon, 8 Apr 2024 16:18:46 -0700 Subject: [PATCH 04/28] Add packages --- DESCRIPTION | 1 + vignettes/smooth-qr.Rmd | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/DESCRIPTION b/DESCRIPTION index 8dbe7ea7f..80e7de094 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -36,6 +36,7 @@ Imports: glue, hardhat (>= 1.3.0), magrittr, + purrr, quantreg, recipes (>= 1.0.4), rlang, diff --git a/vignettes/smooth-qr.Rmd b/vignettes/smooth-qr.Rmd index 6769a8539..89448d2ed 100644 --- a/vignettes/smooth-qr.Rmd +++ b/vignettes/smooth-qr.Rmd @@ -52,7 +52,8 @@ The `degree` parameter controls the number of these polynomials used. It should ```{r, message = FALSE} library(epipredict) -library(tidyverse) +library(dplyr) +library(purrr) ``` We will now apply smooth quantile regression on the real data used for COVID-19 forecasting. The built-in dataset we will use is a subset of JHU daily data on state cases and deaths. This sample data ranges from Dec. 31, 2020 to Dec. 31, 2021. From 5507112bb6726cd00b8173bfcbaef3f879b6ee61 Mon Sep 17 00:00:00 2001 From: rachlobay <42976509+rachlobay@users.noreply.github.com> Date: Mon, 8 Apr 2024 16:32:18 -0700 Subject: [PATCH 05/28] Add ggplot2 --- vignettes/smooth-qr.Rmd | 1 + 1 file changed, 1 insertion(+) diff --git a/vignettes/smooth-qr.Rmd b/vignettes/smooth-qr.Rmd index 89448d2ed..be6b80053 100644 --- a/vignettes/smooth-qr.Rmd +++ b/vignettes/smooth-qr.Rmd @@ -54,6 +54,7 @@ The `degree` parameter controls the number of these polynomials used. It should library(epipredict) library(dplyr) library(purrr) +library(ggplot2) ``` We will now apply smooth quantile regression on the real data used for COVID-19 forecasting. The built-in dataset we will use is a subset of JHU daily data on state cases and deaths. This sample data ranges from Dec. 31, 2020 to Dec. 31, 2021. 
From a0c9aded07efc7b441d5e1ebf06684c1c4c1170c Mon Sep 17 00:00:00 2001 From: rachlobay <42976509+rachlobay@users.noreply.github.com> Date: Sat, 20 Apr 2024 20:01:52 -0700 Subject: [PATCH 06/28] Temp addition of some WIS stuff --- vignettes/smooth-qr.Rmd | 42 ++++++++++++++++++++++++++++++++++++++++- 1 file changed, 41 insertions(+), 1 deletion(-) diff --git a/vignettes/smooth-qr.Rmd b/vignettes/smooth-qr.Rmd index be6b80053..6106e5461 100644 --- a/vignettes/smooth-qr.Rmd +++ b/vignettes/smooth-qr.Rmd @@ -247,7 +247,7 @@ We can see that the degree that results in the lowest MAE is 3. Hence, we could ## A brief comparison between smoothing and no smoothing -Now, we will briefly compare the results from using smooth quantile regression to those obtained without smoothing. The latter approach amounts to ordinary quantile regression to get predictions for the intended target date. The main drawback is that it ignores the fact that the responses all represent the same signal, just for different ahead values. In contrast, the smooth quantile regression approach utilizes this information about the data structure - the fact that the aheads in are not be independent of each other, but rather they are naturally related over time by a smooth curve. +Now, we will briefly compare the results from using smooth quantile regression to those obtained without smoothing. The latter approach amounts to ordinary quantile regression to get predictions for the intended target date. The main drawback is that it ignores the fact that the responses all represent the same signal, just for different ahead values. In contrast, the smooth quantile regression approach utilizes this information about the data structure - the fact that the aheads in are not be independent of each other, but that they are naturally related over time by a smooth curve. To get the basic quantile regression results we can utilize the forecaster that we've already built. We can simply set the degree to be the number of ahead values to re-run the code without smoothing. ```{r, warning = FALSE} @@ -296,6 +296,46 @@ mean(smooth_preds_df$mean) The former shows that forecasts for the immediate future and for the distant future are more inaccurate for both models under consideration. This demonstrates the standard conclusion that the farther ahead in time, the more challenging the forecasting. The latter shows that the smooth quantile regression model and baseline models perform very similarly, with the smooth quantile regression model only slightly beating the baseline model in terms of overall MAE. +One other commonly used metric is the WIS score (cite https://arxiv.org/pdf/2005.12881.pdf), which a scoring rule that is based on the population quantiles. The point is to score the interval, whereas MAE only evaluates the accuracy of the point forecast. + +The general formula for the weighted interval is as in [Tuzhilina et al., 2022](https://arxiv.org/abs/2202.09723): + + +While it is the measure of choice used in the COVID-19 Forecast Hub, one limitation is that it prioritizes sharpness (how wide the interval is) relative to coverage (if the interval contains the truth). + +We can manually implement this as a function that is compatible with the latest version of `epipredict` (adapted from the WIS function in https://github.com/dajmcdon/smoothmpf-epipredict). 
+ +```{r} +wis_dist_quantile <- function(actual, q, quantile_levels) { + 2 * mean(pmax( + quantile_levels * (actual - q), + (1 - quantile_levels) * (q - actual), na.rm = TRUE + )) +} + +# Test row 1 of test df (test = smooth_preds %>% left_join(tedf_sub)) +wis_dist_quantile(test$actual[2], as.numeric(test[2, 4:8]), as.numeric(names(test)[4:8])) + +test %>% rowwise() %>% mutate(wis = wis_dist_quantile(actual, c(`0.1`, `0.25`), as.numeric(names(test)[4:5]))) + +# Compute mean WIS by ahead (over all states) +test2 = test %>% rowwise() %>% mutate(wis = wis_dist_quantile(actual, c(`0.1`, `0.25`), as.numeric(names(test)[4:5]))) %>% group_by(ahead) %>% summarise(mean = mean(wis)) + +# Plot over all aheads (do the same as above for the baseline and see which has lower WIS) +ggplot(test2, aes(ahead, mean)) + #, color = type)) + + geom_line() + + xlab("Ahead") + + ylab("Mean WIS") + +``` + +From our `smooth_preds_df`, we will need to extract the `values` and the `quantile_levels`: + + +For each state at each target date (can be same forecast date), we will apply the `wis_dist_quantile` function to get a WIS score. We may then use these to compute the mean WIS across all states for an ahead (assuming same forecast date for all state and all of the aheads). + + + # What we've learned in a nutshell Smooth quantile regression is used in multi-period forecasting for predicting several horizons simultaneously with a single smooth curve. It operates under the key assumption that the future of the response can be approximated well by a smooth curve. From 5d95aad0fc46479eed53ee602632dd9e90bf4d2f Mon Sep 17 00:00:00 2001 From: rachlobay <42976509+rachlobay@users.noreply.github.com> Date: Wed, 24 Apr 2024 05:36:05 -0700 Subject: [PATCH 07/28] Add WIS stuff --- vignettes/smooth-qr.Rmd | 89 +++++++++++++++++++++++++++-------------- 1 file changed, 59 insertions(+), 30 deletions(-) diff --git a/vignettes/smooth-qr.Rmd b/vignettes/smooth-qr.Rmd index 6106e5461..bdef99871 100644 --- a/vignettes/smooth-qr.Rmd +++ b/vignettes/smooth-qr.Rmd @@ -155,7 +155,8 @@ head(smooth_preds) Most often, we're not going to want to limit ourselves to just predicting the median value as there is uncertainty about the predictions, so let's try to predict several different quantiles in addition to the median: ```{r, warning = FALSE} -smooth_preds <- smooth_fc(edf, quantiles = c(.1, .25, .5, .75, .9), fd = fd) +several_quantiles <- c(.1, .25, .5, .75, .9) +smooth_preds <- smooth_fc(edf, quantiles = several_quantiles, fd = fd) head(smooth_preds) ``` @@ -251,7 +252,7 @@ Now, we will briefly compare the results from using smooth quantile regression t To get the basic quantile regression results we can utilize the forecaster that we've already built. We can simply set the degree to be the number of ahead values to re-run the code without smoothing. 
```{r, warning = FALSE} -baseline_preds <- smooth_fc(edf, degree = 28L, quantiles = c(.1, .25, .5, .75, .9), fd = fd) +baseline_preds <- smooth_fc(edf, degree = 28L, quantiles = several_quantiles, fd = fd) ``` And we can produce the corresponding plot to inspect the predictions obtained under the baseline model: @@ -265,23 +266,24 @@ Unlike for smooth quantile regression, the resulting forecasts are not smooth cu For a more formal comparison between the two approaches, we could compare the test performance in terms of accuracy through calculating either the, MAE or MSE, where the performance measure of choice can be calculated over over all times and locations for each ahead value ```{r, message = FALSE} -baseline_preds_df <- baseline_preds %>% - left_join(tedf_sub) %>% + +baseline_preds_mae_df <- baseline_preds %>% + left_join(tedf_sub) %>% group_by(ahead) %>% mutate(error = abs(`0.5` - actual)) %>% summarise(mean = mean(error)) %>% mutate(type = "baseline") -smooth_preds_df <- smooth_preds %>% - left_join(tedf_sub) %>% +smooth_preds_mae_df <- smooth_preds %>% + left_join(tedf_sub) %>% group_by(ahead) %>% mutate(error = abs(`0.5` - actual)) %>% summarise(mean = mean(error)) %>% mutate(type = "smooth") -preds_df <- bind_rows(baseline_preds_df, smooth_preds_df) +preds_mae_df <- bind_rows(baseline_preds_mae_df, smooth_preds_mae_df) -ggplot(preds_df, aes(ahead, mean, color = type)) + +ggplot(preds_mae_df, aes(ahead, mean, color = type)) + geom_line() + xlab("Ahead") + ylab("Mean MAE") @@ -290,51 +292,78 @@ ggplot(preds_df, aes(ahead, mean, color = type)) + or over all aheads, times, and locations for a single numerical summary. ```{r} -mean(baseline_preds_df$mean) -mean(smooth_preds_df$mean) +mean(baseline_preds_mae_df$mean) +mean(smooth_preds_mae_df$mean) ``` -The former shows that forecasts for the immediate future and for the distant future are more inaccurate for both models under consideration. This demonstrates the standard conclusion that the farther ahead in time, the more challenging the forecasting. The latter shows that the smooth quantile regression model and baseline models perform very similarly, with the smooth quantile regression model only slightly beating the baseline model in terms of overall MAE. +The former shows that forecasts for the immediate future and for the distant future are more inaccurate for both models under consideration. The latter shows that the smooth quantile regression model and baseline models perform very similarly overall, with the smooth quantile regression model only slightly beating the baseline model in terms of overall average MAE. -One other commonly used metric is the WIS score (cite https://arxiv.org/pdf/2005.12881.pdf), which a scoring rule that is based on the population quantiles. The point is to score the interval, whereas MAE only evaluates the accuracy of the point forecast. +One other commonly used metric is the WIS score ([ Bracher et al., 2021](https://arxiv.org/pdf/2005.12881.pdf)), which a scoring rule that is based on the population quantiles. The point is to score the interval, whereas MAE only evaluates the accuracy of the point forecast. -The general formula for the weighted interval is as in [Tuzhilina et al., 2022](https://arxiv.org/abs/2202.09723): +Suppose we have $F$ be a forecaster composed of predicted quantiles $q_{\tau}$ for the set of quantile levels $\tau$. 
Then, in terms of the predicted quantiles, the WIS for target variable $Y$ is represented as follows ([McDonald et al., 2021](https://www.pnas.org/doi/full/10.1073/pnas.2111453118)):

$$
WIS(F, Y) = 2 \sum_{\tau} \phi_{\tau} (Y - q_{\tau})
$$
where $\phi_{\tau}(x) = \tau |x|$ for $x \geq 0$ and $\phi_{\tau}(x) = (1 - \tau) |x|$ for $x < 0$.

This form is general as it can accommodate both symmetric and asymmetric quantile levels. If the quantile levels are symmetric, then we can alternatively express the WIS as a collection of central prediction intervals ($\ell_{\alpha}, u_{\alpha}$) parametrized by the exclusion probability $\alpha$:

$$
WIS(F, Y) = \sum_{\alpha} \{ (u_{\alpha} - \ell_{\alpha}) + 2 \cdot \text{dist}(Y, [\ell_{\alpha}, u_{\alpha}]) \}
$$
where $\text{dist}(a,S)$ is the smallest distance between point $a$ and an element of set $S$.

While we implement the former representation, we mention this form because it shows that the score decomposes into the sum of a sharpness component (the first term in the summand) and an under/overprediction component (the second term in the summand). This alternative representation is useful because from it, we more easily see the major limitation of the WIS, which is that the score tends to prioritize sharpness (how wide the interval is) relative to coverage (whether the interval contains the truth).

Now, we write a simple function for the first representation of the score that is compatible with the latest version of `epipredict` (adapted from the corresponding function in [smoothmpf-epipredict](https://github.com/dajmcdon/smoothmpf-epipredict)). The inputs for it are the actual and predicted values and the quantile levels.

```{r}
wis_dist_quantile <- function(actual, values, quantile_levels) {
  2 * mean(pmax(
    quantile_levels * (actual - values),
    (1 - quantile_levels) * (values - actual), na.rm = TRUE
  ))
}
```

Next, we apply the `wis_dist_quantile` function to get a WIS score for each state on each target date. We then compute the mean WIS for each ahead value over all of the states.
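As a quick illustration of how this function is called, consider one made-up actual value scored against five made-up predicted quantiles (these numbers are purely illustrative and do not come from our forecasts):

```{r}
# Toy example with illustrative numbers: one actual value scored against five
# predicted quantiles at the levels used in this vignette.
wis_dist_quantile(
  actual = 0.5,
  values = c(0.2, 0.35, 0.5, 0.7, 0.9),
  quantile_levels = c(.1, .25, .5, .75, .9)
) # evaluates to 0.063 for these numbers
```

Below, every state and target date combination in our prediction data frames is scored in exactly this way before averaging by ahead.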
The results for each of the smooth and baseline forecasters are shown in a similar style line plot as we chose for MAE: -test %>% rowwise() %>% mutate(wis = wis_dist_quantile(actual, c(`0.1`, `0.25`), as.numeric(names(test)[4:5]))) +```{r} +smooth_preds_wis_df = smooth_preds %>% + left_join(tedf_sub) %>% + rowwise() %>% + mutate(wis = wis_dist_quantile(actual, c(`0.1`, `0.25`, `0.5`, `0.75`, `0.9`), several_quantiles)) %>% + group_by(ahead) %>% + summarise(mean = mean(wis)) %>% + mutate(type = "smooth") + +baseline_preds_wis_df = baseline_preds %>% + left_join(tedf_sub) %>% + rowwise() %>% + mutate(wis = wis_dist_quantile(actual, c(`0.1`, `0.25`, `0.5`, `0.75`, `0.9`), several_quantiles)) %>% + group_by(ahead) %>% + summarise(mean = mean(wis)) %>% + mutate(type = "baseline") -# Compute mean WIS by ahead (over all states) -test2 = test %>% rowwise() %>% mutate(wis = wis_dist_quantile(actual, c(`0.1`, `0.25`), as.numeric(names(test)[4:5]))) %>% group_by(ahead) %>% summarise(mean = mean(wis)) +preds_wis_df <- bind_rows(smooth_preds_wis_df, baseline_preds_wis_df) -# Plot over all aheads (do the same as above for the baseline and see which has lower WIS) -ggplot(test2, aes(ahead, mean)) + #, color = type)) + +ggplot(preds_wis_df, aes(ahead, mean, color = type)) + geom_line() + xlab("Ahead") + ylab("Mean WIS") - ``` -From our `smooth_preds_df`, we will need to extract the `values` and the `quantile_levels`: - +The results are consistent with what we saw for MAE: The forecasts for the near and distant future tend to be inaccurate for both models. The smooth quantile regression model only slightly outperforms the baseline model. -For each state at each target date (can be same forecast date), we will apply the `wis_dist_quantile` function to get a WIS score. We may then use these to compute the mean WIS across all states for an ahead (assuming same forecast date for all state and all of the aheads). +Though averaging the WIS score over location and time tends to be the primary aggregation scheme used in evaluation and model comparisons (see, for example, [McDonald et al., 2021](https://www.pnas.org/doi/full/10.1073/pnas.2111453118)), we can also obtain a single numerical summary by averaging over the aheads, times, and locations: +```{r} +mean(baseline_preds_wis_df$mean) +mean(smooth_preds_wis_df$mean) +``` +Overall, both perspectives agree that the smooth quantile regression model tends to perform only slightly better than the baseline model in terms of average WIS, illustrating the difficulty of this forecasting problem. # What we've learned in a nutshell @@ -342,4 +371,4 @@ Smooth quantile regression is used in multi-period forecasting for predicting se # Attribution -The information presented on smooth quantile regression is from [Tuzhilina et al., 2022](https://arxiv.org/abs/2202.09723). +The information presented on smooth quantile regression is from [Tuzhilina et al., 2022](https://arxiv.org/abs/2202.09723). 
From ccfd794b629db742a46c253cf3b54bc2befeb884 Mon Sep 17 00:00:00 2001 From: rachlobay <42976509+rachlobay@users.noreply.github.com> Date: Wed, 24 Apr 2024 06:12:23 -0700 Subject: [PATCH 08/28] Styled --- vignettes/smooth-qr.Rmd | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/vignettes/smooth-qr.Rmd b/vignettes/smooth-qr.Rmd index bdef99871..a979a60ff 100644 --- a/vignettes/smooth-qr.Rmd +++ b/vignettes/smooth-qr.Rmd @@ -266,16 +266,15 @@ Unlike for smooth quantile regression, the resulting forecasts are not smooth cu For a more formal comparison between the two approaches, we could compare the test performance in terms of accuracy through calculating either the, MAE or MSE, where the performance measure of choice can be calculated over over all times and locations for each ahead value ```{r, message = FALSE} - baseline_preds_mae_df <- baseline_preds %>% - left_join(tedf_sub) %>% + left_join(tedf_sub) %>% group_by(ahead) %>% mutate(error = abs(`0.5` - actual)) %>% summarise(mean = mean(error)) %>% mutate(type = "baseline") smooth_preds_mae_df <- smooth_preds %>% - left_join(tedf_sub) %>% + left_join(tedf_sub) %>% group_by(ahead) %>% mutate(error = abs(`0.5` - actual)) %>% summarise(mean = mean(error)) %>% @@ -322,7 +321,8 @@ Now, we write a simple function for the first representation of the score that i wis_dist_quantile <- function(actual, values, quantile_levels) { 2 * mean(pmax( quantile_levels * (actual - values), - (1 - quantile_levels) * (values - actual), na.rm = TRUE + (1 - quantile_levels) * (values - actual), + na.rm = TRUE )) } ``` @@ -330,19 +330,19 @@ wis_dist_quantile <- function(actual, values, quantile_levels) { Next, we apply the `wis_dist_quantile` function to get a WIS score for each state on each target date. We then compute the mean WIS for each ahead value over all of the states. 
The results for each of the smooth and baseline forecasters are shown in a similar style line plot as we chose for MAE: ```{r} -smooth_preds_wis_df = smooth_preds %>% - left_join(tedf_sub) %>% +smooth_preds_wis_df <- smooth_preds %>% + left_join(tedf_sub) %>% rowwise() %>% - mutate(wis = wis_dist_quantile(actual, c(`0.1`, `0.25`, `0.5`, `0.75`, `0.9`), several_quantiles)) %>% - group_by(ahead) %>% + mutate(wis = wis_dist_quantile(actual, c(`0.1`, `0.25`, `0.5`, `0.75`, `0.9`), several_quantiles)) %>% + group_by(ahead) %>% summarise(mean = mean(wis)) %>% mutate(type = "smooth") -baseline_preds_wis_df = baseline_preds %>% - left_join(tedf_sub) %>% +baseline_preds_wis_df <- baseline_preds %>% + left_join(tedf_sub) %>% rowwise() %>% - mutate(wis = wis_dist_quantile(actual, c(`0.1`, `0.25`, `0.5`, `0.75`, `0.9`), several_quantiles)) %>% - group_by(ahead) %>% + mutate(wis = wis_dist_quantile(actual, c(`0.1`, `0.25`, `0.5`, `0.75`, `0.9`), several_quantiles)) %>% + group_by(ahead) %>% summarise(mean = mean(wis)) %>% mutate(type = "baseline") From 4d48f8e99755b0b3ad5e92f8d9a80593f36f6ae4 Mon Sep 17 00:00:00 2001 From: rachlobay <42976509+rachlobay@users.noreply.github.com> Date: Sat, 27 Apr 2024 00:16:48 -0700 Subject: [PATCH 09/28] Several manual changes --- vignettes/smooth-qr.Rmd | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-) diff --git a/vignettes/smooth-qr.Rmd b/vignettes/smooth-qr.Rmd index a979a60ff..bcd332d9f 100644 --- a/vignettes/smooth-qr.Rmd +++ b/vignettes/smooth-qr.Rmd @@ -1,5 +1,10 @@ --- -title: "Smooth Quantile Regression" +title: "Smooth quantile regression" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Smooth quantile regression} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} @@ -8,15 +13,15 @@ knitr::opts_chunk$set( comment = "#>", warning = FALSE, message = FALSE, - cache = TRUE + out.width = "100%" ) ``` # Introducing smooth quantile regression -Whereas the standard application of time-series forecasting techniques has been to forecast single horizons, in multi-period forecasting, the goal is to forecast several horizons simultaneously. The standard application of these techniques aims to predict the signal for a single forecast horizon (or "ahead"), most often one-step-ahead. This is useful in epidemiological applications where decisions are often based on the trend of a signal. +Whereas the standard application of time-series forecasting techniques has been to forecast a single horizon, in multi-period forecasting, the goal is to forecast several horizons simultaneously. This is useful in epidemiological applications where decisions are based on the trend of a signal. -The idea underlying smooth quantile regression that the future can be approximated by a smooth curve. So this approach enforces smoothness across the horizons for point estimation by regression or interval prediction by quantile regression. Our focus in this chapter is the latter. +The idea underlying smooth quantile regression is that the future can be approximated by a smooth curve. This novel approach from [Tuzhilina et al., 2022](https://arxiv.org/abs/2202.09723) enforces smoothness across the horizons and can be applied to point estimation by regression or interval prediction by quantile regression. Our focus in this vignette is the latter. 
# Built-in function for smooth quantile regression and its parameters @@ -200,7 +205,7 @@ We can see that the predictions are smooth curves for each state, as expected wh ## Varying the degrees parameter -We can test the impact of different degrees by using the `map()` function. Let's try out all degrees from 1 to 7: +We can test the impact of different degrees by using the `map()` function. Noting that this may take some time to run, let's try out all degrees from 1 to 7: ```{r, warning = FALSE} smooth_preds_list <- map(1:7, ~ smooth_fc(edf, @@ -218,9 +223,6 @@ Since the MAE compares the predicted values to the actual values, we will first tedf_sub <- tedf %>% rename(target_date = time_value, actual = death_rate) %>% select(geo_value, target_date, actual) - -smooth_preds_df_deg <- smooth_preds_list %>% - left_join(tedf_sub) ``` And then compute the MAE for each of the degrees: @@ -285,7 +287,8 @@ preds_mae_df <- bind_rows(baseline_preds_mae_df, smooth_preds_mae_df) ggplot(preds_mae_df, aes(ahead, mean, color = type)) + geom_line() + xlab("Ahead") + - ylab("Mean MAE") + ylab("Mean MAE") + + scale_fill_brewer(palette = "Set1") ``` or over all aheads, times, and locations for a single numerical summary. @@ -351,7 +354,8 @@ preds_wis_df <- bind_rows(smooth_preds_wis_df, baseline_preds_wis_df) ggplot(preds_wis_df, aes(ahead, mean, color = type)) + geom_line() + xlab("Ahead") + - ylab("Mean WIS") + ylab("Mean WIS") + + scale_fill_brewer(palette = "Set1") ``` The results are consistent with what we saw for MAE: The forecasts for the near and distant future tend to be inaccurate for both models. The smooth quantile regression model only slightly outperforms the baseline model. From a7fca205b4e7acbabbcbc1674cec33b08b9dfea0 Mon Sep 17 00:00:00 2001 From: rachlobay <42976509+rachlobay@users.noreply.github.com> Date: Sat, 27 Apr 2024 10:03:40 -0700 Subject: [PATCH 10/28] purrr to suggests --- DESCRIPTION | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/DESCRIPTION b/DESCRIPTION index 80e7de094..a246317f5 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -36,7 +36,6 @@ Imports: glue, hardhat (>= 1.3.0), magrittr, - purrr, quantreg, recipes (>= 1.0.4), rlang, @@ -56,6 +55,7 @@ Suggests: knitr, lubridate, poissonreg, + purrr, ranger, RcppRoll, rmarkdown, From a3113b7d75bea2ff1acbe4719ae1866c921f1127 Mon Sep 17 00:00:00 2001 From: Rachel Lobay <42976509+rachlobay@users.noreply.github.com> Date: Sat, 27 Apr 2024 10:04:23 -0700 Subject: [PATCH 11/28] Update vignettes/smooth-qr.Rmd Co-authored-by: Daniel McDonald --- vignettes/smooth-qr.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/smooth-qr.Rmd b/vignettes/smooth-qr.Rmd index bcd332d9f..889be8c6c 100644 --- a/vignettes/smooth-qr.Rmd +++ b/vignettes/smooth-qr.Rmd @@ -163,7 +163,7 @@ Most often, we're not going to want to limit ourselves to just predicting the me several_quantiles <- c(.1, .25, .5, .75, .9) smooth_preds <- smooth_fc(edf, quantiles = several_quantiles, fd = fd) -head(smooth_preds) +smooth_preds ``` We can see that we have different columns for the different quantile predictions. 
From 4bab9f0c16298d32e913486a496500da1c871eea Mon Sep 17 00:00:00 2001 From: Rachel Lobay <42976509+rachlobay@users.noreply.github.com> Date: Sat, 27 Apr 2024 10:04:44 -0700 Subject: [PATCH 12/28] Update vignettes/smooth-qr.Rmd Co-authored-by: Daniel McDonald --- vignettes/smooth-qr.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/smooth-qr.Rmd b/vignettes/smooth-qr.Rmd index 889be8c6c..91db2d569 100644 --- a/vignettes/smooth-qr.Rmd +++ b/vignettes/smooth-qr.Rmd @@ -155,7 +155,7 @@ We now can produce smooth quantile regression predictions for our problem: ```{r, warning = FALSE} smooth_preds <- smooth_fc(edf, fd = fd) -head(smooth_preds) +smooth_preds ``` Most often, we're not going to want to limit ourselves to just predicting the median value as there is uncertainty about the predictions, so let's try to predict several different quantiles in addition to the median: From 63ebc578ff7a084f3f5cac7b6242f35dafaccff4 Mon Sep 17 00:00:00 2001 From: Rachel Lobay <42976509+rachlobay@users.noreply.github.com> Date: Sat, 27 Apr 2024 10:04:58 -0700 Subject: [PATCH 13/28] Update vignettes/smooth-qr.Rmd Co-authored-by: Daniel McDonald --- vignettes/smooth-qr.Rmd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/vignettes/smooth-qr.Rmd b/vignettes/smooth-qr.Rmd index 91db2d569..a3c1ef36d 100644 --- a/vignettes/smooth-qr.Rmd +++ b/vignettes/smooth-qr.Rmd @@ -184,8 +184,8 @@ plot_preds <- function(preds, geos_to_plot = NULL, train_test_dat, fd) { geom_ribbon(aes(target_date, ymin = `0.25`, ymax = `0.75`), fill = "#00488E", alpha = .8 ) + - geom_line(data = train_test_dat, aes(time_value, death_rate), size = 1.5) + - geom_line(aes(target_date, `0.5`), color = "orange", size = 1.5) + + geom_line(data = train_test_dat, aes(time_value, death_rate)) + + geom_line(aes(target_date, `0.5`), color = "orange") + geom_vline(xintercept = fd) + facet_wrap(~geo_value) + theme_bw(16) + From a1b4e2b5b952ff65870b774261529f9cb75c3dbc Mon Sep 17 00:00:00 2001 From: Rachel Lobay <42976509+rachlobay@users.noreply.github.com> Date: Sat, 27 Apr 2024 10:05:13 -0700 Subject: [PATCH 14/28] Update vignettes/smooth-qr.Rmd Co-authored-by: Daniel McDonald --- vignettes/smooth-qr.Rmd | 1 + 1 file changed, 1 insertion(+) diff --git a/vignettes/smooth-qr.Rmd b/vignettes/smooth-qr.Rmd index a3c1ef36d..dbdfa431d 100644 --- a/vignettes/smooth-qr.Rmd +++ b/vignettes/smooth-qr.Rmd @@ -60,6 +60,7 @@ library(epipredict) library(dplyr) library(purrr) library(ggplot2) +theme_set(theme_bw()) ``` We will now apply smooth quantile regression on the real data used for COVID-19 forecasting. The built-in dataset we will use is a subset of JHU daily data on state cases and deaths. This sample data ranges from Dec. 31, 2020 to Dec. 31, 2021. From 3a1d2c7212504a2ac2b945af1b3bd3383f43ace5 Mon Sep 17 00:00:00 2001 From: Rachel Lobay <42976509+rachlobay@users.noreply.github.com> Date: Sat, 27 Apr 2024 10:05:55 -0700 Subject: [PATCH 15/28] Update vignettes/smooth-qr.Rmd Co-authored-by: Daniel McDonald --- vignettes/smooth-qr.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/smooth-qr.Rmd b/vignettes/smooth-qr.Rmd index dbdfa431d..9d48dc167 100644 --- a/vignettes/smooth-qr.Rmd +++ b/vignettes/smooth-qr.Rmd @@ -301,7 +301,7 @@ mean(smooth_preds_mae_df$mean) The former shows that forecasts for the immediate future and for the distant future are more inaccurate for both models under consideration. 
The latter shows that the smooth quantile regression model and baseline models perform very similarly overall, with the smooth quantile regression model only slightly beating the baseline model in terms of overall average MAE. -One other commonly used metric is the WIS score ([ Bracher et al., 2021](https://arxiv.org/pdf/2005.12881.pdf)), which a scoring rule that is based on the population quantiles. The point is to score the interval, whereas MAE only evaluates the accuracy of the point forecast. +One other commonly used metric is the Weighted Interval Score (WIS, [Bracher et al., 2021](https://arxiv.org/pdf/2005.12881.pdf)), which a scoring rule that is based on the population quantiles. The point is to score the interval, whereas MAE only evaluates the accuracy of the point forecast. Suppose we have $F$ be a forecaster composed of predicted quantiles $q_{\tau}$ for the set of quantile levels $\tau$. Then, in terms of the predicted quantiles, the WIS for target variable $Y$ is represented as follows ([McDonald et al., 2021](https://www.pnas.org/doi/full/10.1073/pnas.2111453118)): From 1501eeac70b17e240c1ae0dfc8b373acd065a802 Mon Sep 17 00:00:00 2001 From: Rachel Lobay <42976509+rachlobay@users.noreply.github.com> Date: Sat, 27 Apr 2024 10:06:10 -0700 Subject: [PATCH 16/28] Update vignettes/smooth-qr.Rmd Co-authored-by: Daniel McDonald --- vignettes/smooth-qr.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/smooth-qr.Rmd b/vignettes/smooth-qr.Rmd index 9d48dc167..9e1109f64 100644 --- a/vignettes/smooth-qr.Rmd +++ b/vignettes/smooth-qr.Rmd @@ -39,7 +39,7 @@ smooth_quantile_reg( For smooth quantile regression, the type of model or `mode` is regression. -The only `engine` that is currently supported is `smooth_qr()` from the [`smooth_qr` package](https://dajmcdon.github.io/smoothqr/). +The only `engine` that is currently supported is `smooth_qr()` from the [`smoothqr` package](https://dajmcdon.github.io/smoothqr/). The `outcome_locations` indicate the multiple horizon (ie. ahead) values. These should be specified by the user. From 23084ce4f34ca9fe7635514406497bf7d73302b8 Mon Sep 17 00:00:00 2001 From: Rachel Lobay <42976509+rachlobay@users.noreply.github.com> Date: Sat, 27 Apr 2024 10:06:21 -0700 Subject: [PATCH 17/28] Update vignettes/smooth-qr.Rmd Co-authored-by: Daniel McDonald --- vignettes/smooth-qr.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/smooth-qr.Rmd b/vignettes/smooth-qr.Rmd index 9e1109f64..174293f1b 100644 --- a/vignettes/smooth-qr.Rmd +++ b/vignettes/smooth-qr.Rmd @@ -45,7 +45,7 @@ The `outcome_locations` indicate the multiple horizon (ie. ahead) values. These The `quantile_levels` parameter is a vector of values that indicates the quantiles to be estimated. The default is the median (0.5 quantile). -The `degree` parameter indicates the number of polynomials used for smoothing of the response. It should be no more than the number of responses. If the degree is precisely equal to the number of aheads, then there is no smoothing. To better understand this parameter and how it works, we should look to its origins and how it is used in the model. +The `degree` parameter indicates the degree of the polynomials used for smoothing of the response. It should be no more than the number of aheads. If the degree is precisely equal to the number of aheads, then there is no smoothing. To better understand this parameter and how it works, we should look to its origins and how it is used in the model. 
# Model form From 7645f64c008502b5545a60191113b879dcfd2a3d Mon Sep 17 00:00:00 2001 From: Rachel Lobay <42976509+rachlobay@users.noreply.github.com> Date: Sat, 27 Apr 2024 10:06:33 -0700 Subject: [PATCH 18/28] Update vignettes/smooth-qr.Rmd Co-authored-by: Daniel McDonald --- vignettes/smooth-qr.Rmd | 1 - 1 file changed, 1 deletion(-) diff --git a/vignettes/smooth-qr.Rmd b/vignettes/smooth-qr.Rmd index 174293f1b..5976cbe2a 100644 --- a/vignettes/smooth-qr.Rmd +++ b/vignettes/smooth-qr.Rmd @@ -189,7 +189,6 @@ plot_preds <- function(preds, geos_to_plot = NULL, train_test_dat, fd) { geom_line(aes(target_date, `0.5`), color = "orange") + geom_vline(xintercept = fd) + facet_wrap(~geo_value) + - theme_bw(16) + scale_x_date(name = "", date_labels = "%b %Y") + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + ylab("Deaths per 100K inhabitants") From 2c2a56242d5f95f27d2504207223891203238bf6 Mon Sep 17 00:00:00 2001 From: Rachel Lobay <42976509+rachlobay@users.noreply.github.com> Date: Sat, 27 Apr 2024 10:06:44 -0700 Subject: [PATCH 19/28] Update vignettes/smooth-qr.Rmd Co-authored-by: Daniel McDonald --- vignettes/smooth-qr.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/smooth-qr.Rmd b/vignettes/smooth-qr.Rmd index 5976cbe2a..f93f5e340 100644 --- a/vignettes/smooth-qr.Rmd +++ b/vignettes/smooth-qr.Rmd @@ -234,7 +234,7 @@ smooth_preds_df_deg <- smooth_preds_list %>% summarise(mean = mean(error)) # Arrange the MAE from smallest to largest -head(smooth_preds_df_deg %>% arrange(mean)) +smooth_preds_df_deg %>% arrange(mean) ``` Instead of just looking at the raw numbers, let's create a simple line plot to visualize how the MAE changes over degrees for this data: From 9da04fea5182096400330a53797396fdbf2dfc89 Mon Sep 17 00:00:00 2001 From: Rachel Lobay <42976509+rachlobay@users.noreply.github.com> Date: Sat, 27 Apr 2024 10:06:55 -0700 Subject: [PATCH 20/28] Update vignettes/smooth-qr.Rmd Co-authored-by: Daniel McDonald --- vignettes/smooth-qr.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/smooth-qr.Rmd b/vignettes/smooth-qr.Rmd index f93f5e340..7afb3db5c 100644 --- a/vignettes/smooth-qr.Rmd +++ b/vignettes/smooth-qr.Rmd @@ -302,7 +302,7 @@ The former shows that forecasts for the immediate future and for the distant fut One other commonly used metric is the Weighted Interval Score (WIS, [Bracher et al., 2021](https://arxiv.org/pdf/2005.12881.pdf)), which a scoring rule that is based on the population quantiles. The point is to score the interval, whereas MAE only evaluates the accuracy of the point forecast. -Suppose we have $F$ be a forecaster composed of predicted quantiles $q_{\tau}$ for the set of quantile levels $\tau$. Then, in terms of the predicted quantiles, the WIS for target variable $Y$ is represented as follows ([McDonald et al., 2021](https://www.pnas.org/doi/full/10.1073/pnas.2111453118)): +Let $F$ be a forecast composed of predicted quantiles $q_{\tau}$ for the set of quantile levels $\tau$. 
Then, in terms of the predicted quantiles, the WIS for target variable $Y$ is represented as follows ([McDonald et al., 2021](https://www.pnas.org/doi/full/10.1073/pnas.2111453118)): $$ WIS(F, Y) = 2 \sum_{\tau} \phi_{\tau} (Y - q_{\tau}) From 33feb879be9a77a54f353331193d281818dab81c Mon Sep 17 00:00:00 2001 From: Rachel Lobay <42976509+rachlobay@users.noreply.github.com> Date: Sat, 27 Apr 2024 10:07:17 -0700 Subject: [PATCH 21/28] Update vignettes/smooth-qr.Rmd Co-authored-by: Daniel McDonald --- vignettes/smooth-qr.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/smooth-qr.Rmd b/vignettes/smooth-qr.Rmd index 7afb3db5c..19dd2bd03 100644 --- a/vignettes/smooth-qr.Rmd +++ b/vignettes/smooth-qr.Rmd @@ -334,7 +334,7 @@ Next, we apply the `wis_dist_quantile` function to get a WIS score for each stat ```{r} smooth_preds_wis_df <- smooth_preds %>% - left_join(tedf_sub) %>% + left_join(tedf_sub, by = c("geo_value", "target_date")) rowwise() %>% mutate(wis = wis_dist_quantile(actual, c(`0.1`, `0.25`, `0.5`, `0.75`, `0.9`), several_quantiles)) %>% group_by(ahead) %>% From acaf3db1c2dcb2c8ebff665c1729750114294792 Mon Sep 17 00:00:00 2001 From: Rachel Lobay <42976509+rachlobay@users.noreply.github.com> Date: Sat, 27 Apr 2024 10:07:27 -0700 Subject: [PATCH 22/28] Update vignettes/smooth-qr.Rmd Co-authored-by: Daniel McDonald --- vignettes/smooth-qr.Rmd | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/vignettes/smooth-qr.Rmd b/vignettes/smooth-qr.Rmd index 19dd2bd03..77d45b5a5 100644 --- a/vignettes/smooth-qr.Rmd +++ b/vignettes/smooth-qr.Rmd @@ -189,8 +189,7 @@ plot_preds <- function(preds, geos_to_plot = NULL, train_test_dat, fd) { geom_line(aes(target_date, `0.5`), color = "orange") + geom_vline(xintercept = fd) + facet_wrap(~geo_value) + - scale_x_date(name = "", date_labels = "%b %Y") + - theme(axis.text.x = element_text(angle = 45, hjust = 1)) + + scale_x_date(name = "", date_labels = "%b %Y", date_breaks = "2 months") + ylab("Deaths per 100K inhabitants") } ``` From 5fd8a7a2b38aa112aca5b566848b6dc307d25873 Mon Sep 17 00:00:00 2001 From: Rachel Lobay <42976509+rachlobay@users.noreply.github.com> Date: Sat, 27 Apr 2024 10:07:40 -0700 Subject: [PATCH 23/28] Update vignettes/smooth-qr.Rmd Co-authored-by: Daniel McDonald --- vignettes/smooth-qr.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/smooth-qr.Rmd b/vignettes/smooth-qr.Rmd index 77d45b5a5..f39746287 100644 --- a/vignettes/smooth-qr.Rmd +++ b/vignettes/smooth-qr.Rmd @@ -268,7 +268,7 @@ For a more formal comparison between the two approaches, we could compare the te ```{r, message = FALSE} baseline_preds_mae_df <- baseline_preds %>% - left_join(tedf_sub) %>% + left_join(tedf_sub, by = c("geo_value", "target_date")) group_by(ahead) %>% mutate(error = abs(`0.5` - actual)) %>% summarise(mean = mean(error)) %>% From d1a0be4b75cbdca2180fbf6bd5671bdffafc1567 Mon Sep 17 00:00:00 2001 From: Rachel Lobay <42976509+rachlobay@users.noreply.github.com> Date: Sat, 27 Apr 2024 10:08:45 -0700 Subject: [PATCH 24/28] Update vignettes/smooth-qr.Rmd Co-authored-by: Daniel McDonald --- vignettes/smooth-qr.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vignettes/smooth-qr.Rmd b/vignettes/smooth-qr.Rmd index f39746287..d0c49188b 100644 --- a/vignettes/smooth-qr.Rmd +++ b/vignettes/smooth-qr.Rmd @@ -275,7 +275,7 @@ baseline_preds_mae_df <- baseline_preds %>% mutate(type = "baseline") smooth_preds_mae_df <- smooth_preds %>% - left_join(tedf_sub) %>% + 
left_join(tedf_sub, by = c("geo_value", "target_date")) group_by(ahead) %>% mutate(error = abs(`0.5` - actual)) %>% summarise(mean = mean(error)) %>% From 4b7059e1ceaa0e4c1d790cb959187e5c3238c422 Mon Sep 17 00:00:00 2001 From: rachlobay <42976509+rachlobay@users.noreply.github.com> Date: Sat, 27 Apr 2024 10:11:51 -0700 Subject: [PATCH 25/28] Intro changes & style --- vignettes/smooth-qr.Rmd | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/vignettes/smooth-qr.Rmd b/vignettes/smooth-qr.Rmd index d0c49188b..eedb1536f 100644 --- a/vignettes/smooth-qr.Rmd +++ b/vignettes/smooth-qr.Rmd @@ -19,9 +19,9 @@ knitr::opts_chunk$set( # Introducing smooth quantile regression -Whereas the standard application of time-series forecasting techniques has been to forecast a single horizon, in multi-period forecasting, the goal is to forecast several horizons simultaneously. This is useful in epidemiological applications where decisions are based on the trend of a signal. +Whereas other time-series forecasting examples in this package have used (direct) models for single horizons, in multi-period forecasting, the goal is to (directly) forecast several horizons simultaneously. This is useful in epidemiological applications where decisions are based on the trend of a signal. -The idea underlying smooth quantile regression is that the future can be approximated by a smooth curve. This novel approach from [Tuzhilina et al., 2022](https://arxiv.org/abs/2202.09723) enforces smoothness across the horizons and can be applied to point estimation by regression or interval prediction by quantile regression. Our focus in this vignette is the latter. +The idea underlying smooth quantile regression is that set forecast targets can be approximated by a smooth curve. This novel approach from [Tuzhilina et al., 2022](https://arxiv.org/abs/2202.09723) enforces smoothness across the horizons and can be applied to point estimation by regression or interval prediction by quantile regression. Our focus in this vignette is the latter. 
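To build some intuition for the smoothness idea before turning to the package interface, the following toy sketch (fabricated numbers, base R only, and not part of the package) approximates a set of per-horizon values by a low-degree polynomial in the ahead value; smooth quantile regression applies the same idea to the regression coefficients rather than to the forecasts directly.

```{r, eval = FALSE}
# Toy illustration with made-up values: 28 noisy per-horizon targets,
# approximated by a quadratic function of the ahead value.
aheads <- 1:28
toy_targets <- 0.3 + 0.05 * aheads - 0.0015 * aheads^2 + rnorm(28, sd = 0.02)
toy_fit <- lm(toy_targets ~ poly(aheads, 2))
plot(aheads, toy_targets, xlab = "ahead", ylab = "value")
lines(aheads, fitted(toy_fit), col = "orange", lwd = 2)
```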
# Built-in function for smooth quantile regression and its parameters @@ -269,14 +269,14 @@ For a more formal comparison between the two approaches, we could compare the te ```{r, message = FALSE} baseline_preds_mae_df <- baseline_preds %>% left_join(tedf_sub, by = c("geo_value", "target_date")) - group_by(ahead) %>% +group_by(ahead) %>% mutate(error = abs(`0.5` - actual)) %>% summarise(mean = mean(error)) %>% mutate(type = "baseline") smooth_preds_mae_df <- smooth_preds %>% left_join(tedf_sub, by = c("geo_value", "target_date")) - group_by(ahead) %>% +group_by(ahead) %>% mutate(error = abs(`0.5` - actual)) %>% summarise(mean = mean(error)) %>% mutate(type = "smooth") @@ -334,7 +334,7 @@ Next, we apply the `wis_dist_quantile` function to get a WIS score for each stat ```{r} smooth_preds_wis_df <- smooth_preds %>% left_join(tedf_sub, by = c("geo_value", "target_date")) - rowwise() %>% +rowwise() %>% mutate(wis = wis_dist_quantile(actual, c(`0.1`, `0.25`, `0.5`, `0.75`, `0.9`), several_quantiles)) %>% group_by(ahead) %>% summarise(mean = mean(wis)) %>% From c582250d17b3ea4dee41d4e59a4f41664426fd4a Mon Sep 17 00:00:00 2001 From: rachlobay <42976509+rachlobay@users.noreply.github.com> Date: Sat, 27 Apr 2024 10:20:41 -0700 Subject: [PATCH 26/28] Catch typos --- vignettes/smooth-qr.Rmd | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/vignettes/smooth-qr.Rmd b/vignettes/smooth-qr.Rmd index eedb1536f..da2cd3bab 100644 --- a/vignettes/smooth-qr.Rmd +++ b/vignettes/smooth-qr.Rmd @@ -49,9 +49,9 @@ The `degree` parameter indicates the degree of the polynomials used for smoothin # Model form -Smooth quantile regression is linear auto-regressive, with the key feature being a transformation that forces the coefficients to satisfy a constraint. The purpose if this is for each model coefficient to be a smooth function of ahead values, and so each such coefficient is set to be a linear combination of smooth basis functions (such as a spline or a polynomial). +Smooth quantile regression is linear auto-regressive, with the key feature being a transformation that forces the coefficients to satisfy a smoothing constraint. The purpose of this is for each model coefficient to be a smooth function of ahead values, and so each such coefficient is set to be a linear combination of smooth basis functions (such as a spline or a polynomial). -The `degree` parameter controls the number of these polynomials used. It should be no greater than the number of responses. This is a tuning parameter, and so it can be chosen by performing a grid search with cross-validation. Intuitively, $d = 1$ corresponds to the constant model, $d = 2$ gives straight line forecasts, while $d = 3$ gives quadratic forecasts. Since a degree of 3 was found to work well in the tested applications (see Section 9 of Tuzhilina et al, 2022), it is the default value. +The `degree` parameter controls the number of these polynomials used. It should be no greater than the number of responses. This is a tuning parameter, and so it can be chosen by performing a grid search with cross-validation. Intuitively, $d = 1$ corresponds to the constant model, $d = 2$ gives straight line forecasts, while $d = 3$ gives quadratic forecasts. Since a degree of 3 was found to work well in the tested applications (see Section 9 of [Tuzhilina et al., 2022](https://arxiv.org/abs/2202.09723)), it is the default value. 
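As a complement to the description above, here is a small numerical sketch of what the smoothing constraint does (illustration only, with randomly chosen basis weights; this is not the package's internal code): with $d$ basis functions evaluated at the aheads, every coefficient's profile across aheads is forced to lie in the span of those $d$ columns.

```{r, eval = FALSE}
# Sketch of the constraint: beta[, l] = H %*% theta[, l], so each lagged
# predictor's coefficient is a smooth (here quadratic) function of the ahead.
aheads <- 1:28
d <- 3                                  # plays the role of the `degree` argument
H <- cbind(1, poly(aheads, d - 1))      # 28 x d basis matrix: h_1(a), ..., h_d(a)
theta <- matrix(rnorm(d * 3), nrow = d) # hypothetical weights, one column per lag
beta <- H %*% theta                     # 28 x 3 matrix of smooth coefficient curves
matplot(aheads, beta, type = "l", xlab = "ahead", ylab = "coefficient")
```

With `d` equal to the number of aheads, `H` would be square and the constraint would impose no smoothing at all, which matches the behaviour of `degree` described earlier.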
# Demonstration of smooth quantile regression @@ -227,7 +227,7 @@ tedf_sub <- tedf %>% And then compute the MAE for each of the degrees: ```{r, message = FALSE} smooth_preds_df_deg <- smooth_preds_list %>% - left_join(tedf_sub) %>% + left_join(tedf_sub, by = c("geo_value", "target_date")) %>% group_by(degree) %>% mutate(error = abs(`0.5` - actual)) %>% summarise(mean = mean(error)) From d509e768c0f42eabea13f4d9dddd22847968b698 Mon Sep 17 00:00:00 2001 From: rachlobay <42976509+rachlobay@users.noreply.github.com> Date: Sat, 27 Apr 2024 10:21:08 -0700 Subject: [PATCH 27/28] Move to articles --- vignettes/{ => articles}/smooth-qr.Rmd | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename vignettes/{ => articles}/smooth-qr.Rmd (100%) diff --git a/vignettes/smooth-qr.Rmd b/vignettes/articles/smooth-qr.Rmd similarity index 100% rename from vignettes/smooth-qr.Rmd rename to vignettes/articles/smooth-qr.Rmd From b3a33faa52b1ebde55565982a678404a0bb923b6 Mon Sep 17 00:00:00 2001 From: rachlobay <42976509+rachlobay@users.noreply.github.com> Date: Sat, 27 Apr 2024 11:09:57 -0700 Subject: [PATCH 28/28] Fix error and style --- vignettes/articles/smooth-qr.Rmd | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/vignettes/articles/smooth-qr.Rmd b/vignettes/articles/smooth-qr.Rmd index da2cd3bab..a04c3981b 100644 --- a/vignettes/articles/smooth-qr.Rmd +++ b/vignettes/articles/smooth-qr.Rmd @@ -268,15 +268,15 @@ For a more formal comparison between the two approaches, we could compare the te ```{r, message = FALSE} baseline_preds_mae_df <- baseline_preds %>% - left_join(tedf_sub, by = c("geo_value", "target_date")) -group_by(ahead) %>% + left_join(tedf_sub, by = c("geo_value", "target_date")) %>% + group_by(ahead) %>% mutate(error = abs(`0.5` - actual)) %>% summarise(mean = mean(error)) %>% mutate(type = "baseline") smooth_preds_mae_df <- smooth_preds %>% - left_join(tedf_sub, by = c("geo_value", "target_date")) -group_by(ahead) %>% + left_join(tedf_sub, by = c("geo_value", "target_date")) %>% + group_by(ahead) %>% mutate(error = abs(`0.5` - actual)) %>% summarise(mean = mean(error)) %>% mutate(type = "smooth") @@ -333,15 +333,15 @@ Next, we apply the `wis_dist_quantile` function to get a WIS score for each stat ```{r} smooth_preds_wis_df <- smooth_preds %>% - left_join(tedf_sub, by = c("geo_value", "target_date")) -rowwise() %>% + left_join(tedf_sub, by = c("geo_value", "target_date")) %>% + rowwise() %>% mutate(wis = wis_dist_quantile(actual, c(`0.1`, `0.25`, `0.5`, `0.75`, `0.9`), several_quantiles)) %>% group_by(ahead) %>% summarise(mean = mean(wis)) %>% mutate(type = "smooth") baseline_preds_wis_df <- baseline_preds %>% - left_join(tedf_sub) %>% + left_join(tedf_sub, by = c("geo_value", "target_date")) %>% rowwise() %>% mutate(wis = wis_dist_quantile(actual, c(`0.1`, `0.25`, `0.5`, `0.75`, `0.9`), several_quantiles)) %>% group_by(ahead) %>%
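  # Editorial note (sketch of what follows): as with smooth_preds_wis_df above,
  # the row-wise WIS values are averaged within each ahead so that the baseline
  # and smooth forecasters can be compared horizon by horizon.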