proof of concept of adding {mori} #1188

Open

EmilHvitfeldt wants to merge 1 commit into main from mori

Conversation


@EmilHvitfeldt EmilHvitfeldt commented Apr 27, 2026

Wanted to try out {mori} to see what would happen and it did not disappoint! https://github.com/shikokuchuo/mori

Before

#> # A tibble: 1 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                           <bch> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 fit_resamples(wf, folds, control = … 4.38s  4.49s     0.223    6.22MB     0.05

#>   peak RSS: 18.9GB (median)

With mori

#> # A tibble: 1 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                           <bch> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 fit_resamples(wf, folds, control = … 2.78s  2.98s     0.328    16.5GB     8.95

#>   peak RSS: 4.23GB (median)
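As a quick sanity check, the relative changes implied by the medians above (the ratios are computed here for illustration; they are not reported in the PR):

```r
# Ratios derived from the medians reported above (illustrative arithmetic only).
before    <- c(median_s = 4.49, peak_rss_gb = 18.9)
with_mori <- c(median_s = 2.98, peak_rss_gb = 4.23)

speedup       <- before[["median_s"]] / with_mori[["median_s"]]       # ~1.51x faster
rss_reduction <- before[["peak_rss_gb"]] / with_mori[["peak_rss_gb"]] # ~4.47x lower peak RSS
round(c(speedup = speedup, rss_reduction = rss_reduction), 2)
```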

We are even seeing substantial speed increases. Setup: random normal data, 10 cores, ~160 MB of data.

We still need to double-check everything: look to make sure everything is wired up correctly, add tests, etc.
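One such wiring check could compare sequential and parallel runs on a small dataset. This is a hypothetical test sketch, not part of this PR; it assumes that lm fitting is deterministic given fixed folds, so the metrics should agree exactly regardless of backend:

```r
# Hypothetical check: sequential and parallel fit_resamples() should produce
# identical metrics on the same folds with a deterministic model.
library(tidymodels)
library(mirai)

set.seed(1)
dat <- data.frame(x = rnorm(100))
dat$y <- dat$x + rnorm(100)

wf <- workflow() |>
  add_formula(y ~ x) |>
  add_model(linear_reg())

folds <- vfold_cv(dat, v = 5)

# Sequential run
res_seq <- fit_resamples(wf, folds,
                         control = control_resamples(allow_par = FALSE))

# Parallel run via mirai daemons
daemons(2)
res_par <- fit_resamples(wf, folds,
                         control = control_resamples(allow_par = TRUE))
daemons(0)

isTRUE(all.equal(collect_metrics(res_seq), collect_metrics(res_par)))
```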

Benchmarking code
library(tidymodels)
library(mirai)
library(bench)
library(syrup)

# Setup

set.seed(42)
n <- 1000000
p <- 20
x_mat <- matrix(rnorm(n * p), nrow = n)
colnames(x_mat) <- paste0("x", seq_len(p))
dat <- as.data.frame(x_mat)
dat$y <- dat$x1 + rnorm(n)

lobstr::obj_size(dat)
#> 168.00 MB

wf <- workflow() |>
  add_formula(y ~ .) |>
  add_model(linear_reg() |> set_engine("lm"))

folds <- vfold_cv(dat, v = 10)
ctrl <- control_resamples(allow_par = TRUE)

daemons(10)
# NB: on.exit() only registers handlers inside a function; at the top level of
# a script it is a no-op. Shut the daemons down explicitly with daemons(0)
# once the benchmarks below have finished.

results <- bench::mark(
  fit_resamples(wf, folds, control = ctrl),
  iterations = 10,
  check = FALSE
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
results

# Cross-process memory via syrup
# syrup polls RSS of all R processes (main + mirai daemons) every 0.2s,
# capturing peak resident memory across the full process tree.

peak_rss <- function(expr) {
  snap <- syrup(expr, interval = 0.2)
  snap |>
    summarise(total_rss = sum(rss, na.rm = TRUE), .by = time) |>
    summarise(peak = max(total_rss)) |>
    pull(peak)
}

peaks <- replicate(
  5,
  peak_rss(suppressMessages(
    fit_resamples(wf, folds, control = ctrl)
  ))
)

cat(sprintf(
  "  peak RSS: %s (median)\n",
  format(bench::as_bench_bytes(median(peaks)))
))
