library(doFuture)
library(future.apply)
library(furrr)
library(doRNG)
library(microbenchmark)
registerDoFuture()  # let foreach's %dopar%/%dorng% run on the future backend
plan(multisession)  # parallelise with background R sessions
Foreach globals and speed
I previously tested the impact of unused globals on speed, but only briefly. Here, I'll be more systematic, because if we do need to be super careful about which objects exist in the global environment, things will get tricky fast.
There are a couple of things to check here:
Do unused globals get passed in, just because they exist? Does that slow things down?
Does that answer change if the parallelisation is inside a function?
I’ll tackle these by
Running speed tests before I initialise any globals
Bare processing
Inside a function
Creating a big global and comparing two identical processing steps that either ignore it or reference it without doing any processing on it
Bare
Inside a function
Nothing exists
Well, almost nothing. I'm going to set a couple of scalars and define a function for furrr and future.apply. I'm not using any of the globals or export arguments in the functions.
Bare
n_reps <- 100
size <- 1000

fn_to_call <- function(rep, size) {
  a <- rnorm(size, mean = rep)
  b <- matrix(rnorm(size * size), nrow = size)
  t(a %*% b)
}
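How long these runs take will partly depend on how many background workers plan(multisession) started, so it's worth knowing that number. Something like this reports it (both helpers come with the future framework):
availableCores()  # cores detected on this machine
nbrOfWorkers()    # workers the current multisession plan will use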
Benchmark
microbenchmark(
  dofut0 = {foreach(i = 1:n_reps,
                    .combine = cbind) %dorng% {
    a <- rnorm(size, mean = i)
    b <- matrix(rnorm(size * size), nrow = size)
    t(a %*% b)
  }},
  furr0 = {future_map(1:n_reps, fn_to_call, size = size,
                      .options = furrr_options(seed = TRUE))},
  fuapply0 = {future_lapply(1:n_reps, FUN = fn_to_call, size,
                            future.seed = TRUE)},
  times = 10
)
Unit: seconds
expr min lq mean median uq max neval
dofut0 3.082435 3.167459 3.289073 3.304749 3.409156 3.455932 10
furr0 2.996915 3.189255 3.787848 3.425178 3.551950 7.862844 10
fuapply0 3.127753 3.205452 3.283468 3.256335 3.342933 3.515369 10
So, doFuture and furrr are a bit slower than future.apply here, but not by a ton (and furrr's mean is inflated by one slow run). The key thing is that this sets the baseline, so we can see whether things slow down once we have big objects in memory.
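This makes more sense with a peek at the mechanism: future statically scans the code (via the globals package) for names that aren't defined locally, and only those get exported to the workers. A rough sketch of that scan on the loop body, not something the benchmarks above ran:
library(globals)
# Free variables in the loop body; a and b are assigned locally, so only
# things like size, i, and the functions being called should be picked up.
findGlobals(quote({
  a <- rnorm(size, mean = i)
  b <- matrix(rnorm(size * size), nrow = size)
  t(a %*% b)
}))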
Inside a function
These functions are from testing parallel speed, though they have different names here. I've added arguments to change the way each handles globals so I don't have to write new functions for that comparison later; the defaults are set to each underlying function's defaults.
foreach
foreach_fun <- function(n_reps = 100, size = 1000, .export = NULL, .noexport = NULL) {
  c_foreach <- foreach(i = 1:n_reps,
                       .combine = cbind,
                       .export = .export,
                       .noexport = .noexport) %dorng% {
    a <- rnorm(size, mean = i)
    b <- matrix(rnorm(size * size), nrow = size)
    t(a %*% b)
  }
  return(c_foreach)
}
furrr
furrr_fun <- function(n_reps = 100, size = 1000, globals = TRUE) {
  fn_to_call <- function(rep, size) {
    a <- rnorm(size, mean = rep)
    b <- matrix(rnorm(size * size), nrow = size)
    t(a %*% b)
  }
  c_map <- future_map(1:n_reps, fn_to_call, size = size,
                      .options = furrr_options(seed = TRUE,
                                               globals = globals))
  matrix(unlist(c_map), ncol = n_reps)
}
future.apply
fuapply_fun <- function(n_reps = 100, size = 1000, future.globals = TRUE) {
  fn_to_call <- function(rep, size) {
    a <- rnorm(size, mean = rep)
    b <- matrix(rnorm(size * size), nrow = size)
    t(a %*% b)
  }
  c_apply <- future_lapply(1:n_reps, FUN = fn_to_call, size,
                           future.seed = TRUE,
                           future.globals = future.globals)
  matrix(unlist(c_apply), ncol = n_reps)
}
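Before timing them, a quick sanity check that the three wrappers do equivalent work: with small inputs each should return a size x n_reps matrix (the values differ because of the random draws).
dim(foreach_fun(n_reps = 5, size = 10))  # 10  5
dim(furrr_fun(n_reps = 5, size = 10))    # 10  5
dim(fuapply_fun(n_reps = 5, size = 10))  # 10  5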
Benchmark
microbenchmark(
dofut_fun = foreach_fun(n_reps = 100, size = 1000),
fur_fun = furrr_fun(n_reps = 100, size = 1000),
app_fun = fuapply_fun(n_reps = 100, size = 1000),
times = 10
)
Unit: seconds
expr min lq mean median uq max neval
dofut_fun 2.894709 2.943020 3.032240 3.025625 3.048341 3.243421 10
fur_fun 2.907043 2.959023 3.009576 2.999129 3.035445 3.159187 10
app_fun 2.874484 3.003329 3.057768 3.066695 3.118491 3.288474 10
This sets the other baseline before we have big objects in memory, so we can see whether things respond differently when run inside a function's environment vs directly in the global environment. Here all three functions are basically equivalent.
With big global
The default future.globals.maxSize is 500 MB. Should I increase that, or just try to stay under it? I think I'll just try to get just under it.
# This is 1.6GB
# big_obj <- matrix(rnorm(20000*10000), nrow = 10000)
# 496 MB
big_obj <- matrix(rnorm(10000*6200), nrow = 10000)
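As a check on that, object.size() reports how big the matrix actually is; and if the cap ever did need raising, it's just an option set in bytes. Something like:
format(object.size(big_obj), units = "MB")       # ~496 MB, just under the default cap
# options(future.globals.maxSize = 1000 * 1024^2)  # would raise the cap to roughly 1 GB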
Now, same tests as before, and some that reference it but don’t use it.
The comparisons to make here are:
Matched to above: does just having the object exist slow things down, even if it's never referenced?
Referenced and not: does it only get passed in if asked for, and does that slow things down?
- Not exactly sure how I'll check that. Maybe instead of referencing it in the function (which is hard to do without actually using it, especially with furrr and future.apply), I'll explicitly send it in with their globals arguments.
Bare
Benchmark
I'm going to run this for the default (no globals argument), explicitly excluding the big object, and explicitly sending it in.
microbenchmark(
  # default - same as above, but now big_obj exists, but is not used in the actual processing
  dofut0 = {foreach(i = 1:n_reps,
                    .combine = cbind) %dorng% {
    a <- rnorm(size, mean = i)
    b <- matrix(rnorm(size * size), nrow = size)
    t(a %*% b)
  }},
  furr0 = {future_map(1:n_reps, fn_to_call, size = size,
                      .options = furrr_options(seed = TRUE))},
  fuapply0 = {future_lapply(1:n_reps, FUN = fn_to_call, size,
                            future.seed = TRUE)},

  # Explicitly telling it not to send the big global (I can't sort out getting .export to work)
  dofut_no_g = {foreach(i = 1:n_reps,
                        .combine = cbind,
                        .noexport = "big_obj") %dorng% {
    a <- rnorm(size, mean = i)
    b <- matrix(rnorm(size * size), nrow = size)
    t(a %*% b)
  }},
  furr_no_g = {future_map(1:n_reps, fn_to_call, size = size,
                          .options = furrr_options(seed = TRUE,
                                                   globals = FALSE))},
  fuapply_no_g = {future_lapply(1:n_reps, FUN = fn_to_call, size,
                                future.seed = TRUE,
                                future.globals = FALSE)},

  # Explicitly telling it to send the unused global
  dofut_g = {foreach(i = 1:n_reps,
                     .combine = cbind,
                     .export = 'big_obj') %dorng% {
    a <- rnorm(size, mean = i)
    b <- matrix(rnorm(size * size), nrow = size)
    t(a %*% b)
  }},
  furr_g = {future_map(1:n_reps, fn_to_call, size = size,
                       .options = furrr_options(seed = TRUE,
                                                globals = 'big_obj'))},
  fuapply_g = {future_lapply(1:n_reps, FUN = fn_to_call, size,
                             future.seed = TRUE,
                             future.globals = 'big_obj')},
  times = 10
)
Unit: seconds
expr min lq mean median uq max neval
dofut0 2.922838 3.000344 3.249182 3.182809 3.262618 4.264726 10
furr0 2.949936 2.981717 3.102662 3.052143 3.217054 3.398791 10
fuapply0 2.992581 3.023170 3.220997 3.148239 3.436145 3.624800 10
dofut_no_g 2.945903 3.066295 3.140082 3.118970 3.276743 3.335264 10
furr_no_g 2.874103 3.076374 3.204159 3.253832 3.306481 3.568914 10
fuapply_no_g 2.912136 3.057193 3.108612 3.094443 3.177574 3.341436 10
dofut_g 13.140054 13.625128 14.188250 14.119873 14.878983 15.087787 10
furr_g 13.786607 14.154656 14.397246 14.394677 14.609236 15.120794 10
fuapply_g 13.455395 13.675143 14.085484 13.918776 14.251080 15.601045 10
Now there's a big object sitting in global memory, but it does not slow down the default run relative to the enforced-non-pass version or the version from before it existed (above). It does show a major slowdown when the object is explicitly passed.
Unused globals therefore are NOT passed by default, even when code is running straight in the global environment.
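For a sense of where the extra ~10 seconds per run comes from when the object is passed: under multisession the exported global has to be serialised and copied to the worker processes. A rough sketch of the per-copy cost, not part of the benchmark above:
# Serialising big_obj once gives a lower bound on what each export costs;
# the explicit-export runs pay something like this for every worker that receives it.
system.time(payload <- serialize(big_obj, connection = NULL))
format(object.size(payload), units = "MB")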
Inside functions
The functions have an option to change the way globals are handled.
Benchmark
microbenchmark(
  # default
  dofut_default = foreach_fun(n_reps = 100, size = 1000),
  fur_default = furrr_fun(n_reps = 100, size = 1000),
  app_default = fuapply_fun(n_reps = 100, size = 1000),

  # No globals
  dofut_no_g = foreach_fun(n_reps = 100, size = 1000, .noexport = 'big_obj'),
  fur_no_g = furrr_fun(n_reps = 100, size = 1000,
                       globals = FALSE),
  app_no_g = fuapply_fun(n_reps = 100, size = 1000,
                         future.globals = FALSE),

  # Explicit globals
  dofut_g = foreach_fun(n_reps = 100, size = 1000,
                        .export = 'big_obj'),
  fur_g = furrr_fun(n_reps = 100, size = 1000,
                    globals = 'big_obj'),
  app_g = fuapply_fun(n_reps = 100, size = 1000,
                      future.globals = 'big_obj'),
  times = 10
)
Unit: seconds
          expr       min        lq      mean    median        uq       max neval
 dofut_default  2.820070  3.133994  3.418282  3.494786  3.643098  4.122530    10
   fur_default  2.955963  3.170553  3.401131  3.247953  3.674548  4.264421    10
   app_default  2.777021  2.957015  3.264931  3.286259  3.545510  3.748255    10
    dofut_no_g  2.741006  3.076965  3.271767  3.285788  3.454271  3.721240    10
      fur_no_g  2.961480  3.051561  3.413412  3.445728  3.553680  4.109865    10
      app_no_g  2.835050  3.020232  3.264410  3.235451  3.499824  3.772277    10
       dofut_g 13.666079 14.104627 15.448594 15.349449 15.978492 18.617738    10
         fur_g 13.761059 15.330178 15.521168 15.707339 16.317537 16.369009    10
         app_g 12.988882 14.554517 15.117242 15.415819 15.672232 16.115645    10
Using functions yields the same result as before: the big object sitting in the global environment does not get passed in and slow things down if it isn't actually used in the functions (or explicitly sent in).
Unused globals therefore are NOT passed by default into parallelised functions.
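And some housekeeping once a test like this is finished, so the workers and the big matrix don't hang around:
rm(big_obj)       # drop the ~500 MB matrix from the global environment
plan(sequential)  # shut down the multisession workers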