Errors with map
Sometimes when we use purrr::map or similar functions, one of the iterations hits an error. When this happens, we lose the whole set of runs, even if the others would run or have already run. That can waste time, make it hard to find the issue, and prevent re-running just the failed bits if the error is intermittent (e.g. many HTTP errors).
We’ll reuse a test function from error handling, which errors on even numbers, warns if the number is 5, and otherwise returns the input.
err_even_warn5 <- function(x) {
  if ((x %% 2) == 0) {
    stop('Even numbers are error')
  } else if (x == 5) {
    warning('5 throws a warning')
  } else {x}
}
The issue is that if we map over it, it errors at the first even number (index 2), and we don’t get any of the results, including the ones that worked:
purrr::map(1:10, err_even_warn5)
Error in `purrr::map()`:
ℹ In index: 2.
Caused by error in `.f()`:
! Even numbers are error
But what if we want to get all the other results, and possibly identify the failures and correct them or retry?
One option is to use purrr::safely, as it returns a list with a result and an error item. This means that purrring over things where some may fail doesn’t kill everything, but we need to unpack the output a bit.
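To see that shape before mapping anything, here is a minimal sketch that calls a safely-wrapped function directly, once with good input and once with bad input. safe_sqrt is just an illustrative name; sqrt() errors on character input.

```r
# safely() returns a wrapper; every call to it gives back list(result, error)
safe_sqrt <- purrr::safely(sqrt)

ok  <- safe_sqrt(4)    # succeeds
bad <- safe_sqrt("a")  # sqrt("a") errors, but the error is captured, not thrown

ok$result   # 2
ok$error    # NULL
bad$result  # NULL (the default value of `otherwise`)
bad$error   # the captured error object
```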
The syntax is typical map, but with the function to apply wrapped in the ‘adverb’ safely:
errpurr <- purrr::map(1:10,
                      purrr::safely(err_even_warn5))
Warning in .f(...): 5 throws a warning
errpurr
[[1]]
[[1]]$result
[1] 1
[[1]]$error
NULL
[[2]]
[[2]]$result
NULL
[[2]]$error
<simpleError in .f(...): Even numbers are error>
[[3]]
[[3]]$result
[1] 3
[[3]]$error
NULL
[[4]]
[[4]]$result
NULL
[[4]]$error
<simpleError in .f(...): Even numbers are error>
[[5]]
[[5]]$result
[1] "5 throws a warning"
[[5]]$error
NULL
[[6]]
[[6]]$result
NULL
[[6]]$error
<simpleError in .f(...): Even numbers are error>
[[7]]
[[7]]$result
[1] 7
[[7]]$error
NULL
[[8]]
[[8]]$result
NULL
[[8]]$error
<simpleError in .f(...): Even numbers are error>
[[9]]
[[9]]$result
[1] 9
[[9]]$error
NULL
[[10]]
[[10]]$result
NULL
[[10]]$error
<simpleError in .f(...): Even numbers are error>
Note that safely only deals with errors. The warning at index 5 is still raised (it bubbles up to the console). The function's last expression is warning(), which invisibly returns its message, so that string ends up as the result. We could use quietly instead if we want to capture warnings, messages and printed output, though errors still cause quietly to fail. We can then find which iterations did or did not error:
whicherrors <- purrr::map(errpurr,
                          \(x) !is.null(x$error)) |>
  unlist() |>
  which()
whicherrors
[1] 2 4 6 8 10
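Going back to quietly for a moment, here is a minimal sketch of what it captures. It runs on the odd inputs only, because the even ones would still throw. Each element carries result, output, warnings and messages, so the index-5 warning is stored instead of printed. The test function is redefined so the snippet runs on its own.

```r
# same test function as above: errors on evens, warns on 5, else returns x
err_even_warn5 <- function(x) {
  if ((x %% 2) == 0) {
    stop('Even numbers are error')
  } else if (x == 5) {
    warning('5 throws a warning')
  } else {x}
}

# quietly() captures warnings, messages and printed output, but NOT errors
quiet_out <- purrr::map(c(1, 3, 5, 7, 9), purrr::quietly(err_even_warn5))

names(quiet_out[[3]])    # "result" "output" "warnings" "messages"
quiet_out[[3]]$warnings  # "5 throws a warning"

# positions (within the odd inputs) that produced a warning
warned <- purrr::map_lgl(quiet_out, \(x) length(x$warnings) > 0) |> which()
```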
Plucking the result from every entry gives the clean outputs, with NULL wherever there was an error. Note that this includes the warning message at index 5.
noterrors <- purrr::map(errpurr,
                        \(x) purrr::pluck(x, 'result'))
noterrors
[[1]]
[1] 1
[[2]]
NULL
[[3]]
[1] 3
[[4]]
NULL
[[5]]
[1] "5 throws a warning"
[[6]]
NULL
[[7]]
[1] 7
[[8]]
NULL
[[9]]
[1] 9
[[10]]
NULL
Another option is to use list_transpose and then get the result and error lists. Two plucks are likely better, especially if we usually only need one.
terr <- purrr::list_transpose(errpurr)
terr
$result
[[1]]
[1] 1
[[2]]
NULL
[[3]]
[1] 3
[[4]]
NULL
[[5]]
[1] "5 throws a warning"
[[6]]
NULL
[[7]]
[1] 7
[[8]]
NULL
[[9]]
[1] 9
[[10]]
NULL
$error
[[1]]
NULL
[[2]]
<simpleError in .f(...): Even numbers are error>
[[3]]
NULL
[[4]]
<simpleError in .f(...): Even numbers are error>
[[5]]
NULL
[[6]]
<simpleError in .f(...): Even numbers are error>
[[7]]
NULL
[[8]]
<simpleError in .f(...): Even numbers are error>
[[9]]
NULL
[[10]]
<simpleError in .f(...): Even numbers are error>
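Whichever way we extract them, failed iterations leave NULL placeholders in the result list. Here is a minimal sketch of dropping them with purrr::compact(), rebuilding errpurr so the snippet runs on its own. Naming the list by position first is an assumption about what we want to keep: it lets us still tell which inputs the survivors came from.

```r
err_even_warn5 <- function(x) {
  if ((x %% 2) == 0) {
    stop('Even numbers are error')
  } else if (x == 5) {
    warning('5 throws a warning')
  } else {x}
}

errpurr <- purrr::map(1:10, purrr::safely(err_even_warn5))

# a character string as the mapper extracts that element by name
results <- purrr::map(errpurr, "result")
errors  <- purrr::map(errpurr, "error")

# name by input position first, so survivors keep track of where they came from
names(results) <- seq_along(results)
kept <- purrr::compact(results)  # drops the NULL placeholders
names(kept)                      # "1" "3" "5" "7" "9"

errs <- purrr::compact(errors)   # just the error objects
```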
The use of safely above is really handy if we want to read the errors. If not, and we just want to save the non-errors, possibly with a default is likely better (cleaner).
errpurrP <- purrr::map(1:10,
                       purrr::possibly(err_even_warn5,
                                       NA))
Warning in .f(...): 5 throws a warning
errpurrP
[[1]]
[1] 1
[[2]]
[1] NA
[[3]]
[1] 3
[[4]]
[1] NA
[[5]]
[1] "5 throws a warning"
[[6]]
[1] NA
[[7]]
[1] 7
[[8]]
[1] NA
[[9]]
[1] 9
[[10]]
[1] NA
Note that in use we’d likely still want a cleanup step/function to chuck out the warning messages (and the NA placeholders) before concatenating the rest.
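Here is a minimal sketch of such a cleanup step. It assumes, as in this toy example, that genuine results are numeric. Anything else (the NA placeholders, which are logical, and the warning message, which is character) can then be dropped by type. keep_numeric is a hypothetical helper name.

```r
err_even_warn5 <- function(x) {
  if ((x %% 2) == 0) {
    stop('Even numbers are error')
  } else if (x == 5) {
    warning('5 throws a warning')
  } else {x}
}

errpurrP <- purrr::map(1:10, purrr::possibly(err_even_warn5, NA))

# hypothetical cleanup helper: keep only numeric results, then flatten
keep_numeric <- function(res) {
  res |>
    purrr::keep(is.numeric) |>  # NA is logical and the warning is character, so both go
    unlist()
}

clean <- keep_numeric(errpurrP)
clean  # 1 3 7 9
```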
There’s also a question of what happens if an iteration has a warning and a result. For example:
err_even_warn510 <- function(x) {
  if ((x %% 2) == 0) {
    stop('Even numbers are error')
  } else if (x == 5) {
    warning('5 doubles')
    x <- 10
  } else {x}
  return(x)
}
For both safely and possibly, a real result plus a warning ends up with the real result in the output and the warning bubbling up to the console. So that’s good- warnings don’t change the structure of the data if there is data.
ep5 <- purrr::map(1:10,
                  purrr::safely(err_even_warn510))
Warning in .f(...): 5 doubles
ep5
[[1]]
[[1]]$result
[1] 1
[[1]]$error
NULL
[[2]]
[[2]]$result
NULL
[[2]]$error
<simpleError in .f(...): Even numbers are error>
[[3]]
[[3]]$result
[1] 3
[[3]]$error
NULL
[[4]]
[[4]]$result
NULL
[[4]]$error
<simpleError in .f(...): Even numbers are error>
[[5]]
[[5]]$result
[1] 10
[[5]]$error
NULL
[[6]]
[[6]]$result
NULL
[[6]]$error
<simpleError in .f(...): Even numbers are error>
[[7]]
[[7]]$result
[1] 7
[[7]]$error
NULL
[[8]]
[[8]]$result
NULL
[[8]]$error
<simpleError in .f(...): Even numbers are error>
[[9]]
[[9]]$result
[1] 9
[[9]]$error
NULL
[[10]]
[[10]]$result
NULL
[[10]]$error
<simpleError in .f(...): Even numbers are error>
epP <- purrr::map(1:10,
                  purrr::possibly(err_even_warn510,
                                  NA))
Warning in .f(...): 5 doubles
epP
[[1]]
[1] 1
[[2]]
[1] NA
[[3]]
[1] 3
[[4]]
[1] NA
[[5]]
[1] 10
[[6]]
[1] NA
[[7]]
[1] 7
[[8]]
[1] NA
[[9]]
[1] 9
[[10]]
[1] NA
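If we want the real results and also a record of which iterations warned, one option is to nest the adverbs. quietly() on the inside stores warnings instead of letting them bubble up, and safely() on the outside catches the errors that quietly() lets through. This is a sketch, with err_even_warn510 redefined so it runs standalone. The double $result$result access follows from one wrapper sitting inside the other.

```r
err_even_warn510 <- function(x) {
  if ((x %% 2) == 0) {
    stop('Even numbers are error')
  } else if (x == 5) {
    warning('5 doubles')
    x <- 10
  } else {x}
  return(x)
}

# quietly() inside: records warnings; safely() outside: catches errors
nested <- purrr::map(1:10, purrr::safely(purrr::quietly(err_even_warn510)))

# values: NULL where an error occurred (x$result is NULL, so $result of it is too)
vals <- purrr::map(nested, \(x) x$result$result)
# warnings: character(0) for clean runs, NULL for errored runs
warns <- purrr::map(nested, \(x) x$result$warnings)

# which iterations warned?
which(purrr::map_lgl(warns, \(w) length(w) > 0))
```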
Programmatic use
I have several cases where I want to run map in packages or large analyses, assess what’s happening with the failures, and possibly re-run them. That needs a few wrappers or standard sequences of steps around what I have above. Standard steps might be better, since then we don’t have to deal with function passing, which is a hassle.
Let’s set up a function that will fail about half the time, but re-runs might work.
failhalf <- function(x) {
  if (runif(1) <= 0.5) {
    x <- x + 5
  } else {
    stop("random above 0.5")
  }
  return(x)
}
I’ll work with safely. I think it’s more general than possibly, and it gives a developer the ability to go in and look at the errors in debug, even if they’re not returned. Let’s assume we have a variable to feed it, as we usually would in a function.
larg <- 1:10

# first run
x5 <- purrr::map(larg, purrr::safely(failhalf))
# get the results
# (pluck('result') just evaluates to the string 'result', which map() turns into an extractor)
r5 <- purrr::map(x5, purrr::pluck('result'))

# Get the errors- we might have this somewhere a dev could get it, but not always use it.
e5 <- purrr::map(x5, purrr::pluck('error'))
If we don’t want to retry, that’s as far as we need to go. We could easily return a list just like what would be returned normally, plus another list of the errors. That’s as simple as:
safepurr <- function(input, fun) {
  # first run
  x5 <- purrr::map(input, purrr::safely(fun))
  # get the results
  r5 <- purrr::map(x5, purrr::pluck('result'))

  # Get the errors- we might have this somewhere a dev could get it, but not always use it.
  e5 <- purrr::map(x5, purrr::pluck('error'))

  return(list(r5, e5))
}
Though I’m not sure what the point is. By the time we unpack that we might as well have just done it inline with pluck().
Retries
If we do want to retry, we need to re-run the failures. It makes sense to do this in a single while loop, rather than running the steps above first. We can put a retries argument in easily enough.
# if we have e5, we could use it, but it's not any harder to get error indices directly
larg <- 1:10
whicherrors <- 1:length(larg)
lout <- vector(mode = 'list', length = 0)
while (length(whicherrors) > 0) {
  larg <- larg[whicherrors]
  # first run
  x5 <- purrr::map(larg, purrr::safely(failhalf))
  # get the results (NULL where the call errored)
  r5 <- purrr::map(x5, purrr::pluck('result'))

  # where are the errors
  whicherrors <- purrr::map(x5,
                            \(x) rlang::is_error(x$error)) |>
    unlist() |>
    which()
  # append the successes. r5[-whicherrors] would be empty when whicherrors
  # is integer(0) and drop the final batch, so index logically instead
  lout <- c(lout, r5[!seq_along(r5) %in% whicherrors])
}
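One R subtlety matters for the append step. Dropping elements with a negative index breaks when which() finds nothing: -integer(0) is still integer(0), so it selects nothing rather than everything. On the final pass of a loop that appends r5[-whicherrors], that would silently throw away the last batch of successes. Here is a small demonstration, with a logical-index alternative:

```r
r5 <- list(6, 7, 8)

whicherrors <- integer(0)  # what which() returns when nothing failed

length(r5[-whicherrors])                     # 0: -integer(0) selects nothing
length(r5[!seq_along(r5) %in% whicherrors])  # 3: logical indexing keeps everything

whicherrors <- 2L
length(r5[!seq_along(r5) %in% whicherrors])  # 2: drops only the failure
```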
The unordered loop works fine if we don’t care about order. If we do, we need to keep track of which list items are erroring and replace them in place. That will almost always be what we want, and it isn’t any more complicated.
larg <- 1:10
whicherrors <- 1:length(larg)
lout <- vector(mode = 'list', length = length(larg))
# the indices, to track which are being filled/left
indlist <- 1:10
while (length(whicherrors) > 0) {
  larg <- larg[whicherrors]
  # first run
  x5 <- purrr::map(larg, purrr::safely(failhalf))
  # get the results (NULL where the call errored)
  r5 <- purrr::map(x5, purrr::pluck('result'))

  # replace the indices that were errors with new data. Some might still be errors, they will fill subsequently
  lout[indlist] <- r5

  # where are the errors
  whicherrors <- purrr::map(x5,
                            \(x) rlang::is_error(x$error)) |>
    unlist() |>
    which()

  # which ORIGINAL indices are we left with?
  indlist <- indlist[whicherrors]
}
Rather than a while
, can we recurse? Yes, and it’s a bit cleaner. But, it’s not tail-recursive and there’s no obvious way to set a retries.
getsafe <- function(larg) {
  x5 <- purrr::map(larg, purrr::safely(failhalf))
  # get the results (NULL where the call errored)
  r5 <- purrr::map(x5, purrr::pluck('result'))

  whicherrors <- purrr::map(x5,
                            \(x) rlang::is_error(x$error)) |>
    unlist() |>
    which()

  if (length(whicherrors) > 0) {
    # re-run just the failures and slot their results back into place
    eout <- getsafe(larg[whicherrors])
    r5[whicherrors] <- eout
  }
  return(r5)
}
getsafe(1:10)
[[1]]
[1] 6
[[2]]
[1] 7
[[3]]
[1] 8
[[4]]
[1] 9
[[5]]
[1] 10
[[6]]
[1] 11
[[7]]
[1] 12
[[8]]
[1] 13
[[9]]
[1] 14
[[10]]
[1] 15
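A recursive version can still be capped by passing a counter down and decrementing it on each call. Here is a minimal sketch. getsafe_n and its fun and retries arguments are hypothetical additions, and failhalf is redefined so the snippet runs on its own. Inputs that are still failing when retries reaches zero are simply left as NULL.

```r
failhalf <- function(x) {
  if (runif(1) <= 0.5) {
    x <- x + 5
  } else {
    stop("random above 0.5")
  }
  return(x)
}

# hypothetical variant of getsafe(): `retries` counts down on each recursive call
getsafe_n <- function(larg, fun = failhalf, retries = 10) {
  x5 <- purrr::map(larg, purrr::safely(fun))
  r5 <- purrr::map(x5, "result")
  whicherrors <- which(purrr::map_lgl(x5, \(x) !is.null(x$error)))

  # only recurse while there are failures AND retries left
  if (length(whicherrors) > 0 && retries > 0) {
    r5[whicherrors] <- getsafe_n(larg[whicherrors], fun, retries - 1)
  }
  # anything still NULL here exhausted its retries
  return(r5)
}

set.seed(1)
out <- getsafe_n(1:10, retries = 10)
```

It is still not tail-recursive, but with the cap the depth is bounded at retries + 1 calls.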
How bad is it to make a function that takes the input and the function and does the while loop?
safe_clean_retries <- function(input, fun, retries) {
  whicherrors <- 1:length(input)
  lout <- vector(mode = 'list', length = length(input))
  # the indices, to track which are being filled/left
  indlist <- 1:length(input)
  counter <- 0

  while (length(whicherrors) > 0 && counter <= retries) {
    # run the purrr
    x5 <- purrr::map(input, purrr::safely(fun))
    # get the results (NULL where the call errored)
    r5 <- purrr::map(x5, purrr::pluck('result'))

    # if we want the errors, we could put in a debug here
    e5 <- purrr::map(x5, purrr::pluck('error'))

    # replace the indices that were errors with new data. Some might still be errors, they will fill subsequently
    lout[indlist] <- r5

    # where are the errors
    whicherrors <- purrr::map(x5,
                              \(x) rlang::is_error(x$error)) |>
      unlist() |>
      which()

    # Cut the data to the fails
    input <- input[whicherrors]

    # which ORIGINAL indices are we left with?
    indlist <- indlist[whicherrors]

    counter <- counter + 1
  }
  return(lout)
}
And that lets us use it
safe_clean_retries(1:10, failhalf, retries = 5) |>
unlist()
[1] 6 7 8 9 10 11 12 13 14 15
Does it also work if we pass it an anonymous or otherwise custom function?
safe_clean_retries(1:10,
                   \(x) ifelse(sample(c(1,2), 1) == 1,
                               stop(), x),
                   retries = 10) |>
  unlist()
[1] 1 2 3 4 5 6 7 8 9 10
It works for furrr too, though in this case it’s slower (not surprising: for this test case the overhead will be much bigger than the computation).
safe_clean_retries_f <- function(input, fun, retries) {
  whicherrors <- 1:length(input)
  lout <- vector(mode = 'list', length = length(input))
  # the indices, to track which are being filled/left
  indlist <- 1:length(input)
  counter <- 0

  while (length(whicherrors) > 0 && counter <= retries) {
    # run the purrr
    # Only parallel this one. The others are just indexing
    x5 <- furrr::future_map(input, purrr::safely(fun),
                            .options = furrr::furrr_options(seed = TRUE))
    # get the results (NULL where the call errored)
    r5 <- purrr::map(x5, purrr::pluck('result'))

    # if we want the errors, we could put in a debug here
    e5 <- purrr::map(x5, purrr::pluck('error'))

    # replace the indices that were errors with new data. Some might still be errors, they will fill subsequently
    lout[indlist] <- r5

    # where are the errors
    whicherrors <- purrr::map(x5,
                              \(x) rlang::is_error(x$error)) |>
      unlist() |>
      which()

    # Cut the data to the fails
    input <- input[whicherrors]

    # which ORIGINAL indices are we left with?
    indlist <- indlist[whicherrors]

    counter <- counter + 1
  }
  return(lout)
}
library(furrr)
Loading required package: future
plan(multisession)
safe_clean_retries_f(1:10,
                     \(x) ifelse(sample(c(1,2), 1) == 1,
                                 stop(), x),
                     retries = 10) |>
  unlist()
[1] 1 2 3 4 5 6 7 8 9 10
And finally, we can clean that up to use the same argument names as purrr, and to run either in parallel or not:
safe_map <- function(.x, .f, ..., retries = 10, parallel = FALSE) {
  whicherrors <- 1:length(.x)
  result_list <- vector(mode = 'list', length = length(.x))
  # the indices, to track which are being filled/left
  orig_indices <- 1:length(.x)
  counter <- 0

  while (length(whicherrors) > 0 && counter <= retries) {
    # run the purrr, passing any extra arguments on to .f
    # Only parallel this one. The others are just indexing
    if (parallel) {
      full_out <- furrr::future_map(.x, purrr::safely(.f), ...,
                                    .options = furrr::furrr_options(seed = TRUE))
    } else {
      full_out <- purrr::map(.x, purrr::safely(.f), ...)
    }
    # get the results (NULL where the call errored)
    intermed_result <- purrr::map(full_out, purrr::pluck('result'))

    # if we want the errors, we could put in a debug here
    err_list <- purrr::map(full_out, purrr::pluck('error'))

    # replace the indices that were errors with new data. Some might still be errors, they will fill subsequently
    result_list[orig_indices] <- intermed_result

    # where are the errors
    whicherrors <- purrr::map(full_out,
                              \(x) rlang::is_error(x$error)) |>
      unlist() |>
      which()

    # Cut the data to the fails
    .x <- .x[whicherrors]

    # which ORIGINAL indices are we left with?
    orig_indices <- orig_indices[whicherrors]

    counter <- counter + 1
  }
  return(result_list)
}
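safe_map() still discards the errors on return. If we want them for debugging, one option is to return them next to the results, mirroring the list(result, error) shape that safely() itself uses. Here is a minimal sequential sketch. safe_map_errors is a hypothetical name, and for brevity it uses !is.null(x$error) to find failures.

```r
# hypothetical variant of safe_map(): sequential only, also returns final errors
safe_map_errors <- function(.x, .f, ..., retries = 10) {
  result_list <- vector(mode = 'list', length = length(.x))
  error_list  <- vector(mode = 'list', length = length(.x))
  orig_indices <- seq_along(.x)
  counter <- 0

  while (length(orig_indices) > 0 && counter <= retries) {
    full_out <- purrr::map(.x, purrr::safely(.f), ...)

    # write this attempt's results and errors back to their original positions
    result_list[orig_indices] <- purrr::map(full_out, "result")
    error_list[orig_indices]  <- purrr::map(full_out, "error")

    failed <- which(purrr::map_lgl(full_out, \(x) !is.null(x$error)))
    .x <- .x[failed]
    orig_indices <- orig_indices[failed]
    counter <- counter + 1
  }
  # a later success overwrites an earlier error with NULL, so what remains
  # is the last error for each input that never succeeded
  list(result = result_list, error = error_list)
}

out <- safe_map_errors(1:5, \(x) if (x == 3) stop("three is bad") else x * 2,
                       retries = 2)
unlist(out$result)                                  # 2 4 8 10
which(purrr::map_lgl(out$error, \(e) !is.null(e)))  # 3
```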
Benchmarking
The speed question is interesting- how much does it slow things down to run in this wrapper? Should I put everything in it, or is the speed hit only worth it where there’s a high likelihood of failure and each iteration is big?
Let’s set something a bit bigger up and test. Just purrr; I’ll assume furrr scales similarly. I’m not going to have any errors- the point here is to ask how much this hurts when there aren’t errors, and whether that tradeoff is worth the ability to fix the failures that do happen.
inlist <- list(iris, mtcars, iris, mtcars, iris, mtcars)

testfun <- function(x) {
  x <- x |>
    dplyr::mutate(across(where(is.numeric), mean)) |>
    dplyr::summarise(across(where(is.numeric), sum))

  return(x)
}
microbenchmark::microbenchmark(
  barepurrr = purrr::map(inlist, testfun),
  safepurrr = safe_clean_retries(inlist, testfun, retries = 10),
  times = 100
)
Unit: milliseconds
      expr     min       lq     mean  median       uq      max neval
 barepurrr 19.8354 22.37945 25.35412 23.1743 25.61785 159.4209   100
 safepurrr 21.1953 22.90615 25.38399 24.4367 25.77285  88.9560   100
The hit there isn’t too bad: the median goes from about 23.2 ms to 24.4 ms, and the means are essentially identical. It seems like it’s probably usually worth it, especially for big computations. For big jobs, the consequences of errors will be worse in terms of lost time/results, and the additional overhead will be a smaller proportion of the time compared to the main purrr call.
Function construction
The functions above all use a function with a single unspecified argument. But things get trickier with anonymous functions or multiple arguments. The locations for the arguments aren’t always intuitive- they go after the possibly(function()). This is because possibly and safely both create new functions.
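Here is a small sketch of that function-factory behaviour, using an illustrative two-argument function. Evaluating safely() calls nothing; it just returns a wrapper, and the arguments go to that wrapper.

```r
# safely() doesn't call anything yet; it builds and returns a wrapper
safe_sum <- purrr::safely(function(a, b) a + b)
is.function(safe_sum)   # TRUE

# arguments go to the wrapper, i.e. after the closing parenthesis of safely()
safe_sum(1, 2)$result   # 3
safe_sum(1, "x")$error  # captured "non-numeric argument" error
```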
For example, if we have a simple function, still with just one argument:
add5 <- function(x) {
  x + 5
}
Then the simple version works:
purrr::map(1:5, purrr::safely(add5))
[[1]]
[[1]]$result
[1] 6
[[1]]$error
NULL
[[2]]
[[2]]$result
[1] 7
[[2]]$error
NULL
[[3]]
[[3]]$result
[1] 8
[[3]]$error
NULL
[[4]]
[[4]]$result
[1] 9
[[4]]$error
NULL
[[5]]
[[5]]$result
[1] 10
[[5]]$error
NULL
If we want to be more specific and make it anonymous, though, where does the x go? What is the safe equivalent of this?
purrr::map(1:5, \(x) add5(x))
[[1]]
[1] 6
[[2]]
[1] 7
[[3]]
[1] 8
[[4]]
[1] 9
[[5]]
[1] 10
This works. The anonymous function is wholly inside safely, and so the whole anonymous function gets transformed into a safe version.
purrr::map(1:5, purrr::safely(\(x) add5(x)))
[[1]]
[[1]]$result
[1] 6
[[1]]$error
NULL
[[2]]
[[2]]$result
[1] 7
[[2]]$error
NULL
[[3]]
[[3]]$result
[1] 8
[[3]]$error
NULL
[[4]]
[[4]]$result
[1] 9
[[4]]$error
NULL
[[5]]
[[5]]$result
[1] 10
[[5]]$error
NULL
This does not work: the safely can’t be inside the anonymous function. Each iteration just returns a new safe function, which nothing ever calls.
purrr::map(1:5, \(x) purrr::safely(add5(x)))
[[1]]
function (...)
capture_error(.f(...), otherwise, quiet)
<bytecode: 0x000001c0b17027d8>
<environment: 0x000001c0b56976d0>
[[2]]
function (...)
capture_error(.f(...), otherwise, quiet)
<bytecode: 0x000001c0b17027d8>
<environment: 0x000001c0b579a518>
[[3]]
function (...)
capture_error(.f(...), otherwise, quiet)
<bytecode: 0x000001c0b17027d8>
<environment: 0x000001c0b5797040>
[[4]]
function (...)
capture_error(.f(...), otherwise, quiet)
<bytecode: 0x000001c0b17027d8>
<environment: 0x000001c0b57a35b8>
[[5]]
function (...)
capture_error(.f(...), otherwise, quiet)
<bytecode: 0x000001c0b17027d8>
<environment: 0x000001c0b57a41d0>
But this does: safely(fun) is a function, and so we can give it the argument.
purrr::map(1:5, \(x) purrr::safely(add5)(x))
[[1]]
[[1]]$result
[1] 6
[[1]]$error
NULL
[[2]]
[[2]]$result
[1] 7
[[2]]$error
NULL
[[3]]
[[3]]$result
[1] 8
[[3]]$error
NULL
[[4]]
[[4]]$result
[1] 9
[[4]]$error
NULL
[[5]]
[[5]]$result
[1] 10
[[5]]$error
NULL
This can be useful with multiple arguments, e.g.
adder <- function(x, y) {
  x + y
}
Again, with the anonymous function wholly inside safely, it works:
purrr::map(1:5, purrr::safely(\(x) adder(x, 10)))
[[1]]
[[1]]$result
[1] 11
[[1]]$error
NULL
[[2]]
[[2]]$result
[1] 12
[[2]]$error
NULL
[[3]]
[[3]]$result
[1] 13
[[3]]$error
NULL
[[4]]
[[4]]$result
[1] 14
[[4]]$error
NULL
[[5]]
[[5]]$result
[1] 15
[[5]]$error
NULL
It does not work if it’s not anonymous, i.e. just giving it the second argument. While this syntax works normally,
purrr::map(1:5, adder, 10)
[[1]]
[1] 11
[[2]]
[1] 12
[[3]]
[1] 13
[[4]]
[1] 14
[[5]]
[1] 15
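the equivalent does not work with safely. The second positional argument of safely() is otherwise (the value to return on error), so the 10 becomes the fallback result and y is never supplied:
purrr::map(1:5, purrr::safely(adder, 10))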
[[1]]
[[1]]$result
[1] 10
[[1]]$error
<simpleError in .f(...): argument "y" is missing, with no default>
[[2]]
[[2]]$result
[1] 10
[[2]]$error
<simpleError in .f(...): argument "y" is missing, with no default>
[[3]]
[[3]]$result
[1] 10
[[3]]$error
<simpleError in .f(...): argument "y" is missing, with no default>
[[4]]
[[4]]$result
[1] 10
[[4]]$error
<simpleError in .f(...): argument "y" is missing, with no default>
[[5]]
[[5]]$result
[1] 10
[[5]]$error
<simpleError in .f(...): argument "y" is missing, with no default>
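Nor does calling adder inside safely, because adder(10) is evaluated (and fails) before safely ever wraps it:
purrr::map(1:5, purrr::safely(adder(10)))
Error in adder(10): argument "y" is missing, with no default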
To get this to work with safely, we can anonymize, being careful to feed the arguments after the closing parenthesis of safely:
purrr::map(1:5, \(x) purrr::safely(adder)(x, y=10))
[[1]]
[[1]]$result
[1] 11
[[1]]$error
NULL
[[2]]
[[2]]$result
[1] 12
[[2]]$error
NULL
[[3]]
[[3]]$result
[1] 13
[[3]]$error
NULL
[[4]]
[[4]]$result
[1] 14
[[4]]$error
NULL
[[5]]
[[5]]$result
[1] 15
[[5]]$error
NULL
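The anonymous function isn’t the only route, though. safely(adder) returns a function(...) that forwards its arguments to adder. So map()’s own ... pass-through (the same mechanism as map(1:5, adder, 10) above) also works, provided the extra argument goes to map() rather than to safely(). The purrr documentation now generally recommends anonymous functions over passing constant arguments through ..., so the anonymous form is arguably the clearer of the two. A sketch:

```r
adder <- function(x, y) {
  x + y
}

# y = 10 is passed by map() to the safe wrapper, which passes it on to adder()
via_dots <- purrr::map(1:5, purrr::safely(adder), y = 10)

purrr::map_dbl(via_dots, "result")  # 11 12 13 14 15
```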