library(microbenchmark)
library(doFuture)
library(foreach)
library(dplyr)
registerDoFuture()
plan(multisession)Nested dependencies with %:%
I did a lot of nesting checking, but here I’m specifically interested in dependencies in the iterations in the nesting of foreach loops using %:%.
Packages and setup
I’ll use the {future} package, along with {dofuture} and {foreach}.
Built-in nesting with dependencies
I’m getting strange errors when using built-in nesting where the iterations in the inner loop depend on the outer. I think those dependencies aren’t resolving how I assumed they were.
These iterations could be wholly independent of each other, e.g. i = 1:10, j = seq(from = 0, to = 1, by = 0.1). But they could be dependent- e.g. in that simple case, we could naively say j = i/10 because that would make an equivalent vector, but it only happens for each i (see below). That’s expected, but not necessarily obvious at first glance. And it could be more complex still (which is the situation I have), with j indexing into something chosen by i. I’ll go through each in turn, returning objects that allow me to assess what’s up.
Wholly independent
indep_nested <- foreach(i = 1:10, .combine = rbind) %:%
foreach(j = seq(from = 0.1, to = 1, by = 0.1), .combine = rbind) %dopar% {
thisloop <- tibble::tibble(outer_it = i, inner_it = j)
}
# indep_nestedTo check, we can see if there is a factorial mapping
table(indep_nested) inner_it
outer_it 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
1 1 1 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1 1 1
3 1 1 1 1 1 1 1 1 1 1
4 1 1 1 1 1 1 1 1 1 1
5 1 1 1 1 1 1 1 1 1 1
6 1 1 1 1 1 1 1 1 1 1
7 1 1 1 1 1 1 1 1 1 1
8 1 1 1 1 1 1 1 1 1 1
9 1 1 1 1 1 1 1 1 1 1
10 1 1 1 1 1 1 1 1 1 1
# indep_nested |> group_by(outer_it) |> summarise(n_outer = n())
# indep_nested |> group_by(inner_it) |> summarise(n_inner = n())Simple dependency
Now we can make j dependent on i, but very simply. And we see that the factorial combination is lost- j only maps to each i. This is how the loop should work, though it may not be obvious at first glance- the vectors i/10 and seq(from = 0.1, to = 1, by = 0.1) are the same, but the first only finds one value per i, while the second finds the whole vector.
simple_dep <- foreach(i = 1:10, .combine = rbind) %:%
foreach(j = i/10, .combine = rbind) %dopar% {
thisloop <- tibble::tibble(outer_it = i, inner_it = j)
}
# simple_dep
table(simple_dep) inner_it
outer_it 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
1 1 0 0 0 0 0 0 0 0 0
2 0 1 0 0 0 0 0 0 0 0
3 0 0 1 0 0 0 0 0 0 0
4 0 0 0 1 0 0 0 0 0 0
5 0 0 0 0 1 0 0 0 0 0
6 0 0 0 0 0 1 0 0 0 0
7 0 0 0 0 0 0 1 0 0 0
8 0 0 0 0 0 0 0 1 0 0
9 0 0 0 0 0 0 0 0 1 0
10 0 0 0 0 0 0 0 0 0 1
Balanced indexing
Now we’re on to the bit that is tripping up some of my code. I have a list, and want to index through its names and values. Though I think the same thing would apply to any indexing.
ballist <- list(a = 1:10, b = seq(from = 0.1, to = 1, by = 0.1), d = 11:20)list_dep <- foreach(i = names(ballist), .combine = rbind) %:%
foreach(j = ballist[[i]], .combine = rbind) %dopar% {
thisloop <- tibble::tibble(outer_it = i, inner_it = j)
}
# simple_dep
table(list_dep) inner_it
outer_it 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 2 3 4 5 6 7 8 9 10 11 12 13 14
a 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0
b 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
d 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
inner_it
outer_it 15 16 17 18 19 20
a 0 0 0 0 0 0
b 0 0 0 0 0 0
d 1 1 1 1 1 1
That looks right- there are only records for the inner values when they’re present in the outer. We can see that more clearly in the df itself
list_depUnbalanced indexing
The above should work the same if the list-items are different lengths, but let’s check
unballist <- list(a = 1:5, b = seq(from = 0.1, to = 1, by = 0.1), d = 11:13)list_dep_unbal <- foreach(i = names(unballist), .combine = rbind) %:%
foreach(j = unballist[[i]], .combine = rbind) %dopar% {
thisloop <- tibble::tibble(outer_it = i, inner_it = j)
}list_dep_unbalThat seems like it works how I expect. It’s not so clear then why I’m getting a shuffled issue in the code that prompted this, but it seems I need to look elsewhere.