library(microbenchmark)
library(doFuture)
library(foreach)
library(dplyr)
registerDoFuture()
plan(multisession)
Nested dependencies with %:%
I’ve done a lot of checking of nested loops, but here I’m specifically interested in dependencies between iterations when nesting foreach loops with %:%.
Packages and setup
I’ll use the {future} package, along with {doFuture} and {foreach}.
Built-in nesting with dependencies
I’m getting strange errors when using built-in nesting where the iterations in the inner loop depend on the outer. I think those dependencies aren’t resolving how I assumed they were.
These iterations could be wholly independent of each other, e.g. i = 1:10 and j = seq(from = 0.1, to = 1, by = 0.1). But they could be dependent. In that simple case, we could naively say j = i/10, because (1:10)/10 gives the same values (up to floating-point rounding). Inside the nested loop, though, i is a single value, so j only takes one value for each i (see below). That’s expected, but not necessarily obvious at first glance. And it could be more complex still (which is the situation I have), with j indexing into something chosen by i. I’ll go through each in turn, returning objects that allow me to assess what’s happening.
Wholly independent
indep_nested <- foreach(i = 1:10, .combine = rbind) %:%
  foreach(j = seq(from = 0.1, to = 1, by = 0.1), .combine = rbind) %dopar% {
    thisloop <- tibble::tibble(outer_it = i, inner_it = j)
  }
# indep_nested
To check, we can see whether there is a full factorial mapping, i.e. every combination of i and j appears exactly once.
table(indep_nested)
        inner_it
outer_it 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
      1    1   1   1   1   1   1   1   1   1 1
      2    1   1   1   1   1   1   1   1   1 1
      3    1   1   1   1   1   1   1   1   1 1
      4    1   1   1   1   1   1   1   1   1 1
      5    1   1   1   1   1   1   1   1   1 1
      6    1   1   1   1   1   1   1   1   1 1
      7    1   1   1   1   1   1   1   1   1 1
      8    1   1   1   1   1   1   1   1   1 1
      9    1   1   1   1   1   1   1   1   1 1
      10   1   1   1   1   1   1   1   1   1 1
# indep_nested |> group_by(outer_it) |> summarise(n_outer = n())
# indep_nested |> group_by(inner_it) |> summarise(n_inner = n())
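Rather than eyeballing the table, the factorial check can also be done programmatically. This is only a sketch: is_full_factorial is a hypothetical helper (not part of {foreach} or {doFuture}), and stand_in is a stand-in for indep_nested built with expand.grid so the example runs without a parallel backend.

```r
# Hypothetical helper: TRUE when every outer/inner combination occurs exactly once.
# Assumes the data frame has columns named outer_it and inner_it, as in the loops here.
is_full_factorial <- function(df) {
  counts <- table(df$outer_it, df$inner_it)
  all(counts == 1)
}

# Stand-in for indep_nested: all 10 x 10 combinations, built without foreach.
stand_in <- expand.grid(
  outer_it = 1:10,
  inner_it = seq(from = 0.1, to = 1, by = 0.1)
)

full_ok <- is_full_factorial(stand_in)          # all 100 combinations present
missing_one <- is_full_factorial(stand_in[-1, ]) # one combination dropped

print(full_ok)
print(missing_one)
```

With the objects from this post, the equivalent call would be is_full_factorial(indep_nested).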
Simple dependency
Now we can make j dependent on i, but very simply. And we see that the factorial combination is lost: j only takes one value for each i. This is how the loop should work, though it may not be obvious at first glance. Over i = 1:10, the vectors i/10 and seq(from = 0.1, to = 1, by = 0.1) hold the same values, but the first is evaluated separately for each i and so yields a single value per outer iteration, while the second yields the whole vector every time.
simple_dep <- foreach(i = 1:10, .combine = rbind) %:%
  foreach(j = i/10, .combine = rbind) %dopar% {
    thisloop <- tibble::tibble(outer_it = i, inner_it = j)
  }
# simple_dep
table(simple_dep)
        inner_it
outer_it 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
      1    1   0   0   0   0   0   0   0   0 0
      2    0   1   0   0   0   0   0   0   0 0
      3    0   0   1   0   0   0   0   0   0 0
      4    0   0   0   1   0   0   0   0   0 0
      5    0   0   0   0   1   0   0   0   0 0
      6    0   0   0   0   0   1   0   0   0 0
      7    0   0   0   0   0   0   1   0   0 0
      8    0   0   0   0   0   0   0   1   0 0
      9    0   0   0   0   0   0   0   0   1 0
      10   0   0   0   0   0   0   0   0   0 1
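The one-value-per-i pattern is not specific to foreach: any nested loop that re-evaluates the inner expression for each outer value behaves the same way. Here is a base-R analogue as a rough sketch. nested_pairs is a hypothetical helper written for this illustration, and it only mimics the structure of the %:% loops, not their parallel execution.

```r
# Base-R analogue of the nested loops (an illustration, not foreach itself).
# inner_fun builds the inner values from a single outer value i.
nested_pairs <- function(outer_vals, inner_fun) {
  rows <- lapply(outer_vals, function(i) {
    # i is recycled against however many inner values inner_fun(i) returns
    data.frame(outer_it = i, inner_it = inner_fun(i))
  })
  do.call(rbind, rows)
}

# Inner values ignore i: the whole vector is produced for every i.
indep <- nested_pairs(1:10, function(i) seq(from = 0.1, to = 1, by = 0.1))

# Inner values depend on i: i is a single number here, so i / 10 is too.
dep <- nested_pairs(1:10, function(i) i / 10)

print(nrow(indep))
print(nrow(dep))
```

The independent version gives 100 rows (10 inner values for each of 10 outer values), while the dependent version gives 10 rows, matching the diagonal in the table above.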
Balanced indexing
Now we’re on to the bit that is tripping up some of my code. I have a list, and want to loop over its names in the outer loop and over the values of each named element in the inner loop, though I think the same thing would apply to any indexing.
ballist <- list(a = 1:10, b = seq(from = 0.1, to = 1, by = 0.1), d = 11:20)
list_dep <- foreach(i = names(ballist), .combine = rbind) %:%
  foreach(j = ballist[[i]], .combine = rbind) %dopar% {
    thisloop <- tibble::tibble(outer_it = i, inner_it = j)
  }
# list_dep
table(list_dep)
        inner_it
outer_it 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 2 3 4 5 6 7 8 9 10 11 12 13 14
       a   0   0   0   0   0   0   0   0   0 1 1 1 1 1 1 1 1 1  1  0  0  0  0
       b   1   1   1   1   1   1   1   1   1 1 0 0 0 0 0 0 0 0  0  0  0  0  0
       d   0   0   0   0   0   0   0   0   0 0 0 0 0 0 0 0 0 0  0  1  1  1  1
        inner_it
outer_it 15 16 17 18 19 20
       a  0  0  0  0  0  0
       b  0  0  0  0  0  0
       d  1  1  1  1  1  1
That looks right: each outer name only has records for the inner values that belong to its list element. We can see that more clearly in the data frame itself.
list_dep
Unbalanced indexing
The above should work the same if the list items are different lengths, but let’s check.
unballist <- list(a = 1:5, b = seq(from = 0.1, to = 1, by = 0.1), d = 11:13)
list_dep_unbal <- foreach(i = names(unballist), .combine = rbind) %:%
  foreach(j = unballist[[i]], .combine = rbind) %dopar% {
    thisloop <- tibble::tibble(outer_it = i, inner_it = j)
  }
list_dep_unbal
That seems to work how I expect. It’s not so clear then why I’m getting shuffled results in the code that prompted this, but it seems I need to look elsewhere.
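One way to make the "works how I expect" check explicit, and to catch shuffling, is to build the expected name/value pairs in base R and compare row by row. This is a sketch under assumptions: matches_expected is a hypothetical helper, and it assumes the result has outer_it and inner_it columns in the same order as the list.

```r
unballist <- list(a = 1:5, b = seq(from = 0.1, to = 1, by = 0.1), d = 11:13)

# Expected pairs: each name repeated once per value, in list order.
expected <- data.frame(
  outer_it = rep(names(unballist), times = lengths(unballist)),
  inner_it = unlist(unballist, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE only if names and values match in the same row order.
matches_expected <- function(result, expected) {
  identical(as.character(result$outer_it), expected$outer_it) &&
    isTRUE(all.equal(as.numeric(result$inner_it), expected$inner_it))
}

# Demonstration on stand-ins: an in-order copy passes, a reversed copy fails.
in_order <- matches_expected(expected, expected)
reversed <- matches_expected(expected[rev(seq_len(nrow(expected))), ], expected)

print(in_order)
print(reversed)
```

With the objects from this post, the call would be matches_expected(list_dep_unbal, expected).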