In testing parallel functions, especially on new machines, or if we’re trying to get a granular understanding of what they’re doing, we will want to do more than benchmark. We might, for example, want to know what cores and processes they’re using, to make sure they’re taking full advantage of resources, to better understand how resources get divided up, or to better understand the differences between plans.
I’ll use the {future} package, along with {doFuture} and {foreach}, because I tend to like writing for loops (there’s a reason, which I’ll try to write up sometime later). I test other packages in the {future} family ({furrr}, {future.apply}) elsewhere, where I try to better understand when they do and don’t give speed advantages.
library(microbenchmark)
library(doFuture)
Loading required package: foreach
Loading required package: future
library(foreach)
library(doRNG)
Loading required package: rngtools
library(ggplot2)
library(patchwork) # needed for combining the ggplots below with +
library(dplyr)

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
I’ll use inner and outer parallel functions similar to those in my tests of nested functions, but instead of doing anything, they’ll just track the processes.
What do I want to interrogate? The process ID, for one; I can get it with Sys.getpid(). I think I’ll skip all the actual processing and just collect the IDs.
What do I want to check? How the processes get divvied up. Do the inner loops get different processes? Or does each outer iteration get one process, with all its inner iterations using that one? Does it change through the outer loop? Does it depend on the number of iterations?
inner_par <- function(outer_it, size) {
  inner_out <- foreach(j = 1:size, .combine = bind_rows) %dorng% {
    thisproc <- tibble(loop = "inner",
                       outer_iteration = outer_it,
                       inner_iteration = j,
                       pid = Sys.getpid())
    # d <- rnorm(size, mean = j)
    # f <- matrix(rnorm(size*size), nrow = size)
    # g <- d %*% f
    # mean(g)
  }
}
For the outer loop, let’s check the PID both before and after the inner loop runs.
outer_par <- function(outer_size, innerfun, inner_size) {
  outer_out <- foreach(i = 1:outer_size, .combine = bind_rows) %dorng% {
    outerpre <- tibble(loop = 'outer_pre',
                       outer_iteration = i,
                       inner_iteration = NA,
                       pid = Sys.getpid())
    # Now run the inner loop and record its PIDs
    inner_out <- innerfun(outer_it = i, size = inner_size)
    outerpost <- tibble(loop = 'outer_post',
                        outer_iteration = i,
                        inner_iteration = NA,
                        pid = Sys.getpid())
    bind_rows(outerpre, inner_out, outerpost)
  }
  return(outer_out)
}
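The runs plotted below aren’t shown in this excerpt; presumably they come from calls along these lines under plan(multisession) (a sketch, with sizes inferred from the object names):

plan(multisession)
registerDoFuture() # route %dopar%/%dorng% through the future backend

# one outer iteration with ten inner iterations, and the reverse
test1_10 <- outer_par(outer_size = 1, innerfun = inner_par, inner_size = 10)
test10_1 <- outer_par(outer_size = 10, innerfun = inner_par, inner_size = 1)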
plot1_10 <- ggplot(test1_10,
                   aes(x = outer_iteration, y = as.factor(pid), color = inner_iteration)) +
  geom_point(position = position_jitter(width = 0.2, height = 0))

plot10_1 <- ggplot(test10_1,
                   aes(x = outer_iteration, y = as.factor(pid), color = inner_iteration)) +
  geom_point(position = position_jitter(width = 0.2, height = 0))

plot1_10 + plot10_1
So yeah, it only ever assigns PIDs to the outer loop. That means this sort of nested loop doesn’t actually nest: only the outer gets parallelised, at least with plan(multisession).
I’m assuming more iterations doesn’t change that, but let’s push both above the number of workers (20).
Note: inner_par isn’t doing anything here, since outer_nest no longer calls it as a function; a sketch of outer_nest follows.
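outer_nest isn’t defined in this excerpt; a minimal sketch of what it presumably looks like, using foreach’s %:% nesting operator (argument names chosen to match the call below; innerfun is accepted but unused):

outer_nest <- function(outer_size, innerfun, inner_size) {
  # %:% chains the two foreach calls into one stream of outer_size * inner_size
  # tasks, so the backend can spread them across all the workers
  outer_out <- foreach(i = 1:outer_size, .combine = bind_rows) %:%
    foreach(j = 1:inner_size, .combine = bind_rows) %dopar% {
      # plain %dopar% here; we only record PIDs, so RNG reproducibility isn't needed
      tibble(loop = "nested",
             outer_iteration = i,
             inner_iteration = j,
             pid = Sys.getpid())
    }
  return(outer_out)
}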
testnest <- outer_nest(10, inner_par, 10)

ggplot(testnest,
       aes(x = outer_iteration, y = as.factor(pid), color = inner_iteration)) +
  geom_point(position = position_jitter(width = 0.2, height = 0)) +
  scale_x_continuous(breaks = 0:10) +
  scale_color_binned(breaks = 0:10)
Now each outer iteration is getting two PIDs to split across its inner iterations.
So that makes sense: with %:%, foreach knows what’s coming and can split up the workers. It looks like the outer loop is favored, though that shouldn’t matter when the loops are specified this way.
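The larger runs presumably come from calls like these (a sketch; sizes inferred from the object names):

testnest50_10 <- outer_nest(50, inner_par, 10)
testnest10_50 <- outer_nest(10, inner_par, 50)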
plotnest50_10 <- ggplot(testnest50_10,
                        aes(x = outer_iteration, y = as.factor(pid), color = inner_iteration)) +
  geom_point(position = position_jitter(width = 0.2, height = 0))

plotnest10_50 <- ggplot(testnest10_50,
                        aes(x = outer_iteration, y = as.factor(pid), color = inner_iteration)) +
  geom_point(position = position_jitter(width = 0.2, height = 0))

plotnest50_10 + plotnest10_50
Those look more similar than they are because the axes differ. I think it is giving more PIDs to the outer loop when the outer loop has more iterations.
List-plans
Naive- just defaults
There is information out there, largely related to using future.batchtools, suggesting that a list of plans lets us handle nested futures. Does that work with multisession?
plan("list") tells us what the plan is. This is super helpful for checking what’s going on.
Let’s check an outer loop of 1 with an inner loop of 10, and vice versa.
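The setup for that check isn’t shown in this excerpt; something like the following is what I mean (worker counts and the _double object names are inferred from the text and plots below, so treat it as a sketch):

# outer level gets 1 worker, each inner level up to 10
# (recent parallelly versions may refuse this nesting; see the note below)
plan(list(
  tweak(multisession, workers = 1),
  tweak(multisession, workers = 10)
))
plan("list") # confirm the stack of strategies
test1_10_double <- outer_par(outer_size = 1, innerfun = inner_par, inner_size = 10)

# and the reverse split
plan(list(
  tweak(multisession, workers = 10),
  tweak(multisession, workers = 1)
))
test10_1_double <- outer_par(outer_size = 10, innerfun = inner_par, inner_size = 1)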
This worked previously but is now failing: parallelly won’t let me set up more than 3 localhost ‘workers’ nested on a single core. I thought that since parallelly::availableCores() returned 20, it would give the outer level 1 worker and then the inner level a different set of 10. But it seems to be trying to stack the 10 inner workers on top of the single outer worker (note that this works just fine on a cluster, so it’s somewhat esoteric here anyway).
plot1_10_double <- ggplot(test1_10_double,
                          aes(x = outer_iteration, y = as.factor(pid), color = inner_iteration)) +
  geom_point(position = position_jitter(width = 0.2, height = 0))

plot10_1_double <- ggplot(test10_1_double,
                          aes(x = outer_iteration, y = as.factor(pid), color = inner_iteration)) +
  geom_point(position = position_jitter(width = 0.2, height = 0))

plot1_10_double + plot10_1_double
That worked (but not anymore)! So, if we explicitly give each plan its workers, we can manage nestedness (provided we can figure out how to do it with the new parallelly).
Turn the plan back to multisession
plan(multisession)
plan("list")
List of future strategies:
1. multisession:
- args: function (..., workers = availableCores(), lazy = FALSE, rscript_libs = .libPaths(), envir = parent.frame())
- tweaked: FALSE
- call: plan(multisession)
Conclusions
Nesting parallel loops removes the parallelisation from the inner loop unless it’s done directly with %:% or with explicitly-declared list-plans.
This means no intermediate processing in the outer loop body (since %:% requires the inner loop to follow immediately) without explicit worker control.
And the situation where nesting makes the most sense (nodes with cores) lends itself to list-plans with future.batchtools and worker control handled internally. I think.
BUT we need to be careful: we could be wasting workers if we parallelise over an outer layer with fewer jobs than workers; those ‘extra’ workers won’t be used by inner parallel loops (see the quick check at the end of these conclusions).
Above, where I had outer_size = 10, I got 10 PIDs, even though I have 20 cores available.
It’s unclear how this works in more complex situations (e.g. on an HPC with nodes and workers): will the outer layer grab nodes and the inner layer workers? Or do we need to manage that manually with job arrays? I think the answer is list-plans, treating the loop over batchtools as essentially an array job, just managed by R rather than Slurm. But that needs to be tested.
I guess the good thing is the internal foreach isn’t hurting anything, except that a well-written for loop would be faster.
So where we have control of things, we need to test with and without foreach in deep functions.
It would sure be nice if deep parallel foreach loops that get their parallelisation skipped ran as fast as for, or even as fast as a sequential foreach %do%.
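A quick way to see that worker waste under the multisession plan (a sketch; test10_1 is the outer_size = 10 run from earlier):

nbrOfWorkers()               # workers the current plan provides (20 here)
length(unique(test10_1$pid)) # distinct PIDs actually used; only 10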
Interpreting the nested speed results
In my testing of the speed of nested functions, parallelising the inner loop was always slow. That can’t always be because the parallelisation is getting skipped (the inner loop does run in parallel when the outer loop is sequential), but it suggests that the parallel machinery is actually making things worse, even when it gets skipped.
As a test, we can combine the return values I use here with those functions: do the same processing, but return the PIDs instead of the values, to confirm when those inner loops actually get run in parallel.
Functions
These do the same processing as in my tests of nested speed, but return a tibble of pids instead.
Inner loop
Parallel version
inner_par <- function(in_vec, size, outer_it) {
  inner_out <- foreach(j = in_vec, .combine = bind_rows) %dorng% {
    d <- rnorm(size, mean = j)
    f <- matrix(rnorm(size*size), nrow = size)
    g <- d %*% f
    h <- mean(g)
    thisproc <- tibble(loop = "inner",
                       outer_iteration = outer_it,
                       inner_iteration = j,
                       pid = Sys.getpid())
  }
  return(inner_out)
}
Sequential version
inner_seq <- function(in_vec, size, outer_it) {
  inner_out <- foreach(j = in_vec, .combine = bind_rows) %do% {
    d <- rnorm(size, mean = j)
    f <- matrix(rnorm(size*size), nrow = size)
    g <- d %*% f
    h <- mean(g)
    thisproc <- tibble(loop = "inner",
                       outer_iteration = outer_it,
                       inner_iteration = j,
                       pid = Sys.getpid())
  }
  return(inner_out)
}
Using preallocated for
This is likely to be faster than the sequential foreach. Preallocate both the vector and the tibble output.
inner_for <- function(in_vec, size, outer_it) {
  inner_out <- vector(mode = 'numeric', length = size)
  thisproc <- tibble(loop = "inner",
                     outer_iteration = outer_it,
                     inner_iteration = 1:length(in_vec),
                     pid = Sys.getpid())
  for (j in 1:length(in_vec)) {
    d <- rnorm(size, mean = in_vec[j])
    f <- matrix(rnorm(size*size), nrow = size)
    g <- d %*% f
    inner_out[j] <- mean(g)
    thisproc$pid[j] <- Sys.getpid()
    thisproc$inner_iteration[j] <- j
  }
  return(thisproc)
}
Outer loop
I can’t divide by inner_out now that it’s not a matrix, so just get a CV instead.
Parallel
outer_par <- function(size, innerfun) {
  outer_out <- foreach(i = 1:size, .combine = bind_rows) %dorng% {
    # Do a matrix mult on a vector specified with i
    a <- rnorm(size, mean = i)
    b <- matrix(rnorm(size*size), nrow = size)
    cvec <- a %*% b
    # Now iterate over the values in cvec to do something else
    inner_out <- innerfun(in_vec = cvec, size = size, outer_it = i)
    h <- sd(cvec)/mean(cvec)
    inner_out
  }
  return(outer_out)
}
Sequential
outer_seq <- function(size, innerfun) {
  outer_out <- foreach(i = 1:size, .combine = bind_rows) %do% {
    # Do a matrix mult on a vector specified with i
    a <- rnorm(size, mean = i)
    b <- matrix(rnorm(size*size), nrow = size)
    cvec <- a %*% b
    # Now iterate over the values in cvec to do something else
    inner_out <- innerfun(in_vec = cvec, size = size, outer_it = i)
    h <- sd(cvec)/mean(cvec)
    inner_out
  }
  return(outer_out)
}
Un-preallocated for
Because this would need to replace chunks in the tibble, it’s hard to preallocate. Just don’t bother; the point isn’t speed, it’s testing PIDs.
outer_for <- function(size, innerfun) {
  outer_out <- matrix(nrow = size, ncol = size)
  thisproc <- tibble(loop = "inner",
                     outer_iteration = 1,
                     inner_iteration = 1,
                     pid = Sys.getpid(),
                     .rows = 0)
  for (i in 1:size) {
    # Do a matrix mult on a vector specified with i
    a <- rnorm(size, mean = i)
    b <- matrix(rnorm(size*size), nrow = size)
    cvec <- a %*% b
    # Now iterate over the values in cvec to do something else
    inner_out <- innerfun(in_vec = cvec, size = size, outer_it = i)
    outer_out[, i] <- sd(cvec)/mean(cvec)
    thisproc <- bind_rows(thisproc, inner_out)
  }
  outer_out <- c(outer_out)
  return(thisproc)
}
Checking PIDs
I’m less interested here in benchmarks, and more in how the PIDs get used.
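As a sketch of the kind of check I mean (the object names here are mine), run the outer loop with each flavour of inner function and count the distinct PIDs that show up:

pids_parpar <- outer_par(size = 10, innerfun = inner_par)
pids_parfor <- outer_par(size = 10, innerfun = inner_for)

# if the inner parallelisation really gets skipped, these should match
length(unique(pids_parpar$pid))
length(unique(pids_parfor$pid))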