mrc-ide/mcstate

Problem with mcstate chain parallelisation

AVicco opened this issue · 7 comments

I have a problem with the chain parallelisation in mcstate and I was hoping you could kindly help me. I am currently following the vignette on mcstate parallelisation, but when I try it an error occurs. I have also included the relevant code to reproduce this error.

sir_generator <- odin.dust::odin_dust("IDX.R")
data <- read.csv("data.csv")
dt <- 0.25
combined_data <- mcstate::particle_filter_data(data = data,
                                               time = "date",
                                               rate = 1 / dt,
                                               initial_time = 0)

n_particles <- 100
compare_function <- function(state, observed, pars = NULL) {
  exp_noise <- 1e6
  # state[3, ] holds one value per particle, so draw one noise term per particle
  modelled <- state[3, , drop = TRUE]
  lambda <- modelled + rexp(n = length(modelled), rate = exp_noise)
  dpois(x = observed$cases, lambda = lambda, log = TRUE)
}

filter <- mcstate::particle_filter$new(data = combined_data,
                                       model = sir_generator,
                                       n_particles = n_particles,
                                       compare = compare_function,
                                       seed = 1L,
                                       n_threads = 9)

beta <- mcstate::pmcmc_parameter("beta", 0.2, min = 0)
gamma <- mcstate::pmcmc_parameter("gamma", 0.1, min = 0, prior = function(p)
  dgamma(p, shape = 1, scale = 0.2, log = TRUE))

proposal_matrix <- diag(0.1, 2)

mcmc_pars <- mcstate::pmcmc_parameters$new(list(beta = beta, gamma = gamma),
                                           proposal_matrix)
n_steps <- 10000
n_burnin <- 200
n_chains <- 3
n_steps_retain <- 1000
n_workers <- 3
n_threads_total <- 9

control <- mcstate::pmcmc_control(
  n_steps,
  save_state = TRUE,
  save_trajectories = TRUE,
  progress = TRUE,
  n_chains = n_chains,
  n_burnin = n_burnin,
  n_steps_retain = n_steps_retain,
  n_workers = n_workers,
  n_threads_total = n_threads_total
)
pmcmc_run <- mcstate::pmcmc(mcmc_pars, filter, control = control)

When I run this, an error occurs:

[] [+++] ETA ?s | 00:00:01 so far ( 0% 0% 0%)
Error: callr subprocess failed: object "dust_cpu_IDX_alloc" not found

where IDX is the name of my odin model script.

This looks to be an undocumented (but possibly unavoidable) issue with how dust creates models within a session and how mcstate does the parallelisation. Because we use callr to run things in another process, we require that the model is available as an installable package - that's the pattern we have used, and it is the pattern in the vignette. However, I can see that's not ideal. It might be possible to do better than this, but I will have to think about it for a bit. For now, I am afraid that workers will not work for you.
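The underlying constraint can be illustrated without dust or mcstate. The sketch below uses only base R and stands in for what callr does: it defines an object in the current session, then starts a fresh R process and asks whether that object exists there. The name `dust_cpu_demo_alloc` is hypothetical and chosen only to echo the error above; the real symbol lives in a compiled shared library rather than in an R variable, so this is an analogy, not a reproduction.

```r
# A stand-in for something that was created only in the current session.
# (Hypothetical name, chosen to echo the error message.)
dust_cpu_demo_alloc <- function() "allocated"

# Start a fresh R process, roughly as a worker would be started, and
# check whether the object is visible there.
rscript <- file.path(R.home("bin"), "Rscript")
expr <- 'cat(exists("dust_cpu_demo_alloc"))'
in_child <- system2(rscript, c("--vanilla", "-e", shQuote(expr)),
                    stdout = TRUE)

# The same check in the session that defined the object.
in_parent <- exists("dust_cpu_demo_alloc")
```

Here `in_parent` is `TRUE`, while `in_child` holds the text printed by the new process, which starts with a clean session. An installed package avoids this because any process can load it by name.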

Self-contained example

  path <- system.file("examples/sir.cpp", package = "dust", mustWork = TRUE)
  model <- dust::dust(path, quiet = TRUE)

  dat <- example_sir()
  n_particles <- 10
  n_steps <- 15
  n_chains <- 4

  filter <- particle_filter$new(dat$data, model, n_particles, dat$compare,
                                n_threads = 1, index = dat$index, seed = 1L)
  control <- pmcmc_control(n_steps, n_chains = n_chains,
                           n_workers = 2L, n_threads_total = 2L,
                           progress = FALSE, use_parallel_seed = TRUE)
  ans <- pmcmc(dat$pars, filter, control = control)

I think the best way to cope with this is to register some hooks in dust itself that will recognise when the model it has is a transient package, and arrange to load that package at the point of initialisation if that has not been done yet.

This turns out to be hard to do via a dust hook because of how R6 objects are serialised - the parent_env field ends up set as .GlobalEnv on load if the "package" is not already loaded. We can't force it to be loaded by name alone (even if we move it to the right place) without actually installing it, because .packages() and loadNamespace require that Meta/package.rds exists (and quite possibly other things too).
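As a rough illustration of the loading-by-name problem, the base R sketch below creates a directory that looks superficially like a package but has never been installed, then tries to load it with `loadNamespace`. The package name `transientdustmodel` is hypothetical, and the layout is a simplification of what dust actually writes to disk.

```r
# Build a directory that looks superficially like a package but has
# never been installed (so it has no Meta/package.rds or NAMESPACE).
lib <- tempfile("fake_lib_")
pkg <- file.path(lib, "transientdustmodel")  # hypothetical package name
dir.create(pkg, recursive = TRUE)
writeLines(c("Package: transientdustmodel", "Version: 0.0.1"),
           file.path(pkg, "DESCRIPTION"))

# Attempt to load it by name from that library, capturing the error
# rather than stopping.
res <- tryCatch(loadNamespace("transientdustmodel", lib.loc = lib),
                error = function(e) e)
failed <- inherits(res, "error")
```

`failed` is `TRUE` here: putting files in the right place is not sufficient, and R refuses to load the namespace until the package has been properly installed.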

I think that we might be able to do this by having dust models advertise that they need some help and provide information about paths, then arrange to provide hooks that can load the model as pieces of standalone code that we can then call from mcstate::pmcmc_chains_run on each of the sessions. Just passing an environment variable MCSTATE_PACKAGE_PATH might be enough?
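The environment variable idea relies on child processes inheriting the parent's environment. A minimal base R sketch of that mechanism follows, assuming the workers are started as ordinary child processes; the path is a made-up placeholder, and nothing here is existing mcstate or dust behaviour.

```r
# Hypothetical location of a transient model package; not a real path.
Sys.setenv(MCSTATE_PACKAGE_PATH = "/tmp/transient-dust-model")

# Start a fresh R process and read the variable back from inside it.
rscript <- file.path(R.home("bin"), "Rscript")
expr <- 'cat(Sys.getenv("MCSTATE_PACKAGE_PATH"))'
seen_by_worker <- system2(rscript, c("--vanilla", "-e", shQuote(expr)),
                          stdout = TRUE)
```

`seen_by_worker` contains the same path that was set in the parent, so a worker could in principle use it to locate and load the model code before running its chains.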

Sorry, these were notes to myself about how a fix might be implemented, not instructions for you! I'll look at this at some point over the next couple of weeks and will post proper instructions once it works.

Hi Anna - please don't add unrelated requests to an issue thread as it makes it hard to keep track of things.

It looks like you are at Imperial/DIDE - can you please send me an email or Teams message? You will need to expand your question so that I can see more clearly what you are doing and what you are trying to do, without my having to ask more questions. See the resources listed here for advice on how to do this.