acguidoum/Sim.DiffProc

`expression()` doesn't seem to work with `snssde1d` in `foreach %dopar%`

VMOrca opened this issue · 4 comments

I'm trying to use expression() inside a foreach() %dopar% loop. The expressions define the drift and diffusion of an Ornstein-Uhlenbeck process, which I simulate with the Sim.DiffProc package inside the same foreach() %dopar% call.

However, the parallel compute nodes don't seem to recognise the variables referenced in expression(), and I get the following error:

Error in { : task 1 failed - "object 'OUTheta' not found"

My code:

library(Sim.DiffProc)
library(doSNOW)
library(foreach)


cl <- makeCluster(2)
registerDoSNOW(cl)

a = foreach(i = 1:2, .packages = c('Sim.DiffProc')) %dopar% {
  OUMu = 1
  OUTheta = 1
  OUSigma = 1
  f = expression(OUTheta * (OUMu - x))
  g = expression(OUSigma)
  sim = Sim.DiffProc::snssde1d(drift = f, diffusion = g, x0 = 0, N = 10, T = 1, method = 'euler', M = 1)
  return(sim$X)
}

R version:

R.Version()
$platform
[1] "x86_64-w64-mingw32"

$arch
[1] "x86_64"

$os
[1] "mingw32"

$system
[1] "x86_64, mingw32"

$status
[1] ""

$major
[1] "4"

$minor
[1] "0.3"

$year
[1] "2020"

$month
[1] "10"

$day
[1] "10"

$`svn rev`
[1] "79318"

$language
[1] "R"

$version.string
[1] "R version 4.0.3 (2020-10-10)"

$nickname
[1] "Bunny-Wunnies Freak Out"
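The error can be reproduced without any parallel machinery. This is a minimal base-R sketch, assuming (as the clusterExport() fix in the reply suggests) that snssde1d() evaluates the drift expression inside a closure whose environment chain ends at the global environment rather than at the foreach body's local frame. evaluate_drift() and run_locally() are hypothetical helpers that mimic that behaviour; they are not Sim.DiffProc functions.

```r
# Hypothetical stand-in for a solver: it evaluates the drift expression in a
# closure whose enclosing environments lead to the global environment, never
# to the frame of whoever called it.
evaluate_drift <- function(drift, x) {
  A <- function(t, x) eval(drift)
  A(0, x)
}

run_locally <- function() {
  # Mirrors the foreach body: the parameters exist only in this local frame.
  OUTheta <- 1
  OUMu <- 1
  f <- expression(OUTheta * (OUMu - x))
  evaluate_drift(f, x = 0)
}

msg <- tryCatch(run_locally(), error = conditionMessage)
print(msg)  # "object 'OUTheta' not found"

# Defining the names in the global environment (which is what clusterExport()
# does on each node) lets the same lookup succeed.
OUTheta <- 1
OUMu <- 1
print(run_locally())  # 1
```

If that assumption holds, variables created inside the foreach body are simply invisible to the expression, which matches the error in the report.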

Good morning,

Indeed, in your code the parallel compute nodes do not recognize the variables referenced in expression(). You need the clusterExport() function from the core package parallel: it assigns the values of the variables named in varlist on the master R process to variables of the same names in the global environment of each node. In your case, varlist = c("OUTheta", "OUMu", "OUSigma"). See the following R code:

R> library(doSNOW)
R> library(foreach)
R> OUMu = 1; OUTheta = 1; OUSigma = 1
R> f = expression(OUTheta * (OUMu - x))
R> g = expression(OUSigma)

R> cl <- makeCluster(2)
R> registerDoSNOW(cl)
R> parallel::clusterExport(cl, varlist = c("OUTheta", "OUMu", "OUSigma"), envir = environment())

R> a <- foreach(i = 1:2, .packages = c('Sim.DiffProc'), .combine = list) %dopar% {
+   Sim.DiffProc::snssde1d(drift = f, diffusion = g, x0 = 0, N = 10, T = 1, method = 'euler', M = 1)$X
+ }
R> a
[[1]]
Time Series:
Start = c(0, 1) 
End = c(1, 1) 
Frequency = 10 
           [,1]
 [1,] 0.0000000
 [2,] 0.5361421
 [3,] 0.6101941
 [4,] 0.5210987
 [5,] 0.1513371
 [6,] 0.6598853
 [7,] 1.1226598
 [8,] 1.4754373
 [9,] 1.5507929
[10,] 1.2577042
[11,] 1.3156085

[[2]]
Time Series:
Start = c(0, 1) 
End = c(1, 1) 
Frequency = 10 
             [,1]
 [1,]  0.00000000
 [2,] -0.04583063
 [3,]  0.64412154
 [4,] -0.11366549
 [5,]  0.09658761
 [6,]  0.02255550
 [7,] -0.17141103
 [8,]  0.45232315
 [9,]  0.63692530
[10,]  0.34672216
[11,]  0.25129187

R> parallel::stopCluster(cl)
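To check where clusterExport() actually writes, here is a small sketch that uses only the base parallel package (a plain makeCluster() socket cluster instead of doSNOW; the export step is the same). After the export, each worker finds the three names in its own global environment, which, per the reply above, is where snssde1d() needs them.

```r
library(parallel)

# Values defined on the master process.
OUTheta <- 1; OUMu <- 1; OUSigma <- 1

cl <- makeCluster(2)
# Copy the master-side values into the global environment of each worker.
clusterExport(cl, varlist = c("OUTheta", "OUMu", "OUSigma"), envir = environment())
# clusterEvalQ() evaluates in each worker's global environment.
found <- clusterEvalQ(cl, exists("OUTheta", envir = globalenv()) && OUTheta == 1)
stopCluster(cl)

print(unlist(found))  # TRUE TRUE
```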

Thanks very much for your prompt reply, acguidoum!

Just wondering: if the parameters of the OU process, e.g. c("OUTheta","OUMu","OUSigma"), are obtained within the foreach() %dopar% call itself, is there a workaround? Essentially, my pseudocode looks like this:

foreach(i = 1:100) %dopar% {
  1. Estimate OUTheta, OUMu, OUSigma using existing data
  2. Simulate trajectories using the estimated parameters
  3. Calculate summary statistics from simulated trajectories 
  4. Return parameters and summary statistics as list
}

I guess I have to split steps 1 and 2 into two separate foreach() %dopar% calls, i.e. obtain the estimated parameters in step 1, export them with parallel::clusterExport(), and then start a second foreach() %dopar% call to simulate the trajectories?

Many thanks in advance!
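One possible workaround that keeps a single foreach() %dopar% loop, sketched under the assumption that all snssde1d() needs is for the names to exist in the worker's global environment (where clusterExport() writes): each task writes its own estimates there with list2env() before simulating. The fixed values in p are placeholders for a real estimation step, not an estimator.

```r
library(foreach)
library(doSNOW)

cl <- makeCluster(2)
registerDoSNOW(cl)

a <- foreach(i = 1:2, .packages = "Sim.DiffProc") %dopar% {
  # 1. Placeholder for the estimation step (hypothetical fixed values).
  p <- list(OUTheta = 1, OUMu = 1, OUSigma = 1)
  # Write the estimates into this worker's global environment, the same
  # place clusterExport() would have put them.
  list2env(p, envir = globalenv())
  f <- expression(OUTheta * (OUMu - x))
  g <- expression(OUSigma)
  # 2. Simulate with the freshly assigned parameters.
  sim <- snssde1d(drift = f, diffusion = g, x0 = 0, N = 10, T = 1,
                  method = "euler", M = 1)
  # 4. Return the parameters together with the trajectory.
  list(params = p, X = sim$X)
}

stopCluster(cl)
```

This writes into the global environment of whichever process runs the task; with a sequential backend (%do%, or no registered cluster) that is your own session, so it is only a worker-side trick.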

To estimate the parameters of the SDE by the maximum likelihood method or a least squares estimator, you can use the qmle() function available in the yuima package; for more information see jss.v057.i04. For your algorithm, you can use a single loop, not necessarily with foreach() %dopar%: there are several parallel programming techniques, see the core package parallel and jss.v031.i01.

R> library(parallel)
R> ?parLapply
R> ?mclapply
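A sketch of the single-loop approach with parLapply(), running steps 1 to 4 of the pseudocode inside one task. Two choices here are assumptions of this sketch, not the recipe from the reply: estimation uses a simple least-squares fit of the exact AR(1) discretisation of the OU process instead of yuima's qmle(), and the estimates are inlined into the expressions with bquote(), so no free variables are left to export. estimate_ou(), one_task(), simulate_ou_exact() and the toy datasets are hypothetical.

```r
library(parallel)

# Hypothetical least-squares estimator for an OU path observed every dt,
# based on the exact AR(1) form X[k+1] = mu*(1 - b) + b*X[k] + eps,
# with b = exp(-theta*dt) and Var(eps) = sigma^2*(1 - b^2)/(2*theta).
estimate_ou <- function(x, dt) {
  fit <- lm(x[-1] ~ x[-length(x)])
  a <- unname(coef(fit)[1])
  b <- unname(coef(fit)[2])
  stopifnot(b > 0, b < 1)  # otherwise the mapping back to theta is undefined
  theta <- -log(b) / dt
  list(OUTheta = theta,
       OUMu = a / (1 - b),
       OUSigma = sqrt(2 * theta * sigma(fit)^2 / (1 - b^2)))
}

# One task = steps 1 to 4 of the pseudocode, run on a single worker.
one_task <- function(i, datasets, dt) {
  p <- estimate_ou(datasets[[i]], dt)                       # 1. estimate
  # Inline the numeric estimates, so the only free variable left is x.
  f <- as.expression(bquote(.(p$OUTheta) * (.(p$OUMu) - x)))
  g <- as.expression(bquote(.(p$OUSigma)))
  sim <- Sim.DiffProc::snssde1d(drift = f, diffusion = g, x0 = 0, N = 10,
                                T = 1, method = "euler", M = 50)  # 2. simulate
  X <- as.matrix(sim$X)
  final <- X[nrow(X), ]                                     # values at T = 1
  list(params = p,                                          # 4. return both
       mean_final = mean(final), sd_final = sd(final))      # 3. summarise
}

# Toy "existing data": exact OU paths with theta = mu = sigma = 1 (made up).
simulate_ou_exact <- function(n, theta, mu, sigma, dt, x0 = 0) {
  b <- exp(-theta * dt)
  sd_eps <- sigma * sqrt((1 - b^2) / (2 * theta))
  x <- numeric(n)
  x[1] <- x0
  for (k in 2:n) x[k] <- mu + (x[k - 1] - mu) * b + rnorm(1, sd = sd_eps)
  x
}
set.seed(1)
dt <- 0.1
datasets <- lapply(1:4, function(i) simulate_ou_exact(1000, 1, 1, 1, dt))

cl <- makeCluster(2)
clusterSetRNGStream(cl, 123)                       # reproducible worker RNG
clusterExport(cl, "estimate_ou", envir = environment())  # one_task calls it
out <- parLapply(cl, seq_along(datasets), one_task, datasets = datasets, dt = dt)
stopCluster(cl)

str(out[[1]])
```

On Windows, the platform in this report, mclapply() cannot fork and only supports mc.cores = 1, so parLapply() with a socket cluster is the portable choice. Helper functions that the task calls, such as estimate_ou() here, must exist on the workers as well, hence the clusterExport() of the helper.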

Thanks a lot, acguidoum, much appreciated!