proper seeding
jakob-r opened this issue · 1 comments
Bernd wanted to open an issue as mentioned here.
We might want to improve reproducibility in parallelMap by using seeding mechanisms for the different parallelization techniques.
Should we finally close this issue by incorporating the approach of https://stackoverflow.com/a/51347058/4185785?
I am afraid that many people have broken reproducibility in the past simply because they were not aware of this. I also think many people are not aware that set.seed() on its own does not make parallel calls reproducible.
Related issues
Suggested solution
- Set set.seed(1, "L'Ecuyer-CMRG") before all calls of mode "multicore"
- Set parallel::clusterSetRNGStream(iseed = 1) after all calls of mode "socket"
- What about other modes such as "mpi" and "batchtools"?
parallelStart() would also gain an argument to turn this behavior off. By default it would ensure reproducibility in parallel scenarios without requiring the user to know about the different RNG kinds.
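For illustration, here is a minimal sketch of the two seeding calls above, using only the base parallel package rather than parallelMap itself. The helper names run_multicore() and run_socket() are hypothetical and exist only for this example. I assume that "before" means seeding right before the forked call, and "after" means seeding once the socket cluster has been started.

```r
library(parallel)

# Hypothetical helper (not part of parallelMap): seed with the L'Ecuyer-CMRG
# kind, then run a forked ("multicore") call. Forking is not available on
# Windows, so this part only applies to Unix-alikes.
run_multicore <- function(seed, n_cores = 2L) {
  old_kind <- RNGkind()[1]
  on.exit(RNGkind(kind = old_kind))  # restore the caller's RNG kind
  set.seed(seed, kind = "L'Ecuyer-CMRG")
  mclapply(1:4, function(i) runif(1), mc.cores = n_cores)
}

# Hypothetical helper: start a socket cluster first, then give every worker
# its own L'Ecuyer-CMRG stream derived from `seed`.
run_socket <- function(seed, n_workers = 2L) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl))
  clusterSetRNGStream(cl, iseed = seed)
  parLapply(cl, 1:4, function(i) runif(1))
}

# Two runs with the same seed should yield the same random numbers.
socket_a <- run_socket(1)
socket_b <- run_socket(1)
print(identical(socket_a, socket_b))

if (.Platform$OS.type == "unix") {
  fork_a <- run_multicore(1)
  fork_b <- run_multicore(1)
  print(identical(fork_a, fork_b))
}
```

A switch in parallelStart() could wrap calls like these. How the seed would be chosen, and whether the user's RNG kind gets restored afterwards, are open design questions that this sketch does not settle.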
On a side note: How does future deal with that issue when using plan(multicore) or plan(multisession)? @mllg
(better discuss this one in a separate issue in mlr3)