mlr-org/parallelMap

proper seeding

jakob-r opened this issue · 1 comments

Bernd wanted to open an issue as mentioned here.
We might want to improve reproducibility in parallelMap by using seeding mechanisms for the different parallelization techniques.

pat-s commented

Should we finally close this issue by incorporating the approach of https://stackoverflow.com/a/51347058/4185785?

I am afraid that many people have messed up reproducibility in the past simply because they were not aware of this. Also, I think many people do not realize that set.seed() alone does not make parallel calls reproducible.
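To illustrate the point, here is a minimal sketch using base `parallel` directly rather than parallelMap. The two-worker cluster and the `runif(1)` job are only placeholders; the idea is that seeding the master process leaves the workers' RNG state untouched.

```r
library(parallel)

cl <- makeCluster(2)  # socket (PSOCK) cluster with two workers

# Seed the master, then draw one random number on each worker.
set.seed(1)
a <- parLapply(cl, 1:2, function(i) runif(1))

# Same seed on the master again, same job on the workers.
set.seed(1)
b <- parLapply(cl, 1:2, function(i) runif(1))

# The workers keep their own RNG state, so the two runs
# should not be expected to match.
print(identical(a, b))

stopCluster(cl)
```

If the seed were propagated, both runs would return the same numbers; in practice they differ because each worker continues from its own state.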

Related issues

Suggested solution

  • Call set.seed(1, "L'Ecuyer-CMRG") before all calls of mode "multicore"

  • Call parallel::clusterSetRNGStream(iseed = 1) after all calls of mode "socket"

  • What about other modes such as "mpi" and "batchtools"?
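The first two bullets can be sketched with base `parallel` alone. This is not parallelMap code: `run_multicore()` and `run_socket()` are hypothetical helpers, and `mclapply()` / `parLapply()` are assumed here as stand-ins for the "multicore" and "socket" modes. Nothing is implied about "mpi" or "batchtools".

```r
library(parallel)

# "multicore"-style: seed with the L'Ecuyer-CMRG generator before forking.
# Note: this switches the RNG kind of the calling session as a side effect.
run_multicore <- function(seed) {
  set.seed(seed, kind = "L'Ecuyer-CMRG")
  # Forking is unavailable on Windows, where mclapply() only accepts 1 core.
  cores <- if (.Platform$OS.type == "windows") 1L else 2L
  mclapply(1:4, function(i) rnorm(1), mc.cores = cores)
}

# "socket"-style: start the cluster first, then give each worker
# its own reproducible RNG stream derived from the seed.
run_socket <- function(seed) {
  cl <- makeCluster(2)
  on.exit(stopCluster(cl), add = TRUE)
  clusterSetRNGStream(cl, iseed = seed)
  parLapply(cl, 1:4, function(i) rnorm(1))
}

# Repeating a run with the same seed should give identical results.
print(identical(run_multicore(1), run_multicore(1)))
print(identical(run_socket(1), run_socket(1)))
```

Both helpers rely on the default static scheduling (prescheduled jobs in `mclapply()`, fixed chunks in `parLapply()`); with dynamic load balancing, the job-to-worker assignment can change between runs, and this kind of seeding alone may no longer be enough.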

parallelStart() would also gain an argument to turn this behavior off. By default, it would ensure reproducibility in parallel scenarios without requiring the user to know about the different RNG kinds.

@berndbischl @jakob-r


On a side note: How does future deal with that issue when using plan(multicore) or plan(multisession)? @mllg
(better discuss this one in a separate issue in mlr3)