
Reproducibility issues


Thank you for your package!

I faced some reproducibility issues in my script after calling some of your package functions. I investigated a little bit and here is what I found:

bootEGA() function

line 752: set.seed(NULL)
Resetting the seed causes unforeseen reproducibility issues for users who set a seed at the beginning of their script: any code that relies on randomness and runs after a call to bootEGA() no longer follows that seed.
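The effect can be reproduced without the package. The sketch below is only an illustration: `resets_seed()` and `draw_after_call()` are hypothetical names, and `resets_seed()` is a stand-in for any function that calls `set.seed(NULL)` internally, not the actual bootEGA() code.

```r
# Hypothetical stand-in for a package function that resets the seed internally.
resets_seed <- function() {
  set.seed(NULL)  # re-initializes the RNG as if no seed had been set
  invisible(NULL)
}

# Set a user seed, call some function, then draw random numbers.
draw_after_call <- function(f) {
  set.seed(42)  # user-defined seed at the top of the script
  f()           # call into the "package"
  runif(3)      # the user's own random draws afterwards
}

# With the reset, the two runs typically differ even though the seed is the same.
first  <- draw_after_call(resets_seed)
second <- draw_after_call(resets_seed)
identical(first, second)

# Without the reset, the two runs match.
baseline_a <- draw_after_call(function() NULL)
baseline_b <- draw_after_call(function() NULL)
identical(baseline_a, baseline_b)
```

The first comparison is expected to be FALSE in practice because `set.seed(NULL)` reseeds from the system state, while the second is TRUE.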

line 807: result$plot.typical.ega <- plot(result, plot.args = plot.args)
For reasons I don't understand, this command breaks the chain of randomness defined by a seed.

Removing both lines will preserve the chain of randomness defined by a seed. However, bootEGA objects will still slightly differ across runs, even if a seed is defined. Increasing the number of iterations does not seem to solve the issue.

EGA() function

A call to this function also breaks the chain of randomness defined by a seed. I did not inspect in detail which part of the function is responsible.
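Until a fix is available, one possible user-side workaround is to save R's RNG state before the call and restore it afterwards. This is a minimal sketch, not part of the package: `with_preserved_rng()` and `noisy()` are hypothetical names, and `noisy()` merely imitates a function that disturbs the seed.

```r
# Hypothetical helper: evaluate `expr`, then restore the caller's RNG state.
with_preserved_rng <- function(expr) {
  had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (had_seed) {
    old_seed <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
  }
  on.exit({
    if (had_seed) {
      # Put the saved state back so later draws continue the user's stream.
      assign(".Random.seed", old_seed, envir = globalenv())
    } else if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      # No state existed before the call, so remove the one it created.
      rm(".Random.seed", envir = globalenv())
    }
  })
  force(expr)  # the argument is lazily evaluated here, after on.exit is registered
}

# Hypothetical function that disturbs the seed, as described above.
noisy <- function() {
  set.seed(NULL)
  rnorm(1)
}

set.seed(1)
a <- runif(2)

set.seed(1)
invisible(with_preserved_rng(noisy()))
b <- runif(2)

identical(a, b)  # TRUE: the draws after the wrapped call follow the user seed
```

Note that this only protects code that runs after the call. The wrapped function's own results remain unseeded if it resets the seed internally.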


I'm closing this issue because it will be resolved in the next major update (4a8dd10).

This behavior was caused by (re-)setting seeds in R, which overrode any seed the user had set.

To maintain reproducibility without disrupting user-defined seeds, the update uses {RcppZiggurat} to set seeds and generate random normal data in C++ (parametric bootstrap). I also implemented a separate random sampling function in C++ to set seeds and perform the bootstrap with resampling. Because seeds are set in C++ and not in R, R's own random number generation is unaffected and user-defined seeds stay intact (I have tested and verified these cases).
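The idea of a generator with its own seed, separate from R's global stream, can be illustrated in plain R. This is only a conceptual sketch under stated assumptions: it uses a simple Park-Miller linear congruential generator rather than {RcppZiggurat} or the package's C++ code, and `make_private_rng()` and `private_resample()` are hypothetical names.

```r
# Hypothetical private generator (Park-Miller "minimal standard" LCG).
# Its state lives in a closure, so R's .Random.seed is never read or written.
make_private_rng <- function(seed) {
  modulus <- 2147483647  # 2^31 - 1
  state <- seed %% modulus
  if (state == 0) state <- 1  # zero is a fixed point of this generator
  function(n) {
    out <- numeric(n)
    for (i in seq_len(n)) {
      state <<- (state * 16807) %% modulus  # exact in double precision
      out[i] <- state / modulus             # uniform value in (0, 1)
    }
    out
  }
}

# Bootstrap row indices drawn from the private stream, not from R's RNG.
private_resample <- function(n_rows, rng) {
  floor(rng(n_rows) * n_rows) + 1
}

set.seed(2024)
expected <- runif(3)

set.seed(2024)
idx <- private_resample(10, make_private_rng(99))  # internal seed, private stream
observed <- runif(3)

identical(expected, observed)                               # TRUE: R's stream is untouched
identical(idx, private_resample(10, make_private_rng(99)))  # TRUE: resampling is reproducible
```

Both properties hold at once: the resampling is reproducible from its own seed, and the user's `set.seed()` continues to govern every draw made through R.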