hfgolino/EGAnet

Reproducibility issues


Thank you for your package!

I faced some reproducibility issues in my script after calling some of your package functions. I investigated a little bit and here is what I found:

bootEGA() function

line 752: set.seed(NULL)
Resetting the seed will cause unforeseen reproducibility issues for users who set a seed at the beginning of their script, if any code that relies on randomness runs after the call to bootEGA().
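A minimal sketch of the effect, not taken from the package: `resets_seed()` and `draw_after()` are hypothetical stand-ins, and only base R's `set.seed()` and `runif()` are assumed. It compares draws made after a user-set seed with and without an intervening `set.seed(NULL)`.

```r
# Hypothetical stand-in for a package function that resets the seed internally
resets_seed <- function() {
  set.seed(NULL)  # re-initializes the RNG as if no seed had been set
  invisible(NULL)
}

# Set the user's seed, call f(), then draw random numbers
draw_after <- function(f) {
  set.seed(123)
  f()
  runif(3)
}

no_reset_1   <- draw_after(function() NULL)
no_reset_2   <- draw_after(function() NULL)
with_reset_1 <- draw_after(resets_seed)
with_reset_2 <- draw_after(resets_seed)

# Without the reset, the draws repeat exactly
print(identical(no_reset_1, no_reset_2))
# With the reset, the draws are expected to differ between runs in practice
print(identical(with_reset_1, with_reset_2))
```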

line 807: result$plot.typical.ega <- plot(result, plot.args = plot.args)
For reasons I don't understand, this command breaks the chain of randomness defined by a seed.

Removing both lines will preserve the chain of randomness defined by a seed. However, bootEGA objects will still slightly differ across runs, even if a seed is defined. Increasing the number of iterations does not seem to solve the issue.

EGA() function

Calling the function breaks the chain of randomness defined by a seed. I did not inspect in detail which part of the function is responsible.
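One way to narrow down which call is responsible is to check whether it leaves R's RNG in the same state each time it starts from the same seed. The helper below is a hypothetical diagnostic sketch in base R, not part of EGAnet. A function that merely draws random numbers should pass, while one that reseeds from the clock should not.

```r
# Hypothetical helper: starting from the same seed, does f() always leave
# R's RNG in the same state?
leaves_reproducible_state <- function(f, seed = 1) {
  state_after <- function() {
    set.seed(seed)
    f()
    get(".Random.seed", envir = globalenv())
  }
  identical(state_after(), state_after())
}

# Draws random numbers, but deterministically given the seed
draws_only <- leaves_reproducible_state(function() rnorm(10))
print(draws_only)

# Reseeds from the system clock, so the state is expected to differ in practice
print(leaves_reproducible_state(function() set.seed(NULL)))
```

Wrapping a suspect call, for example `function() EGA(data)` with your own `data`, in this helper would indicate whether that call is the one breaking the chain.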

@parisbastien,

I'm closing this issue because it will be resolved in the next major update (4a8dd10).

This behavior was caused by (re-)setting seeds in R, which overrides any seed the user has set.

To maintain reproducibility without disrupting user-defined seeds, the update uses {RcppZiggurat} to set seeds and generate random normal data in C++ (parametric bootstrap). I also implemented a separate random sampling function in C++ to set seeds and perform the bootstrap with resampling. Because seeds are set in C++ and not in R, all of R's random number generation is unaffected and user-set seeds stay the same (I have tested and verified these cases).
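For readers who need a similar guarantee in their own scripts, the same goal can be approximated in plain R by saving and restoring `.Random.seed` around the seeded code. This is a different technique from the C++ approach described above, and `with_preserved_rng()` is a hypothetical helper, not an EGAnet function.

```r
# Hypothetical helper: evaluate `expr` under its own seed, then restore the
# caller's RNG state so that later draws are unaffected.
with_preserved_rng <- function(seed, expr) {
  had_state <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (had_state) old_state <- get(".Random.seed", envir = globalenv())
  on.exit({
    if (had_state) {
      assign(".Random.seed", old_state, envir = globalenv())
    } else {
      rm(".Random.seed", envir = globalenv())
    }
  })
  set.seed(seed)
  expr  # lazily evaluated here, after the internal seed is set
}

set.seed(123)
x1 <- runif(2)

set.seed(123)
boot1 <- with_preserved_rng(42, rnorm(5))  # internal seeded work
x2 <- runif(2)                             # continues from the user's seed

boot2 <- with_preserved_rng(42, rnorm(5))

print(identical(x1, x2))        # the user's stream is undisturbed
print(identical(boot1, boot2))  # the internal work is reproducible
```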