Yu-Group/simChef

Bug when set.seed() is called from within Experiment components

tiffanymtang opened this issue · 0 comments

If set.seed() is called from within a DGP, Method, Evaluator, or Visualizer and add_vary_across(.dgp = ...) is used, the seed carries over to every replicate after the first vary-across parameter value. Consequently, for each vary-across parameter value after the first, the results are identical across all replicates.
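As a rough illustration of the general mechanism in plain R (this is not simChef's internal code, and it does not explain why the first vary-across value is unaffected): calling set.seed() inside a function resets the session-wide RNG stream, so every random draw made after that call repeats. The function names below are hypothetical.

```r
# Hypothetical stand-in for a component that calls set.seed() internally.
seeded_component <- function() {
  set.seed(1)  # resets the global RNG state, not a local one
  invisible(NULL)
}

# Hypothetical stand-in for one replicate: run the component, then draw data.
one_replicate <- function() {
  seeded_component()
  rnorm(3)
}

rep1 <- one_replicate()
rep2 <- one_replicate()

# Both "replicates" draw the same numbers because the stream was reset.
identical(rep1, rep2)  # TRUE
```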

For now, avoid calling set.seed() within DGPs/Methods/Evaluators/Visualizers.
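If a component really does need a fixed seed, one possible approach is to scope the seed locally and restore the caller's RNG state afterwards. The sketch below uses only base R. with_local_seed is a hypothetical helper, not part of simChef, and it has not been verified to avoid this bug inside run_experiment().

```r
# Hypothetical helper (not part of simChef): evaluate `expr` under a temporary
# seed, then put the caller's RNG state back when the function exits.
with_local_seed <- function(seed, expr) {
  had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (had_seed) {
    old_seed <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
  }
  on.exit({
    if (had_seed) {
      # restore the RNG state that was active before this call
      assign(".Random.seed", old_seed, envir = globalenv())
    } else if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      # no RNG state existed before, so remove the one created here
      rm(".Random.seed", envir = globalenv())
    }
  }, add = TRUE)
  set.seed(seed)
  expr  # lazily evaluated here, i.e. after set.seed()
}

# Usage: the seeded draw is reproducible, and the outer stream is untouched.
set.seed(42)
expected_next <- runif(1)

set.seed(42)
seeded_draw <- with_local_seed(1, rnorm(3))
actual_next <- runif(1)

identical(expected_next, actual_next)  # TRUE
```

The withr package's with_seed() offers similar behaviour if an extra dependency is acceptable.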

Reproducible Example:

library(simChef)
library(magrittr)

rm(list = ls())
N_REPS <- 2
set.seed(1234)

#### DGPs ####

dgp_fun <- function(n, p) {
  X <- matrix(rnorm(n * p), nrow = n, ncol = p)
  y <- rnorm(n)
  return(list(X = X, y = y))
}
dgp <- create_dgp(dgp_fun, .name = "DGP", n = 300, p = 3)

#### Methods ####

# fits a linear model without touching the RNG
noseed_fun <- function(X, y) {
  lm_df <- cbind(data.frame(X), .y = y)
  lm_fit <- lm(.y ~ ., data = lm_df)
  return(coef(lm_fit))
}
noseed_method <- create_method(noseed_fun, .name = "No Seed")

# same as noseed_fun, except that it calls set.seed() first
seed_fun <- function(X, y) {
  set.seed(1)
  lm_df <- cbind(data.frame(X), .y = y)
  lm_fit <- lm(.y ~ ., data = lm_df)
  return(coef(lm_fit))
}
seed_method <- create_method(seed_fun, .name = "Seed")

# this works: no component calls set.seed()
experiment <- create_experiment() %>%
  add_dgp(dgp) %>%
  add_method(noseed_method) %>%
  add_vary_across(
    .dgp = "DGP", n = c(100, 200, 300)
  )

out <- run_experiment(experiment, n_reps = N_REPS)
out$fit_results %>%
  dplyr::arrange(n)

# bug: after the first value of n, the results are the same across all replicates
experiment <- create_experiment() %>%
  add_dgp(dgp) %>%
  add_method(noseed_method) %>%
  add_method(seed_method) %>%
  add_vary_across(
    .dgp = "DGP", n = c(100, 200, 300)
  )

out <- run_experiment(experiment, n_reps = N_REPS)
out$fit_results %>%
  dplyr::arrange(n)