Parallelization of emmeans() for gls models is not supported
FehrAaron opened this issue · 3 comments
Describe the bug
Parallelization of the emmeans function is not possible, if the input model is a gls model made with the nmle package. The code below works when run not in parallel (%do% instead of %dopar%) and can be parallelized if a normal linear model (lm(value ~ timepoint * factor, data = lm_in)) is used instead.
To reproduce
set.seed(123)
#Generate mock data
df = rbind(data.frame(
factor = factor(rep(1:2,10), labels = c("F1","F2")),
time = factor(rep(1:5, each = 4)),
value = rnorm(20, 100, 10),
Sample = 1),
data.frame(
factor = factor(rep(1:2,10), labels = c("F1","F2")),
time = factor(rep(1:5, each = 4)),
value = rnorm(20, 150, 11),
Sample = 2))
#Setup cores
num_cores = 2
cl = makeCluster(num_cores)
registerDoParallel(cl)
#Calculate estimated marginal means
result = foreach(i = c(1,2), .combine = rbind, .packages = c("dplyr","emmeans","nlme")) %dopar% {
lm_in = df %>%
dplyr::filter(Sample == i)
model <- gls(value ~ time * factor, data = lm_in, weights = varIdent(form = ~ 1 | factor))
est_marginal_means = emmeans(model, specs = ~ time*factor)
as.data.frame(est_marginal_means)
}
stopCluster(cl)
Expected behavior
The result dataframe should be formed containting the estimated marginal means for all samples.
Instead, an error occurs:
Error in { :
task 1 failed - "Perhaps a 'data' or 'params' argument is needed"
3. stop(simpleError(msg, call = expr))
2. e$fun(obj, substitute(ex), parent.frame(), e$data)
- foreach(i = c(1, 2), .combine = rbind, .packages = c("dplyr",
"emmeans", "nlme")) %dopar% {
lm_in = df %>% dplyr::filter(Sample == i)
model <- gls(value ~ time * factor, data = lm_in, weights = varIdent(form = ~1 | ...
Additional context
Vectorization of the code using lapply also did not work and lead to the same error.
Ok, I guess I found the solution, by just adding the data to emmeans it works. But I have no idea why it is required only for the gls model and only when running in parallel. The change in the code above would be:
est_marginal_means = emmeans(model, specs = ~ time*factor)
Hmmm. I'm glad you found a workaround. The basic issue is that emmeans()
(actually ref_grid()
) requires two steps -- one to reconstruct the dataset, and the other to put together the reference grid. The data-recovery step is defined by a recover_data()
method which is different for each model class. That method for gls
is a bit dicey because of complications with weight functions and such (see the code for emmeans:::recover_data.gls
.) I think that is why it hits a snag.
Thanks, I will have a look!