Adding the following function for pooling ready emmeans objects to the emmeans package
Generalized opened this issue · 4 comments
With regard to my other issue, where I described the case with multiply imputed data, I took the code from your package and changed it a little to allow for pooling emmeans rather than analysis objects.
This is especially helpful when we need to do a pooled analysis using some special features of the used modelling package, not handled out of the box by emmeans.
The function - adaptation of your emmeans:::emm_basis.mira
code:
pool_emmeans <- function(emmeans_list) {
bas = emmeans_list[[1]]
k = length(emmeans_list)
V = 1/k * bas@V
allb = cbind(bas@bhat, matrix(0, nrow = length(bas@bhat), ncol = k - 1))
for (i in 1 + seq_len(k - 1)) {
basi = emmeans_list[[i]]
V = V + 1/k * basi@V
allb[, i] = basi@bhat
}
bas@bhat = apply(allb, 1, mean)
notna = which(!is.na(bas@bhat))
bas@dfargs$m = k
bas@dfargs$df1 = bas@dffun
bas@dfargs$B = cov(t(allb[notna, , drop = FALSE]))
bas@V = bas@dfargs$T = V + (k + 1)/k * bas@dfargs$B
bas@dffun = function(a, dfargs) {
dfcom = dfargs$df1(a, dfargs)
with(dfargs, {
b = sum(a * (B %*% a))
t = sum(a * (T %*% a))
lambda = (1 + 1/m) * b/t
dfold = (m - 1)/lambda^2
dfobs = (dfcom + 1)/(dfcom + 3) * dfcom * (1 - lambda)
ifelse(is.infinite(dfcom), dfold, dfold * dfobs/(dfold + dfobs))
})
}
return(bas)
}
An exemplary use:
> d <- structure(list(ID = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L,
3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 8L, 8L,
9L, 9L, 9L, 10L, 10L, 10L, 11L, 11L, 11L, 12L, 12L, 12L, 13L,
13L, 13L, 14L, 14L, 14L), levels = c("1", "2", "3", "4", "5",
"6", "7", "8", "9", "10", "11", "12", "13", "14"), class = "factor"),
Arm = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L), levels = c("A", "B"), class = "factor"), Visit = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), levels = c("V1",
"V2", "V3"), class = c("ordered", "factor")), Visit_non_ord = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), levels = c("V1",
"V2", "V3"), class = "factor"), Painscore = c(1L, 2L, 4L,
5L, 6L, NA, 4L, 2L, 3L, 4L, 3L, 4L, 5L, NA, 5L, 6L, 7L, 6L,
4L, 3L, 5L, 1L, NA, 4L, 5L, 4L, 5L, 4L, 6L, NA, 0L, 4L, 3L,
4L, NA, 4L, 3L, 4L, NA, 6L, 7L, 8L)), row.names = c(NA, -42L
), class = "data.frame")
> library(mice)
> library(glmtoolbox)
> imp <- mice(d, m=5)
> imp$predictorMatrix[1:4, 1:5] <- 0
> imp$predictorMatrix[5, 1] <- 0
> imp <- mice(d, m=5, predictorMatrix = imp$predictorMatrix)
> imp_list <- complete(imp, "all")
> trend_emms <- lapply(imp_list,
function(dat) {
m <- glmgee(Painscore ~ Visit * Arm,
family = gaussian(link = "identity"),
id = ID,
data=dat,
corstr = "unstructured")
emmeans(m,
specs = ~ Visit | Arm,
adjust="none",
vcov = vcov(m, type = "bias-corrected")) # this is not accessible from within emmeans
})
> pooled_ems <- pool_emmeans(trend_emms)
> contrast(pooled_ems, "poly", max.degree=2, by = "Arm")
Arm = A:
contrast estimate SE df t.ratio p.value
linear 0.40 0.753 16.10 0.531 0.6026
quadratic 1.03 1.168 17.48 0.881 0.3903
Arm = B:
contrast estimate SE df t.ratio p.value
linear 1.40 0.752 13.01 1.861 0.0855
quadratic -1.74 1.864 7.54 -0.935 0.3787
Would you consider adding this little "helper" to your package?
How about if you post this on Cross Validated. and I will consider including a link to that posting in the "models" vignette.
@rvlenth Dear Professor,
Would this format be appropriate to be linked in the vignette?
It will be easier to update if something changes in future.
https://github.com/adrianolszewski/Useful-R-codes/blob/main/Pooling%20emmeans%20objects.md
Thanks. I am adding this link to the models
vignette. In doing so, I found I had done something like this before related to issue #446. I think these are not all that related, but I could be wrong, and in any case you may be interested.
Thank you so much!
And also I updated the example:
- simplified the dataset,
- provided a more useful example with 2 approaches (the default one vs. lapply + pool_emmeans),
- polished the descriptions and explained things step-by-step to show better when this approach may be useful.