Adding the following function for pooling ready emmeans objects to the emmeans package

Question

Adding the following function for pooling ready emmeans objects to the emmeans package

Generalized opened this issue 5 months ago · 4 comments

With regard to my other issue, where I described the case with multiply imputed data, I took the code from your package and changed it a little to allow for pooling emmeans rather than analysis objects.

This is especially helpful when we need to do a pooled analysis using some special features of the used modelling package, not handled out of the box by emmeans.

The function - adaptation of your emmeans:::emm_basis.mira code:

pool_emmeans <- function(emmeans_list) {
  bas = emmeans_list[[1]]
  k = length(emmeans_list)
  V = 1/k * bas@V
  allb = cbind(bas@bhat, matrix(0, nrow = length(bas@bhat), ncol = k - 1))
  
  for (i in 1 + seq_len(k - 1)) {
    basi = emmeans_list[[i]]
    V = V + 1/k * basi@V
    allb[, i] = basi@bhat
  }
  bas@bhat = apply(allb, 1, mean)
  notna = which(!is.na(bas@bhat))
  bas@dfargs$m = k
  bas@dfargs$df1 = bas@dffun
  bas@dfargs$B = cov(t(allb[notna, , drop = FALSE]))
  bas@V = bas@dfargs$T = V + (k + 1)/k * bas@dfargs$B
  bas@dffun = function(a, dfargs) {
    dfcom = dfargs$df1(a, dfargs)
    with(dfargs, {
      b = sum(a * (B %*% a))
      t = sum(a * (T %*% a))
      lambda = (1 + 1/m) * b/t
      dfold = (m - 1)/lambda^2
      dfobs = (dfcom + 1)/(dfcom + 3) * dfcom * (1 - lambda)
      ifelse(is.infinite(dfcom), dfold, dfold * dfobs/(dfold + dfobs))
    })
  }
   return(bas)
}

An exemplary use:

> d <- structure(list(ID = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 
3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 7L, 7L, 7L, 8L, 8L, 8L, 
9L, 9L, 9L, 10L, 10L, 10L, 11L, 11L, 11L, 12L, 12L, 12L, 13L, 
13L, 13L, 14L, 14L, 14L), levels = c("1", "2", "3", "4", "5", 
"6", "7", "8", "9", "10", "11", "12", "13", "14"), class = "factor"), 
Arm = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L), levels = c("A", "B"), class = "factor"), Visit = structure(c(1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), levels = c("V1", 
"V2", "V3"), class = c("ordered", "factor")), Visit_non_ord = structure(c(1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), levels = c("V1", 
"V2", "V3"), class = "factor"), Painscore = c(1L, 2L, 4L, 
5L, 6L, NA, 4L, 2L, 3L, 4L, 3L, 4L, 5L, NA, 5L, 6L, 7L, 6L, 
4L, 3L, 5L, 1L, NA, 4L, 5L, 4L, 5L, 4L, 6L, NA, 0L, 4L, 3L, 
4L, NA, 4L, 3L, 4L, NA, 6L, 7L, 8L)), row.names = c(NA, -42L
), class = "data.frame")

> library(mice)
> library(glmtoolbox)

> imp <- mice(d, m=5)
> imp$predictorMatrix[1:4, 1:5] <- 0
> imp$predictorMatrix[5, 1] <- 0
> imp <- mice(d, m=5, predictorMatrix = imp$predictorMatrix)
> imp_list <- complete(imp, "all")

> trend_emms <- lapply(imp_list, 
                    function(dat) {
                        m <- glmgee(Painscore ~ Visit * Arm, 
                                    family = gaussian(link = "identity"), 
                                    id = ID,  
                                    data=dat, 
                                    corstr = "unstructured")
                        
                        emmeans(m, 
                                specs = ~ Visit | Arm,
                                adjust="none", 
                                vcov =  vcov(m, type = "bias-corrected"))   # this is not accessible from within emmeans
                      })

> pooled_ems <- pool_emmeans(trend_emms)
> contrast(pooled_ems, "poly", max.degree=2, by = "Arm")

Arm = A:
 contrast  estimate    SE    df t.ratio p.value
 linear        0.40 0.753 16.10   0.531  0.6026
 quadratic     1.03 1.168 17.48   0.881  0.3903

Arm = B:
 contrast  estimate    SE    df t.ratio p.value
 linear        1.40 0.752 13.01   1.861  0.0855
 quadratic    -1.74 1.864  7.54  -0.935  0.3787

Would you consider adding this little "helper" to your package?

Answer 1 · 2024-07-11T03:43:58.000Z

How about if you post this on Cross Validated. and I will consider including a link to that posting in the "models" vignette.

Answer 2 · 2024-10-01T13:45:48.000Z

@rvlenth Dear Professor,
Would this format be appropriate to be linked in the vignette?
It will be easier to update if something changes in future.

https://github.com/adrianolszewski/Useful-R-codes/blob/main/Pooling%20emmeans%20objects.md

Answer 3 · 2024-10-02T20:54:23.000Z

Thanks. I am adding this link to the models vignette. In doing so, I found I had done something like this before related to issue #446. I think these are not all that related, but I could be wrong, and in any case you may be interested.

Answer 4 · 2024-10-03T00:00:31.000Z

Thank you so much!

And also I updated the example:

simplified the dataset,
provided a more useful example with 2 approaches (the default one vs. lapply + pool_emmeans),
polished the descriptions and explained things step-by-step to show better when this approach may be useful.