qdrg with MICE (as.mira) - is there any way to marry them?
Generalized opened this issue · 4 comments
I need to run ordinal logistic regression for repeated observations via GEE.
I can use either ordLORgee from the multgee package or repolr from the repolr package.
Unfortunately, neither supports emmeans. They can be integrated with emmeans via qdrg, though. This works very well.
Now, I have multiply imputed dataset. I know that emmeans works with via as.mira() - I use it with mmrm and gee - but this seems to work with models that emmeans can recognize.
But how to deal with qdrg this way? I'm afraid it's impossible, because how to pass all those coefficients and variances?
For example (from another question):
list_ords <- lapply(imp_data_long,
function(data)
ordLORgee(formula = ODIPain ~ Visit_nOrd * Arm + Age_centered + Age_centered : Visit_nOrd,
data = data, id = PatientId, repeated = as.numeric(Visit_nOrd), LORstr = "RC"))
but what to do with qdrg?
qdrg(formula = ODIPain ~ Visit_nOrd * Arm + Age_centered + Age_centered : Visit_nOrd,....?
I thought, that maybe I should create qdrg in each iterated analysis:
ord_gee_imp <- lapply(imp_data_long,
function(ddd) {
fitmod <- ordLORgee(formula = ODIPain ~ Visit_nOrd * Arm + Age_centered + Age_centered : Visit_nOrd,
data = ddd, id = PatientId, repeated = as.numeric(Visit_nOrd), LORstr = "uniform")
lord_grid <- qdrg(formula = ODIPain ~ Visit_nOrd * Arm + Age_centered + Age_centered : Visit_nOrd,
data=ddd,
coef = fitmod$coefficients,
vcov = fitmod$robust.variance,
df = Inf,
ordinal.dim =length(levels(data$ODIPain)),
link = fitmod$link)
})
and somehow pass it to emmeans, but it also doesn't work:
> emmeans(as.mira(ord_gee_imp), specs = ~Arm * Visit_nOrd)
Error in UseMethod("recover_data") :
no applicable method for 'recover_data' applied to an object of class "emmGrid"
Error in (function (object, at, cov.reduce = mean, cov.keep = get_emm_option("cov.keep"), :
Perhaps a 'data' or 'params' argument is needed
I desperately try to complete the analysis... I'm about to write the components necessary to implement the emmeans through the vignette, but I thought there is some workaround that can save me this time?
OK, I think I found a quick-and-(very)dirty workaround.
I looked at the source in this file:
Line 139 in 3957a59
namely to this function:
emm_basis.mira = function(object, trms, xlev, grid, ...) {
# In case our method did a "pass it on" with the data, we need to add that attribute
data = list(...)$misc$data
if(!is.null(data))
object$analyses = lapply(object$analyses, function(a) {attr(a, "data") = data; a})
bas = emm_basis(object$analyses[[1]], trms, xlev, grid, ...)
k = length(object$analyses)
# we just average the V and bhat elements...
V = 1/k * bas$V
allb = cbind(bas$bhat, matrix(0, nrow = length(bas$bhat), ncol = k - 1))
for (i in 1 + seq_len(k - 1)) {
basi = emm_basis(object$analyses[[i]], trms, xlev, grid, ...)
V = V + 1/k * basi$V
allb[, i] = basi$bhat
}
bas$bhat = apply(allb, 1, mean) # averaged coef
notna = which(!is.na(bas$bhat))
bas$V = V + (k + 1)/k * cov(t(allb[notna, , drop = FALSE])) # pooled via Rubin's rules
bas
}
This creates the emm.basis, which is kinda "input" for the emmeans. It prepares the necessary components, like pooled coefficients and (co)variances, passed later to emmeans.
And that's exactly what I need to pass to qdrg: coefficients, var_cov and df.
So I took your code to pool the estimates:
pool_estimates_for_qdrg <- function(k, coefficients, varcov) {
V <- 1/k * varcov[[1]]
allb <- cbind(coefficients[[1]], matrix(0, nrow = length(coefficients[[1]]), ncol = k - 1))
for (i in 1 + seq_len(k - 1)) {
V <- V + 1/k * varcov[[i]]
allb[, i] <- coefficients[[i]]
}
pooled_coef <- apply(allb, 1, mean)
notna <- which(!is.na(pooled_coef))
pooled_VC <- V + (k + 1)/k * cov(t(allb[notna, , drop = FALSE]))
return(list(pooled_coef = pooled_coef, pooled_VC = pooled_VC))
}
ord_gee_imp <- lapply(imp_data_long,
function(dat) {
ordLORgee(formula = ODIPain ~ Visit_nOrd * Arm + Age_centered + Age_centered : Visit_nOrd,
data = dat,
id = PatientId,
repeated = as.numeric(Visit_nOrd), LORstr = "uniform")
})
ord_gee_imp_grid <- with(pool_estimates_for_qdrg(k=20,
coefficients = lapply(ord_gee_imp, function(analysis) analysis$coefficients),
varcov = lapply(ord_gee_imp, function(analysis) analysis$robust.variance)),
qdrg(formula = ODIPain ~ Visit_nOrd * Arm + Age_centered + Age_centered : Visit_nOrd,
data=imp_data_long[[1]], # fake data in any case
coef = pooled_coef,
vcov = pooled_VC,
df = Inf,
ordinal.dim = 6,
link = "Cumulative logit"))
(tidy(ord_gee_imp_em <- emmeans(ord_gee_imp_grid, specs = ~Arm * Visit_nOrd)))
# A tibble: 6 × 7
Arm Visit_nOrd estimate std.error df statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 SoC Month 6 -1.05 0.262 Inf -4.01 0.0000618
2 HD Month 6 -1.11 0.545 Inf -2.04 0.0411
3 SoC Month 12 -1.40 0.462 Inf -3.03 0.00248
4 HD Month 12 -0.954 0.553 Inf -1.73 0.0843
5 SoC Month 20 -1.54 0.473 Inf -3.25 0.00114
6 HD Month 20 -0.707 0.550 Inf -1.29 0.198
update(contrast(ord_gee_imp_em ,
list("Month 6 : SoC vs. HD" = c(-1, 1, 0, 0, 0, 0),
"Month 12 : SoC vs. HD" = c( 0, 0,-1, 1, 0, 0),
"Month 20 : SoC vs. HD" = c( 0, 0, 0, 0,-1, 1))),
adjust="none", level = 0.95, infer = c(TRUE, TRUE))
contrast estimate SE df asymp.LCL asymp.UCL z.ratio p.value
Month 6 : SoC vs. HD -0.0642 0.332 Inf -0.715 0.587 -0.193 0.8468
Month 12 : SoC vs. HD 0.4436 0.376 Inf -0.293 1.180 1.181 0.2376
Month 20 : SoC vs. HD 0.8301 0.355 Inf 0.133 1.527 2.335 0.0195
Note: contrasts are still on the Cumulative logit scale
Confidence level used: 0.95
I plotted values of selected statistics for each imputed dataset and the pooled ones. The consistency is perfect, so it worked well.
So now I have a universal way of dealing with unknown models in the framework of multiple imputation. At least as long, as all necessary components will be provided...
Would you consider adding kind of a "helper function" that facilitates these steps in case someone is using the qdrg()? It could help a lot of people! (I guess...)
Thanks. I don't want to export that function as it has such a narrow purpose. However, I am providing a link to this issue in Section I of the models
vignette, so that people with this situation can find it.
qdrg()
is designed as something for pretty general use, I think an option like that is too specific to a very narrow class of models, so I'm not interested in adding that option at this time. And it would require essentially repeating the same code, because it can't just call emm_basis.mira
without some fancy workarounds.
By the way, link = "Cumulative logit"
is not supported. You should specify link = "logit"
.
Thank you. Yes, you are right. Anyway, it's great the qdrg() facilitates "bridging" emmeans to models in so many situations. It was a great idea to make it.