Issues with D1 function
lumiereljy opened this issue · 14 comments
I am performing stepwise model selection on multiply imputed datasets. Currently, I have an issue with backward elimination on the supermodel where I am using D1()
. I encountered the same problem as the following post where I always got the error when I tried to use with()
. #283
I was able to get pooled results with some workaround that evades with()
. However, D1()
requires an input of mira
, which is produced by with()
. The pooled result is an object of mipo
. Is there a workaround so that D1()
can work or is there another way that I can compare the nested models? I am attaching the code I have
## backward elimination on supermodel
ordinal.with <- list()
for (i in 1:5){
mod <- list(polr(as.factor(sbq_1) ~ age + gender + ethnicity, data=dff[[i]]))
ordinal.with <- c(ordinal.with, mod)
}
fit.with <- mice::pool(ordinal.with)
ordinal.without <- list()
for (i in 1:5){
mod <- list(polr(as.factor(sbq_1) ~ age + gender, data=dff[[i]]))
ordinal.without <- c(ordinal.without, mod)
}
fit.without <- mice::pool(ordinal.without)
I also tried to convert my data to mira
using the following code, but I got the error. Error in is.factor(x) : object 'sbq_1' not found
fit.with <- with(imp, polr(as.factor(sbq_1) ~ age + gender + ethnicity))
fit.without <- with(imp, polr(as.factor(sbq_1) ~ age + gender))
D1(fit.with, fit.without)
Ang suggestions on how to fix this?
If you store your m
analysis results in a list, then as.mira()
should be able to create a mira
object from that.
If you store your
m
analysis results in a list, thenas.mira()
should be able to create amira
object from that.
@stefvanbuuren thank you for the answer. Now I was able to create mira
object. However, I kept getting the following error Error in .extractParameters(model, diagonal = FALSE) : all(p == dim(Uhat[[1]])) is not TRUE
.
First, I stored my m
analysis results in ordinal.with
and ordinal.without
, then I created mira
objects fit.with
and fit.without
using ordinal.with
and ordinal.without
. Then, I applied D1
on fit.with
and fit.without
. pool()
still works on created mira
objects but not D1()
. Am I doing anything wrong here?
## backward elimination on supermodel
ordinal.with <- list()
for (i in 1:5){
mod <- list(polr(as.factor(sbq_1) ~ age + gender + ethnicity, data=dff[[i]]))
ordinal.with <- c(ordinal.with, mod)
}
ordinal.without <- list()
for (i in 1:5){
mod <- list(polr(as.factor(sbq_1) ~ age + gender, data=dff[[i]]))
ordinal.without <- c(ordinal.without, mod)
}
fit.with <- as.mira(ordinal.with)
fit.without <- as.mira(ordinal.without)
D1(fit.with, fit.without)
Is your code correct? I would expect something like ordinal.with[[i]] <- polr(..., data = complete(imp, i))
@stefvanbuuren dff
is a list of dataframe, and ordinal.with
is a list of m
ordinal regression models using m
imputed datasets. I believe it is correct, because I tested pool(fit.with)
and it did work. I am still struggling to understand the error Error in .extractParameters(model, diagonal = FALSE) : all(p == dim(Uhat[[1]])) is not TRUE.
and how I could fix this issue.
I don't know where .extractParameters()
comes from. What does traceback()
say?
@stefvanbuuren Here is what I have
Error in .extractParameters(model, diagonal = FALSE) :
all(p == dim(Uhat[[1]])) is not TRUE
> traceback()
5: stop(simpleError(msg, call = if (p <- sys.parent(1L)) sys.call(p)))
4: stopifnot(all(p == dim(Uhat[[1]])))
3: .extractParameters(model, diagonal = FALSE)
2: mitml::testModels(fit1, fit0, method = "D1", df.com = df.com)
1: D1(fit.with, fit.without)
Also I am attaching the output when I called pool()
, which did work.
summary(pool(fit.with))
term estimate std.error statistic df p.value
1 age -0.01519678 0.01880194 -0.8082557 79.87299 0.42134421
2 gender 1.12353238 0.44884432 2.5031672 79.81330 0.01434957
3 ethnicity -0.15929434 0.92224147 -0.1727252 30.23323 0.86401858
4 1|2 -0.21204825 2.11792558 -0.1001207 39.50767 0.92075483
5 2|3 0.80113803 2.12016941 0.3778651 40.11095 0.70752310
6 3|4 1.44495670 2.12634762 0.6795487 40.57657 0.50064877
So it looks like the error is indeed due to D1()
.
Perhaps @simongrund1 can shine a light?
Can you make a reprex
? It would be much easier to troubleshoot with a fully reproducible example.
@gerkovink Please see the following. I have provided a truncated dataset with m=2
and it gives the same error. Appreciate any suggestion.
##libraries
library(MASS)
library(mice)
#>
#> Attaching package: 'mice'
#> The following objects are masked from 'package:base':
#>
#> cbind, rbind
# import data
dff<-list()
dff[[1]] <- data.table::data.table(
sbq_1 = c(1L,2L,1L,2L,2L,
1L,2L,2L,4L,2L,1L,4L,2L,3L,1L,4L,2L,1L,
3L,1L,4L,1L,3L,4L,2L,1L,2L,2L,2L,2L,
1L,4L,3L,4L,1L,2L,3L,4L,1L,1L,3L,3L,1L,
1L,2L,1L,1L,1L,1L),
age = c(53L,32L,47L,58L,
34L,29L,57L,25L,38L,29L,51L,50L,37L,33L,
41L,36L,49L,38L,32L,51L,37L,36L,37L,39L,
55L,55L,45L,25L,24L,27L,26L,57L,47L,30L,
37L,39L,38L,34L,38L,56L,39L,43L,49L,39L,48L,
44L,63L,61L,46L),
gender = c(1L,1L,2L,1L,1L,
1L,1L,2L,2L,1L,1L,1L,1L,1L,1L,2L,1L,1L,
2L,1L,1L,1L,1L,1L,2L,1L,1L,2L,1L,1L,
1L,1L,2L,1L,2L,1L,1L,2L,1L,1L,1L,2L,1L,
1L,2L,1L,1L,2L,1L),
ethnicity = c(2L,2L,2L,2L,2L,
2L,2L,2L,2L,2L,2L,1L,1L,2L,2L,2L,2L,2L,
2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,
2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,
2L,2L,2L,2L,2L,2L)
)
dff[[2]] <- data.table::data.table(
sbq_1 = c(1L,2L,1L,2L,2L,
1L,2L,2L,4L,2L,1L,4L,2L,3L,1L,4L,2L,1L,
3L,1L,4L,1L,3L,4L,2L,1L,2L,2L,2L,2L,
1L,4L,3L,4L,1L,2L,3L,4L,1L,1L,3L,3L,1L,
1L,2L,1L,1L,1L,1L),
age = c(53L,32L,47L,58L,
34L,29L,57L,25L,38L,29L,51L,50L,37L,33L,
41L,36L,49L,38L,32L,51L,37L,36L,37L,39L,
55L,55L,45L,25L,24L,27L,26L,57L,47L,30L,
37L,39L,38L,34L,38L,56L,39L,43L,49L,39L,48L,
44L,63L,61L,46L),
gender = c(1L,1L,2L,1L,1L,
1L,1L,2L,2L,1L,1L,1L,1L,1L,1L,2L,1L,1L,
2L,1L,1L,1L,1L,1L,2L,1L,1L,2L,1L,1L,
1L,1L,2L,1L,2L,1L,1L,2L,1L,1L,1L,2L,1L,
1L,2L,1L,1L,2L,1L),
ethnicity = c(2L,2L,2L,2L,2L,
2L,2L,2L,2L,2L,2L,1L,2L,2L,2L,2L,2L,2L,
2L,2L,2L,2L,1L,2L,2L,2L,2L,2L,2L,2L,
2L,2L,2L,2L,2L,2L,2L,1L,2L,2L,2L,2L,2L,
2L,2L,2L,2L,2L,2L)
)
## backward elimination on supermodel
ordinal.with <- list()
for (i in 1:2){
mod <- list(polr(as.factor(sbq_1) ~ age + gender + ethnicity, data=dff[[i]]))
ordinal.with <- c(ordinal.with, mod)
}
ordinal.without <- list()
for (i in 1:2){
mod <- list(polr(as.factor(sbq_1) ~ age + gender, data=dff[[i]]))
ordinal.without <- c(ordinal.without, mod)
}
fit.with <- as.mira(ordinal.with)
fit.without <- as.mira(ordinal.without)
D1(fit.with, fit.without)
#>
#> Re-fitting to get Hessian
#>
#> Re-fitting to get Hessian
#>
#>
#> Re-fitting to get Hessian
#>
#>
#> Re-fitting to get Hessian
#> Error in .extractParameters(model, diagonal = FALSE): all(p == dim(Uhat[[1]])) is not TRUE
Created on 2021-10-04 by the reprex package (v2.0.1)
I sometimes run into a similar issue with MASS:polr()
. The issue stems from broom::tidy.polr
and other modeling/matching efforts that follow the same convention. Only the coefficients are put through as parameters in checks and gets and the intercepts are left alone. Your issue stems from exactly such an internal check in mitml::.extractParameters()
, where the obtained number of parameters p
(3) from the model is not identical to both dimensions of the varcov
matrix Uhat
that contains all (6) terms.
I believe it is the same behaviour that leads to the omission of the confidence intervals for the intercepts in broom:::tidy.polr()
.
> # more efficient mira
> fit.with <- dff %>%
+ map(~.x %>%
+ with(MASS::polr(as.factor(sbq_1) ~ age + gender + ethnicity, Hess = TRUE)))
>
> # same issue
> fit.with[[1]] %>% broom:::tidy.polr(conf.int = TRUE)
# A tibble: 6 × 7
term estimate std.error statistic conf.low conf.high coef.type
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 age -0.0431 0.0262 -1.65 -0.0964 0.00707 coefficient
2 gender 1.02 0.604 1.68 -0.162 2.22 coefficient
3 ethnicity -2.01 1.44 -1.39 -5.35 0.737 coefficient
4 1|2 -4.94 3.10 -1.59 NA NA scale
5 2|3 -3.50 3.07 -1.14 NA NA scale
6 3|4 -2.63 3.06 -0.860 NA NA scale
It's definitely not related to mice
.
@gerkovink Thanks for the explanation. Now it makes sense why this error occurs. I am wondering if there is any workaround so that I can fit an ordinal regression and then still be able to do a stepwise model selection on multiply imputed datasets?
Hi, and thanks for adding! I think @gerkovink is correct that this is due to the inconsistent return values of coef()
and vcov()
(from MASS::polr
). For mitml
, I just pushed a fix that extracts the coefficients from the summary()
instead, which seems to make this more consistent and should allow pooling parameters and model comparisons (with D1, D2, and D4). For mice
, this would probably need to be changed in broom
.
Thanks @simongrund1. mice
already pools the MASS::polr()
models correctly.
I believe this can be closed now.
Gerko, Simon, thanks for chiming in!