Issues with D1 function

Question

Issues with D1 function

lumiereljy opened this issue 3 years ago · 14 comments

I am performing stepwise model selection on multiply imputed datasets. Currently, I have an issue with backward elimination on the supermodel where I am using D1(). I encountered the same problem as the following post where I always got the error when I tried to use with(). #283

I was able to get pooled results with some workaround that evades with(). However, D1() requires an input of mira, which is produced by with(). The pooled result is an object of mipo. Is there a workaround so that D1() can work or is there another way that I can compare the nested models? I am attaching the code I have

## backward elimination on supermodel

ordinal.with <- list()
for (i in 1:5){
  mod <- list(polr(as.factor(sbq_1) ~ age + gender + ethnicity, data=dff[[i]]))
  ordinal.with <- c(ordinal.with, mod)
}

fit.with <- mice::pool(ordinal.with)

ordinal.without <- list()
for (i in 1:5){
  mod <- list(polr(as.factor(sbq_1) ~ age + gender, data=dff[[i]]))
  ordinal.without <- c(ordinal.without, mod)
}

fit.without <- mice::pool(ordinal.without)

I also tried to convert my data to mira using the following code, but I got the error. Error in is.factor(x) : object 'sbq_1' not found


fit.with <- with(imp,  polr(as.factor(sbq_1) ~  age + gender + ethnicity))
fit.without <- with(imp,  polr(as.factor(sbq_1) ~  age + gender))

D1(fit.with, fit.without)

Ang suggestions on how to fix this?

Answer 1 · 2021-09-30T09:06:00.000Z

If you store your m analysis results in a list, then as.mira() should be able to create a mira object from that.

Answer 2 · 2021-10-01T18:30:21.000Z

If you store your m analysis results in a list, then as.mira() should be able to create a mira object from that.

@stefvanbuuren thank you for the answer. Now I was able to create mira object. However, I kept getting the following error Error in .extractParameters(model, diagonal = FALSE) : all(p == dim(Uhat[[1]])) is not TRUE.

First, I stored my m analysis results in ordinal.with and ordinal.without, then I created mira objects fit.with and fit.without using ordinal.with and ordinal.without. Then, I applied D1 on fit.with and fit.without. pool() still works on created mira objects but not D1(). Am I doing anything wrong here?

## backward elimination on supermodel

ordinal.with <- list()
for (i in 1:5){
  mod <- list(polr(as.factor(sbq_1) ~ age + gender + ethnicity, data=dff[[i]]))
  ordinal.with <- c(ordinal.with, mod)
}

ordinal.without <- list()
for (i in 1:5){
  mod <- list(polr(as.factor(sbq_1) ~ age + gender, data=dff[[i]]))
  ordinal.without <- c(ordinal.without, mod)
}

fit.with <- as.mira(ordinal.with)
fit.without <- as.mira(ordinal.without)

D1(fit.with, fit.without)

Answer 3 · 2021-10-04T11:59:55.000Z

Is your code correct? I would expect something like ordinal.with[[i]] <- polr(..., data = complete(imp, i))

Answer 4 · 2021-10-04T18:55:34.000Z

@stefvanbuuren dff is a list of dataframe, and ordinal.with is a list of m ordinal regression models using m imputed datasets. I believe it is correct, because I tested pool(fit.with) and it did work. I am still struggling to understand the error Error in .extractParameters(model, diagonal = FALSE) : all(p == dim(Uhat[[1]])) is not TRUE. and how I could fix this issue.

Answer 5 · 2021-10-04T20:05:44.000Z

I don't know where .extractParameters() comes from. What does traceback() say?

Answer 6 · 2021-10-04T21:20:37.000Z

@stefvanbuuren Here is what I have

Error in .extractParameters(model, diagonal = FALSE) : 
  all(p == dim(Uhat[[1]])) is not TRUE
> traceback()
5: stop(simpleError(msg, call = if (p <- sys.parent(1L)) sys.call(p)))
4: stopifnot(all(p == dim(Uhat[[1]])))
3: .extractParameters(model, diagonal = FALSE)
2: mitml::testModels(fit1, fit0, method = "D1", df.com = df.com)
1: D1(fit.with, fit.without)

Also I am attaching the output when I called pool(), which did work.

summary(pool(fit.with))

       term    estimate  std.error  statistic       df    p.value
1       age -0.01519678 0.01880194 -0.8082557 79.87299 0.42134421
2    gender  1.12353238 0.44884432  2.5031672 79.81330 0.01434957
3 ethnicity -0.15929434 0.92224147 -0.1727252 30.23323 0.86401858
4       1|2 -0.21204825 2.11792558 -0.1001207 39.50767 0.92075483
5       2|3  0.80113803 2.12016941  0.3778651 40.11095 0.70752310
6       3|4  1.44495670 2.12634762  0.6795487 40.57657 0.50064877

So it looks like the error is indeed due to D1().

Answer 7 · 2021-10-04T21:27:47.000Z

Perhaps @simongrund1 can shine a light?

Answer 8 · 2021-10-05T00:08:12.000Z

Can you make a reprex? It would be much easier to troubleshoot with a fully reproducible example.

Answer 9 · 2021-10-05T02:26:03.000Z

@gerkovink Please see the following. I have provided a truncated dataset with m=2 and it gives the same error. Appreciate any suggestion.

##libraries
library(MASS)
library(mice)
#> 
#> Attaching package: 'mice'
#> The following objects are masked from 'package:base':
#> 
#>     cbind, rbind

# import data
dff<-list()
dff[[1]] <- data.table::data.table(
                   sbq_1 = c(1L,2L,1L,2L,2L,
                             1L,2L,2L,4L,2L,1L,4L,2L,3L,1L,4L,2L,1L,
                             3L,1L,4L,1L,3L,4L,2L,1L,2L,2L,2L,2L,
                             1L,4L,3L,4L,1L,2L,3L,4L,1L,1L,3L,3L,1L,
                             1L,2L,1L,1L,1L,1L),
                     age = c(53L,32L,47L,58L,
                             34L,29L,57L,25L,38L,29L,51L,50L,37L,33L,
                             41L,36L,49L,38L,32L,51L,37L,36L,37L,39L,
                             55L,55L,45L,25L,24L,27L,26L,57L,47L,30L,
                             37L,39L,38L,34L,38L,56L,39L,43L,49L,39L,48L,
                             44L,63L,61L,46L),
                  gender = c(1L,1L,2L,1L,1L,
                             1L,1L,2L,2L,1L,1L,1L,1L,1L,1L,2L,1L,1L,
                             2L,1L,1L,1L,1L,1L,2L,1L,1L,2L,1L,1L,
                             1L,1L,2L,1L,2L,1L,1L,2L,1L,1L,1L,2L,1L,
                             1L,2L,1L,1L,2L,1L),
               ethnicity = c(2L,2L,2L,2L,2L,
                             2L,2L,2L,2L,2L,2L,1L,1L,2L,2L,2L,2L,2L,
                             2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,
                             2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,2L,
                             2L,2L,2L,2L,2L,2L)
            )

dff[[2]] <- data.table::data.table(
                   sbq_1 = c(1L,2L,1L,2L,2L,
                             1L,2L,2L,4L,2L,1L,4L,2L,3L,1L,4L,2L,1L,
                             3L,1L,4L,1L,3L,4L,2L,1L,2L,2L,2L,2L,
                             1L,4L,3L,4L,1L,2L,3L,4L,1L,1L,3L,3L,1L,
                             1L,2L,1L,1L,1L,1L),
                     age = c(53L,32L,47L,58L,
                             34L,29L,57L,25L,38L,29L,51L,50L,37L,33L,
                             41L,36L,49L,38L,32L,51L,37L,36L,37L,39L,
                             55L,55L,45L,25L,24L,27L,26L,57L,47L,30L,
                             37L,39L,38L,34L,38L,56L,39L,43L,49L,39L,48L,
                             44L,63L,61L,46L),
                  gender = c(1L,1L,2L,1L,1L,
                             1L,1L,2L,2L,1L,1L,1L,1L,1L,1L,2L,1L,1L,
                             2L,1L,1L,1L,1L,1L,2L,1L,1L,2L,1L,1L,
                             1L,1L,2L,1L,2L,1L,1L,2L,1L,1L,1L,2L,1L,
                             1L,2L,1L,1L,2L,1L),
               ethnicity = c(2L,2L,2L,2L,2L,
                             2L,2L,2L,2L,2L,2L,1L,2L,2L,2L,2L,2L,2L,
                             2L,2L,2L,2L,1L,2L,2L,2L,2L,2L,2L,2L,
                             2L,2L,2L,2L,2L,2L,2L,1L,2L,2L,2L,2L,2L,
                             2L,2L,2L,2L,2L,2L)
            )

## backward elimination on supermodel

ordinal.with <- list()
for (i in 1:2){
  mod <- list(polr(as.factor(sbq_1) ~ age + gender + ethnicity, data=dff[[i]]))
  ordinal.with <- c(ordinal.with, mod)
}

ordinal.without <- list()
for (i in 1:2){
  mod <- list(polr(as.factor(sbq_1) ~ age + gender, data=dff[[i]]))
  ordinal.without <- c(ordinal.without, mod)
}

fit.with <- as.mira(ordinal.with)
fit.without <- as.mira(ordinal.without)

D1(fit.with, fit.without)
#> 
#> Re-fitting to get Hessian
#> 
#> Re-fitting to get Hessian
#> 
#> 
#> Re-fitting to get Hessian
#> 
#> 
#> Re-fitting to get Hessian
#> Error in .extractParameters(model, diagonal = FALSE): all(p == dim(Uhat[[1]])) is not TRUE

^{Created on 2021-10-04 by the reprex package (v2.0.1)}

Answer 10 · 2021-10-05T03:08:05.000Z

I sometimes run into a similar issue with MASS:polr(). The issue stems from broom::tidy.polr and other modeling/matching efforts that follow the same convention. Only the coefficients are put through as parameters in checks and gets and the intercepts are left alone. Your issue stems from exactly such an internal check in mitml::.extractParameters(), where the obtained number of parameters p (3) from the model is not identical to both dimensions of the varcov matrix Uhat that contains all (6) terms.

I believe it is the same behaviour that leads to the omission of the confidence intervals for the intercepts in broom:::tidy.polr().

> # more efficient mira
> fit.with <- dff %>% 
+   map(~.x %>% 
+         with(MASS::polr(as.factor(sbq_1) ~ age + gender + ethnicity, Hess = TRUE)))
> 
> # same issue
> fit.with[[1]] %>% broom:::tidy.polr(conf.int = TRUE)
# A tibble: 6 × 7
  term      estimate std.error statistic conf.low conf.high coef.type  
  <chr>        <dbl>     <dbl>     <dbl>    <dbl>     <dbl> <chr>      
1 age        -0.0431    0.0262    -1.65   -0.0964   0.00707 coefficient
2 gender      1.02      0.604      1.68   -0.162    2.22    coefficient
3 ethnicity  -2.01      1.44      -1.39   -5.35     0.737   coefficient
4 1|2        -4.94      3.10      -1.59   NA       NA       scale      
5 2|3        -3.50      3.07      -1.14   NA       NA       scale      
6 3|4        -2.63      3.06      -0.860  NA       NA       scale

It's definitely not related to mice.

Answer 11 · 2021-10-05T03:52:11.000Z

@gerkovink Thanks for the explanation. Now it makes sense why this error occurs. I am wondering if there is any workaround so that I can fit an ordinal regression and then still be able to do a stepwise model selection on multiply imputed datasets?

Answer 12 · 2021-10-05T11:37:03.000Z

Hi, and thanks for adding! I think @gerkovink is correct that this is due to the inconsistent return values of coef() and vcov() (from MASS::polr). For mitml, I just pushed a fix that extracts the coefficients from the summary() instead, which seems to make this more consistent and should allow pooling parameters and model comparisons (with D1, D2, and D4). For mice, this would probably need to be changed in broom.

Answer 13 · 2021-10-05T11:57:41.000Z

Thanks @simongrund1. mice already pools the MASS::polr() models correctly.

I believe this can be closed now.

Answer 14 · 2021-10-05T12:00:07.000Z

Gerko, Simon, thanks for chiming in!