yrosseel/lavaan

Extract df from under-identified model

isaactpetersen opened this issue · 2 comments

I'd like to extract the degrees of freedom from an under-identified model (that has more parameters than known values). That is, I'd like to extract the df even when the df is negative. As a teaching exercise, I'm fitting a formative model and would like to demonstrate that additional constraints are required to make the model identifiable. It used to be possible to extract the df from an under-identified model using the fitMeasures() function. However, that function now throws an error when the model is under-identified and doesn't provide the df:

fitMeasures(formativeModelFit, "df")
#> numeric(0)
fitMeasures(formativeModelFit)
#> Warning in lav_partable_check(lavpartable, categorical = lavoptions$.categorical, : lavaan WARNING: automatically added intercepts are set to zero:
#>     [formative]
#> Warning in lav_fit_cfi_lavobject(lavobject = object, fit.measures =
#> fit.measures, : lavaan WARNING: computation of robust CFI failed.
#> Error in if (!is.finite(X2) || !is.finite(df) || !is.finite(X2.null) || : missing value where TRUE/FALSE needed

Interestingly, the model output shows the df (see below), but I'd like to be able to extract it so I can assign it to an object:

#> lavaan 0.6.16 ended normally after 1 iteration
#> 
#>   Estimator                                         ML
#>   Optimization method                           NLMINB
#>   Number of model parameters                         5
#> 
#>                                                   Used       Total
#>   Number of observations                            50          75
#>   Number of missing patterns                         1            
#> 
#> Model Test User Model:
#>                                                       
#>   Test statistic                                    NA
#>   Degrees of freedom                                -5
#>   P-value (Unknown)                                 NA

I posted on the google group, but nobody responded, which leads me to think that this is not currently possible (though it used to be in prior versions of lavaan), though I could be wrong.

Here is a reproducible example:

library("lavaan")
#> This is lavaan 0.6-16
#> lavaan is FREE software! Please report any bugs.

complement <- function(y, rho, x) {
  x <- rnorm(length(y))
  y.perp <- residuals(lm(x ~ y))
  rho * sd(y.perp) * y + y.perp * sd(y) * sqrt(1 - rho^2)
}

set.seed(1)

v1 <- complement(PoliticalDemocracy$y1, .4)
v2 <- complement(PoliticalDemocracy$y1, .4)
v3 <- complement(PoliticalDemocracy$y1, .4)
v4 <- complement(PoliticalDemocracy$y1, .4)

PoliticalDemocracy$v1 <- v1
PoliticalDemocracy$v2 <- v2
PoliticalDemocracy$v3 <- v3
PoliticalDemocracy$v4 <- v4

PoliticalDemocracy <- 
  as.data.frame(lapply(
    PoliticalDemocracy,
    function(cc) cc[ sample(
      c(TRUE, NA),
      prob = c(0.9, 0.1),
      size = length(cc),
      replace = TRUE)]))

formativeModel_syntax <- '
 #Formative model factor loadings
 formative <~ v1 + v2 + v3 + v4
 
 formative ~~ formative
'

formativeModelFit <- sem(
  formativeModel_syntax,
  data = PoliticalDemocracy,
  missing = "ML",
  estimator = "MLR")
#> Warning in lav_data_full(data = data, group = group, cluster = cluster, :
#> lavaan WARNING: all observed variables are exogenous; model may not be
#> identified
#> Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING: 25 cases were deleted due to missing values in 
#>        exogenous variable(s), while fixed.x = TRUE.
#> Warning in lav_partable_check(lavpartable, categorical = lavoptions$.categorical, : lavaan WARNING: automatically added intercepts are set to zero:
#>     [formative]
#> Warning in lav_model_vcov(lavmodel = lavmodel, lavsamplestats = lavsamplestats, : lavaan WARNING:
#>     Could not compute standard errors! The information matrix could
#>     not be inverted. This may be a symptom that the model is not
#>     identified.
#> Warning in max(abs(Hessian - t(Hessian))): no non-missing arguments to max;
#> returning -Inf
#> Warning in max(diag(Hessian)): no non-missing arguments to max; returning -Inf
#> Warning in lav_test_yuan_bentler(lavobject = NULL, lavsamplestats = lavsamplestats, : lavaan WARNING: could not invert information [matrix needed for robust test statistic

formativeModelFit
#> lavaan 0.6.16 ended normally after 1 iteration
#> 
#>   Estimator                                         ML
#>   Optimization method                           NLMINB
#>   Number of model parameters                         5
#> 
#>                                                   Used       Total
#>   Number of observations                            50          75
#>   Number of missing patterns                         1            
#> 
#> Model Test User Model:
#>                                                       
#>   Test statistic                                    NA
#>   Degrees of freedom                                -5
#>   P-value (Unknown)                                 NA

fitMeasures(formativeModelFit, "df")
#> numeric(0)
fitMeasures(formativeModelFit)
#> Warning in lav_partable_check(lavpartable, categorical = lavoptions$.categorical, : lavaan WARNING: automatically added intercepts are set to zero:
#>     [formative]
#> Warning in lav_fit_cfi_lavobject(lavobject = object, fit.measures =
#> fit.measures, : lavaan WARNING: computation of robust CFI failed.
#> Error in if (!is.finite(X2) || !is.finite(df) || !is.finite(X2.null) || : missing value where TRUE/FALSE needed

sessionInfo()
#> R version 4.3.1 (2023-06-16 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.utf8 
#> [2] LC_CTYPE=English_United States.utf8   
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.utf8    
#> 
#> time zone: America/Chicago
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] lavaan_0.6-16
#> 
#> loaded via a namespace (and not attached):
#>  [1] vctrs_0.6.4       cli_3.6.1         knitr_1.45        rlang_1.1.2      
#>  [5] xfun_0.41         purrr_1.0.2       styler_1.10.2     glue_1.6.2       
#>  [9] pbivnorm_0.6.0    htmltools_0.5.7   stats4_4.3.1      rmarkdown_2.25   
#> [13] quadprog_1.5-8    R.cache_0.16.0    evaluate_0.23     fastmap_1.1.1    
#> [17] yaml_2.3.7        lifecycle_1.0.4   compiler_4.3.1    fs_1.6.3         
#> [21] rstudioapi_0.15.0 R.oo_1.25.0       R.utils_2.12.3    digest_0.6.33    
#> [25] reprex_2.0.2      parallel_4.3.1    mnormt_2.1.1      magrittr_2.0.3   
#> [29] R.methodsS3_1.8.2 tools_4.3.1       withr_2.5.2

Created on 2023-11-18 with reprex v2.0.2

I think lav_partable_df() can give you what you're looking for. Bearing in mind that sem() sets fixed.x=TRUE by default, you need to pass that argument to get the matching parameter tables. Also, your example sets estimator = "MLR" and missing = "ML", both of which implicitly set meanstructure = TRUE.

parTable(formativeModelFit)
(PT <- lavaanify(formativeModel_syntax, fixed.x=TRUE, meanstructure=TRUE))

## wrong degrees of freedom using defaults
lav_partable_df(lavaanify(formativeModel_syntax)) # -1
## the right ones (matches summary() output)
lav_partable_df(PT)

That perfectly answered my question--thank you!