ecmerkle/blavaan

Model fit and ppmc() with ordinal data

littlehifive opened this issue · 2 comments

Hi Ed,

@bgoodri and I are working on a project together where we wish to build a Bayesian CFA model for a few measures with items on an ordinal scale (e.g., 1-4 Likert scale). We have a few questions:

  1. I am a bit puzzled by the fact that the model fit of the Bayesian CFA (given by blavFitIndices()) is very different from that of the Frequentist CFA (given by lavaan::cfa()). I understand that the Frequentist model may be overfitting the data by giving CFI = 0.99 and RMSEA = 0.04. But the Bayesian model fit is very different. I checked the predicted ordinal values from the model and they seem to correspond well with the raw distribution of the items. Could it be because these Bayesian fit indices do not work too well with ordinal data?
Posterior mean (EAP) of devm-based fit indices:

      BRMSEA    BGammaHat adjBGammaHat          BMc 
       0.637        0.246       -0.497        0.000 
  2. When I tried to use ppmc() on my fitted Bayesian CFA model, I got this error. Is this because ppmc() does not work too well with ordinal data? Would adding mcmcextra = list(data = list(emiter = 50)) in bcfa() help? I am using blavaan_0.3-18.853.
Error in "mcmcdata" %in% names(lavobject@external) : 
  trying to get slot "external" from an object of a basic class ("NULL") with no slots
  3. The MargLogLik is still NA after adding mcmcextra = list(data = list(llnsamp = 200)). Is there another way to compute the likelihood?

Thanks!

Hi Zezhen, thanks for the questions. I think that these issues are mostly due to the fact that ordinal models are very new to blavaan. I will have to look at question 1 some more. It might well be that these metrics do not work well for ordinal data (they were developed for continuous data), but I cannot rule out a bug right now. I will let you know if I find something.

For 2, this is a bug and should be fixed soon.

For 3, the "MargLogLik" is the likelihood used for Bayes factors. It is marginal over all parameters, as opposed to marginal over only the latent variables; the llnsamp setting is only used to compute a likelihood that is marginal over latent variables. In the continuous case, blavaan uses a Laplace approximation here that does not immediately work for ordinal. The ordinal models return NA for now because I have not gotten around to implementing it.

And I would recommend upgrading blavaan to 0.4-1, or to the github version.

Just some follow-ups, involving commit b0c191e from earlier today:

  • I have turned off blavFitIndices() for ordinal models because I think some underlying code is assuming continuous data, and it will take some extra research to ensure this works on the ordinal side (the continuous metrics are based on recent publications, and I don't think the analogous publications exist for ordinal). This was an oversight on my part: in 0.4-1, I was focused on getting the Stan model working and did not test the fit indices as much as I should have.
  • ppmc() should now be working better with the ordinal models. But there is still some unresolved ambiguity. For example, its default argument is fit.measures = c("srmr", "chisq"). This creates a lavaan object using each posterior sample, then computes the usual frequentist metrics of srmr and chisq. But chisq is ambiguous for ordinal models. Under the default ppmc() arguments, it computes the DWLS chi-square statistic using lavaan. If you change the argument to just fit.measures = "chisq" (leaving out srmr and any others), then it approximates the multinomial likelihood underlying the Bayesian model and computes the usual likelihood ratio statistic. This is clearly not the best situation, and I need to find an intuitive way to distinguish between the two.
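
To make the two chisq behaviours described above concrete, here is a minimal sketch of both ppmc() calls on a small ordinal model. Everything about the data is an assumption for illustration: the simulated items y1 to y4, the one-factor model, the thresholds, and the short chains are not from the original report. It also assumes a blavaan version with ordinal support (0.4-1 or later, Stan target), and whether ppmc() runs cleanly on ordinal fits may depend on the installed version.

```r
library(blavaan)

set.seed(123)

# Hypothetical data: four 4-category items driven by one latent factor.
n <- 300
eta <- rnorm(n)
ystar <- sapply(1:4, function(j) 0.7 * eta + rnorm(n, sd = sqrt(1 - 0.7^2)))

# Cut the latent responses into ordered categories 1..4.
dat <- as.data.frame(apply(ystar, 2, function(x) {
  cut(x, breaks = c(-Inf, -1, 0, 1, Inf), labels = FALSE)
}))
names(dat) <- paste0("y", 1:4)

model <- "f =~ y1 + y2 + y3 + y4"

# Short chains keep the sketch quick; they are far too short for real use.
fit <- bcfa(model, data = dat, ordered = names(dat), std.lv = TRUE,
            n.chains = 2, burnin = 200, sample = 200)

# Default fit.measures: per the discussion above, chisq here is the
# DWLS chi-square computed by lavaan for each posterior draw.
ppmc_dwls <- ppmc(fit, fit.measures = c("srmr", "chisq"))

# chisq alone: per the discussion above, this is a likelihood ratio
# statistic based on an approximation to the multinomial likelihood.
ppmc_lrt <- ppmc(fit, fit.measures = "chisq")

summary(ppmc_dwls)
summary(ppmc_lrt)
```

The two summaries report a chisq entry under the same name, so it helps to store the results in clearly named objects, as above, to keep track of which statistic each one holds.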