ecmerkle/blavaan

Add keyword arguments to blavFitIndices

lstmemery opened this issue ยท 9 comments

Hi there,

Thank you for this incredible software. When I run blavFitIndices on one of my models I get the warning: Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.

I was able to see that there is a single k value that is 0.7, which I believe is a good candidate for moment matching. Instead of solving this issue post hoc, I would like to add a ... argument to blavFitIndices. The ... argument would pass through to loo, waic, etc. If this works for you, I could start on a PR right away.

Thanks again,

Matt

Thanks, this sounds like a good addition. I believe that blavFitIndices() is calling fitMeasures() to obtain loo and waic, so it will probably take modification of at least two functions to do this. Any initial stab you take would be welcome, and I can clean up any parts of it that are confusing.

PS I had to resist saying "This is a premium addition that can be purchased for 500 robux" :)

Hey @ecmerkle,

It looks like the situation is a bit more difficult than I anticipated. Moment matching is only available when applying loo to a stanfit object. I can extract the stanfit from blavaan using fitstanfs_mcobj <- blavInspect(fitstanfs, "mcobj").

loo(fitstan_mcobj) works as expected. However, loo(fitstanfs_mcobj, moment_match = TRUE) gives the same error, regardless of the model used:

Error in object@.MISC$stan_fit_instance$unconstrain_pars(pars) : 
  Exception: Variable Lambda_y_free missing  (in 'model_stanmarg' at line 576)

For reference, this is what lines around 576 look like:

parameters {
  // free elements (possibly with inequality constraints) for coefficient matrices
  vector[len_free[1]] Lambda_y_free;
  //vector[len_free[2]] Lambda_x_free;
  vector[len_free[3]] Gamma_free;
  ...

I think loo is calling the loo_moment_match function, which in turn is calling the unconstrain_pars method attached to the stanfit. I don't know much about stan, but it looks like Lambda_y_free is already unconstrained, but I haven't come up with a way of fixing this. Do you have any ideas?

Thanks for looking at it, and here is what I think is happening.

In the big Stan file, I post-process a number of parameters in generated quantities, and then I only save the post-processed versions (using the pars argument of stan()). I do this to save memory (to avoid saving redundant parameters). But it is also possible to save extra parameters using blavaan's mcmcextra argument. So, to check that this is the problem, I think you could do this:

  1. Compare the parameters in the Stan file to the parameters that blavaan saves by default.
  2. For things that appear in the parameters block of Stan but not in the blavaan saved parameters, save them using the mcmcextra argument. It would look roughly like
fit <- bsem(model, data, other_arguments, mcmcextra = list(monitor = c("Lambda_y_free", "more_pars"))

Then, hopefully the moment matching would work on fit. If so, I think we could still add a ..., but maybe the user has to know to also use mcmcextra. (I am not sure I want to save redundant parameters by default, though maybe I could be convinced if there were multiple functions that need everything from the parameters block.)

That works! Unfortunately, moment matching is not going to save my model, but I can add changes.

It looks like loo_moment_match is only implemented for rstan models and will require all the free variables in the stan model. Because this is a pretty narrow use case, I would add an argument to the blavaan constructor called something like moment_match_k_threshold that defaults to NULL. If a value > 0.5 is input, the blavaan object will monitor all the free variables and blavFitIndices will default to loo with moment matching instead of loo. Otherwise, it will default to the typical loo.

Does that work for you?

I'd glad it worked, even if it didn't help you! About the extra argument: I am trying to keep extra settings like this in mcmcextra. For example, there is a llnsamp argument to control the number of samples to approximate the likelihood of ordinal models; see here.

Something could be done similarly here, where moment_match_k_threshold is supplied to mcmcextra$data, then blavaan knows what to do with it. Maybe this would be easiest for me to do, because I recently did that llnsamp thing.

On the other hand, maybe the current functionality is enough since we can supply extra parameters to mcmcextra. (particularly if I add an example of how to do it, say here). Let me know what you think.

I think it would make sense to add moment_match_k_threshold to mcmcdata. My reasoning is that some free variables will change depending on the modelling decisions, such as theta for ordinal models. You would also have to change bitFitIndices slightly to check for a flag. Currently, loo takes in llnsamp directly, but the model is needed when moment matching is used,

I won't stop you if you want to make the changes, but I should be able to get up PR this weekend. Up to you!

I won't get to it by the weekend, so will build off of anything you send!

Hey Ed,

I'm curious about this line. Why is the stan target not allowed here?

if(length(mcmcextra$monitor) > 0 & target != "stan"){

I think it is because of the way that add_monitors() works inside that if statement. I think add_monitors() was written to expect JAGS results, because blavaan was written for JAGS first. The extra monitors still exist for Stan, they just are not included in the parameter table.