drizopoulos/GLMMadaptive

Fitting Zero-inflated models for rare events in repeated measures data


Hello Dr. Rizopoulos,

I have a question regarding mixed-effects logistic regression using your very impressive GLMMadaptive package:

When modeling repeated measures data in which the outcome is rare (occurrence rate < 3%), so that the data contain a lot of zeros, would it be appropriate to fit a hurdle or zero-inflated model (e.g., negative binomial)? I know that zero-inflated and hurdle models are meant for count data, but I have read that they first fit a binomial regression to account for the structural zeros before fitting the negative binomial part, which yields a better-fitting model. I do have access to the counts of the outcome as well, but the real interest is whether we can accurately predict if an event occurs or not. The number of events is not really the outcome of interest, considering the data are at an "id and month" level.

I've reviewed the King and Zeng (2001) paper on correcting standard errors for logistic regression when analyzing rare events (the techniques are drawn from using logistic regression on small sample sizes), but I do not think the corrections are applicable to clustered data. I'm not statistically savvy enough (yet) to develop a proof to fully work through it, but the logic seems sound.

In summary, which modeling approach in your package would be best suited to rare, recurrent, binary events in repeated measures data with count covariates?

This question is more about statistics rather than about the functionality of the package. In any case, you could fit both a zero-inflated and a hurdle model and see which one provides a better fit to your data.
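
As a rough illustration of that suggestion, here is a minimal sketch in R that fits a zero-inflated and a hurdle Poisson mixed model with `mixed_model()` and compares them by AIC/BIC. The data are simulated and every variable name (`dat`, `id`, `month`, `x`, `count`) is hypothetical; only a random intercept per `id` and an intercept-only zero part are assumed, which may well be too simple for real data.

```r
library(GLMMadaptive)

# Simulated "id and month" data; all names and parameter values are made up
set.seed(2024)
n_id <- 200
n_month <- 12
dat <- data.frame(
  id = rep(seq_len(n_id), each = n_month),
  month = rep(seq_len(n_month), times = n_id)
)
dat$x <- rpois(nrow(dat), lambda = 2)            # a count covariate
b <- rnorm(n_id, sd = 0.5)                       # subject-level random intercepts
structural_zero <- rbinom(nrow(dat), size = 1, prob = 0.9)
mu <- exp(-1 + 0.2 * dat$x + b[dat$id])
dat$count <- ifelse(structural_zero == 1, 0L, rpois(nrow(dat), mu))

# Zero-inflated Poisson mixed model with an intercept-only zero part
fm_zi <- mixed_model(fixed = count ~ x, random = ~ 1 | id, data = dat,
                     family = zi.poisson(), zi_fixed = ~ 1)

# Hurdle Poisson mixed model with the same structure
fm_hu <- mixed_model(fixed = count ~ x, random = ~ 1 | id, data = dat,
                     family = hurdle.poisson(), zi_fixed = ~ 1)

# The two models are not nested, so skip the likelihood ratio test
# and compare them on AIC/BIC only
anova(fm_zi, fm_hu, test = FALSE)
```

The negative binomial counterparts are `zi.negative.binomial()` and `hurdle.negative.binomial()`. With very few non-zero counts, either model may be slow or unstable to fit, so the output is worth checking for convergence warnings.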