atahk/pscl

interpretation of hurdle() coefficients is not clear

nickreich opened this issue · 3 comments

Hi - I've appreciated using the pscl library and have found the hurdle and zero-inflated model fitting very useful for real data analyses. However, I've run into an interpretation issue that I have not found a clear answer to in the package documentation or the associated references. Specifically, when fitting, say, a binomial logistic regression for the "zero" part of a hurdle model, is the outcome of this regression Pr(Y=0) or Pr(Y>0)? For example, if this is the output of my model:

Call:
hurdle(formula = count ~ 1, data = tmp)

Pearson residuals:
    Min      1Q  Median      3Q     Max 
-0.3587 -0.3587 -0.3587 -0.3587  9.4544 

Count model coefficients (truncated poisson with log link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.63430    0.06772   9.366   <2e-16 ***
Zero hurdle model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.75037    0.08905  -19.66   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Number of iterations in BFGS optimization: 10 
Log-likelihood: -633.8 on 2 Df

Then I think the correct interpretation (based on a small simulation study I ran to convince myself) is that the probability of getting a non-zero count in my dataset is exp(-1.75) ~= 0.17. The alternative interpretation would be that the probability of getting a zero count is 0.17. However, I had trouble finding a clear statement anywhere in the documentation that this is in fact how the model is set up. This is a simple question, of course, but the answer matters a great deal: picking the wrong reading reverses the interpretation of any real-data analysis. I'd suggest making this explicit in the vignette or the function documentation.
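The simulation itself isn't shown in the thread. The block below is a minimal sketch of that kind of check. It is written in Python rather than R and does not call pscl.

It works as follows:

* It simulates hurdle counts with a known Pr(Y > 0) = 0.15.
* It then uses the standard result that an intercept-only logistic regression has its maximum-likelihood intercept at the logit of the sample proportion of "successes".

The helper names (`simulate_hurdle`, `rtrunc_poisson`) and the choices of n, rate and seed are hypothetical. If the zero hurdle models Pr(Y > 0), the fitted intercept should be negative, like the -1.75 above. If it modelled Pr(Y = 0), the sign would flip.

```python
import math
import random


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))


def rtrunc_poisson(lam, rng):
    """Hypothetical helper: zero-truncated Poisson draw by rejection."""
    while True:
        # Knuth's multiplication method for a Poisson(lam) draw
        limit, k, prod = math.exp(-lam), 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        if k > 0:  # reject zeros so the count part is truncated at 0
            return k


def simulate_hurdle(n, p_nonzero, lam, seed=1):
    """Hypothetical helper: hurdle counts with a known Pr(Y > 0)."""
    rng = random.Random(seed)
    return [rtrunc_poisson(lam, rng) if rng.random() < p_nonzero else 0
            for _ in range(n)]


y = simulate_hurdle(n=20000, p_nonzero=0.15, lam=1.9)
share_nonzero = sum(v > 0 for v in y) / len(y)

# Intercept-only logistic MLE = logit of the observed success share.
print(f"outcome Pr(Y>0): intercept ~ {logit(share_nonzero):+.3f}")
print(f"outcome Pr(Y=0): intercept ~ {logit(1 - share_nonzero):+.3f}")
```

Only the first line has the negative sign seen in the pscl output. Under this sketch's assumptions, that supports the Pr(Y > 0) reading.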

atahk commented

Hi Nick—

Thanks for this. You're right that the hurdle/zero stage is specified around the probability of a non-zero count. With the default binomial model for the hurdle stage, that means that link(Pr(Yᵢ>0))=Xᵢβ. Assuming Simon has no objections, I'll add some notes to the documentation for hurdle clarifying this, as obviously this is important for correctly interpreting the results.
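The printed output can also be checked for consistency with the Pr(Y > 0) reading, independently of the documentation. The sketch below, in Python, makes three assumptions:

* The default logit link is used for the zero hurdle.
* The count part is a zero-truncated Poisson with log link, as printed.
* pscl's Pearson residuals are (y - mu)/sqrt(Var(Y)) under the fitted hurdle distribution.

Under those assumptions, it computes the residual of an observed zero under both readings of the hurdle intercept. The function name `zero_count_pearson` is hypothetical. In an intercept-only model every zero gets the same residual, and that value is the printed Min of -0.3587.

```python
import math


def plogis(x):
    """Inverse logit, the same function as R's plogis()."""
    return 1.0 / (1.0 + math.exp(-x))


def zero_count_pearson(p_nonzero, count_intercept):
    """Pearson residual of an observed 0 in an intercept-only hurdle model
    with a zero-truncated Poisson count part (assumed (y - mu)/sqrt(Var))."""
    lam = math.exp(count_intercept)      # untruncated Poisson mean
    trunc = 1.0 - math.exp(-lam)         # Pr(Poisson draw > 0)
    ey_pos = lam / trunc                 # E[Y | Y > 0]
    ey2_pos = (lam + lam ** 2) / trunc   # E[Y^2 | Y > 0]
    mu = p_nonzero * ey_pos              # overall E[Y]
    var = p_nonzero * ey2_pos - mu ** 2  # overall Var(Y)
    return (0.0 - mu) / math.sqrt(var)


zero_b, count_b = -1.75037, 0.63430      # intercepts from the printed output
as_nonzero = zero_count_pearson(plogis(zero_b), count_b)   # Pr(Y>0) reading
as_zero = zero_count_pearson(1 - plogis(zero_b), count_b)  # Pr(Y=0) reading

print(f"Pr(Y>0) reading: {as_nonzero:.4f}")
print(f"Pr(Y=0) reading: {as_zero:.4f}")
```

The Pr(Y > 0) reading reproduces the printed Min of -0.3587. The reversed reading gives roughly -1.38.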

Note, however, that the default link function, and the one used in your example, is the logit. So the probability of a non-zero count in your example is plogis(-1.75) ~= 0.15, not exp(-1.75). You can specify a log link for the hurdle stage by adding the option link="log", in which case exp(-1.75) would be correct, though I would think logit is usually preferable.
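The arithmetic behind the correction can be sketched in a few lines of Python. The block applies both inverse links to the printed zero-hurdle intercept. The logistic CDF mirrors R's `plogis()`.

```python
import math

intercept = -1.75037  # zero-hurdle intercept from the output above

# Default link = "logit": invert with the logistic CDF (R's plogis()).
p_logit = 1.0 / (1.0 + math.exp(-intercept))
# exp() is the right inverse only for a model fitted with link = "log".
p_log = math.exp(intercept)

print(f"logit link: Pr(Y>0) = {p_logit:.3f}")
print(f"log link:   Pr(Y>0) = {p_log:.3f}")
```

Applying both inverses to the same number only illustrates the arithmetic. Refitting with link = "log" would give a different intercept. For an intercept-only binomial model, the fitted probability is the sample share of non-zero counts whatever the link, so that intercept would be about log(0.148) ~= -1.91.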

atahk commented

Commit 434d54b closes #1