Interpretation of gradNegBin() in zeroinfl.R

Question

Interpretation of gradNegBin() in zeroinfl.R

gong-yuan opened this issue 6 years ago · 2 comments

Hi Alex, I've learned a lot by reading the source code of pscl library. I am trying to extend this library so that it works for right-censored data. Therefore it is very important for me to understand your code.
I have two questions:

When computing gradNegBin, you used working residual, e.g.

## working residuals
wres_count <- ifelse(Y1, Y - mu * (Y + theta)/(mu + theta), -exp(-log(dens0) +
      log(1 - muz) + clogdens0 + log(theta) - log(mu + theta) + log(mu)))

from my understanding, residual is the difference between actual and forecast. If working residual also follows this principle, why the estimate is of the form mu * (Y + theta)/(mu + theta) rather than just (1 - muz)*mu? Since Y is actual value, and I do not understand why we include it in the "forecast".
2. Why do you use working residual to instead of taking derivative when computing gradient of the log-likelihood function? Is there any justification?

Thank you very much for your consideration! @atahk
Best regards,
Kevin

Answer 1 · 2018-09-16T09:23:32.000Z

One more question:
In ziNegBin of zeroinfl.R

    ## negbin size
    theta <- exp(parms[(kx+kz)+1])

when theta is not an integer, e.g.

sum(dnbinom(x=seq(3.5), size = 3.5, mu = 1.2)*seq(3.5)) = 0.9412349

Then dnbinom is no longer a pdf as x only takes integers. Will this be a problem?

Answer 2 · 2018-09-16T19:56:40.000Z

The difference between the observation and the forecast is one type of residual (sometimes called the raw residual or response residual). There are other types, including the working residual, although these don't appear to quite be working residuals as the term is commonly defined. The ones in the functions you cite are the gradient of the log-likelihood contribution for each observation with respect to the linear predictor ηᵢ = xᵢ β.
These functions do, in fact, compute the gradient of the log-likelihood. The "working residuals" are just used in computing that gradient.
It's true that the response can only be a non-negative integer, but you get a valid pmf with positive values of theta even if they're not integers. I'm not sure why you think the equation you post suggests otherwise. If you mean to suggest that the pmf doesn't sum to one, you need to sum over all non-negative values for x (instead of just seq(3.5) = (1, 2, 3)). We can't quite do an infinite sum in R, but summing from 0 to 100 is close enough for the parameters you use. So the right calculation would be:
sum(dnbinom(x=seq(0, 100), size = 3.5, mu = 1.2)) = 1
which does work out as expected.