Negative Binomial random number generator

Question

Negative Binomial random number generator

Closed this issue 9 years ago · 7 comments

In

`NegBin` <- function(n, mu, alpha) {
    mu <- mu * rgamma(n, shape = alpha, rate = 1/alpha)
    rpois(n, lambda = mu)
}

shouldn't the parameter scale = 1/alpha rather than rate = 1/alpha ?

e.g. when calling:

mean(coenocliner:::NegBin(n = 1000, mu = 10, alpha = 0.5))
the result is e.g. 2.49048 instead of 10 (the mean parameter)

Also the naming is confusing: alpha usually refers to dispersion parameter,
that is Var[X] = mu + alpha mu^2

but as given in the function above Var[X] = mu + mu^2/alpha
(if alpha is in the denominator usually it is named theta, the size factor).

Answer 1 · 2016-05-04T23:29:35.000Z

I've confused myself here, @nlhuong. The intention was

`NegBin` <- function(n, mu, alpha) {
    mu <- mu * rgamma(n, shape = 1/alpha, rate = 1/alpha)
    rpois(n, lambda = mu)
}

which by implication is scale = alpha (from the definition of rgamma).

I've used the NB2 parameterization of the negative binomial, in which the scale parameter is known as alpha. In the glm.nb way of thinking, theta = 1 / alpha. The idea here is that this "dispersion" parameter takes values that are positively related to the amount of overdispersion in the data. If alpha = 0 then we have a Poisson (although that is not allowed for here as we'd end up with mu * 0...).

I think I originally implemented this simulate data to fit a glm.nb() model and messed up undoing the conversions from alpha to theta which glm.nb() used.

With the redefined NegBin() I now get:

> set.seed(1)
> rnb <- NegBin(n = 100000, mu = 10, alpha = 0.0000000001)
> set.seed(1)
> rpo <- rpois(100000, lambda = 10)
> mean(rnb)
[1] 10.00445
> mean(rpo)
[1] 9.99331
> var(rnb)
[1] 10.00239
> var(rpo)
[1] 10.01231

which indicates alpha is working in the right direction, and further

> NegBin(10, mu = 10, alpha = 10)
 [1]  0  0  6 96  9  0  3  0  3  0
> NegBin(10, mu = 10, alpha = 1)
 [1]  3  8 11  1  4 34  8  5  2 10
> NegBin(10, mu = 10, alpha = 0.1)
 [1]  9  9  6  9 12  4 10  6  9 10

shows it's doing the right thing.

And the variance is Var[x] = mu + alpha * mu^2.

I find this parameterisation more natural, but I should be more explicit about what is intended.

I'll correct this in the package and update the documentation to emphasise the parametrization used.

Answer 2 · 2016-05-05T01:24:21.000Z

You might want to put a line similar to e.g.
if (alpha == 0) alpha <- .Machine$double.eps

to guard against the code breaking in case people want to input alpha = 0 to get a Poisson variable.

Otherwise, the rewritten code looks good. Make sure you change the ZINB as well.

Answer 3 · 2016-05-05T05:57:24.000Z

The ZINB definition also needs updating. Reopening until I check in the relevant updates; testing now.

Answer 4 · 2016-05-05T16:52:31.000Z

@nlhuong will do, but I'll probably just skip the rgamma() call in that case so you should get the same results as if you'd called Poisson() instead. This seems reasonable in situations where you might want to vary alpha from the Poisson to some positive value of overdispersion; it'd be easy if NegBin handled that without the user needing to do if(alpha == 0) then Poisson(...) else NegBin(...)

Answer 5 · 2016-05-05T18:15:51.000Z

This issue was closed via 758c439

Answer 6 · 2016-05-06T05:45:22.000Z

The current implementation fails if users supplies a vector of alpha (which is allowed), and that vector contains zeros:

coenocliner:::NegBin(10, 10, c(1,0))
 [1] 31  0  9  0  8  0  1  0 14  0
Warning message:
In if (alpha < 0L) { :
  the condition has length > 1 and only the first element will be used

rgamma will use exact zero for every second item and 0 will be returned. The following would handle this and also silently handle negative alpha:

alpha <- pmax(alpha, sqrt(.Machine$double.eps))

Alternatively, we could force non-vector alpha, but I think it is good to allow vector arguments.

Answer 7 · 2016-05-09T02:55:05.000Z

The way NegBin() is implemented doesn't really work for alpha = 0. In those cases rgamma() returns 0 and then we have mu <- mu * 0, which is always 0, but in those cases it should be a standard Poisson with lambda = mu, where mu is the given mu.

I'm checking a different implementation now. However, we could always switch to rnbinom(n, mu = mu, size = 1 / alpha), which gives the same output as NegBin(n, mu = mu, alpha = alpha) (in my current branch) and rpois(n, lambda= mu).