vegandevs/vegan

Issues in fisherfit function

Closed this issue · 3 comments

Hey, there. I am using vegan::fisher.alpha() function to calculate Fisher's alpha diversity indices. I checked out a similar issue here that you handled 2 years ago because I faced the same error. I spent a few hours checking this out. I think I found the reason from fisherfit function in vegan package, and I am writing this comment to share my humble opinion.

Below is the source code of fisherfit function.

function (x, ...) 
{
  nr <- as.fisher(x)
  S <- sum(nr) # number of species
  N <- sum(x)  # number of individuals
  d1fun <- function(x, S, N) x * log(1 + N/x) - S
  if (S > 0) {
    sol <- uniroot(d1fun, c(1, 50), extendInt = "upX", S = S, 
      N = N, ...)
    if (S == N) 
      warning("all species singletons: alpha arbitrarily high")
  }
  else {
    sol <- list(root = 0, iter = 0, estim.prec = NA)
  }
  nuisance <- N/(N + sol$root)
  out <- list(estimate = sol$root, hessian = NA, iterations = sol$iter, 
    df.residual = NA, nuisance = nuisance, fisher = nr, 
    estim.prec = sol$estim.prec, code = 2 * is.na(sol$estim.prec) + 
      1)
  class(out) <- "fisherfit"
  out
}

where S is the number of species, N is the number of samples (counts), d1fun is the formula $S=\alpha \ln(1+\frac{N}{\alpha})$, and what I have to do here is to find $\alpha$ (i.e., Fisher's alpha) that fits this equation using uniroot function above, right? To see the error message, I set x as the community counts data that I used for the calculation. Rows are taxa (OTUs), and the column is the sample.

> head(x, 10)
OTU Table:          [10 taxa and 1 samples]
                     taxa are rows
                  p002_Gop_R
meta_mOTU_v2_6946          0
meta_mOTU_v2_7279          0
ref_mOTU_v2_2557           0
ref_mOTU_v2_4459           0
ref_mOTU_v2_0138           0
ref_mOTU_v2_1395           0
ref_mOTU_v2_0643           0
ref_mOTU_v2_4724           0
meta_mOTU_v2_6237          0
ref_mOTU_v2_4236           0

In this data, N = 2970 and S = 2, meaning that there are 2970 individuals total from two species (14, and 2956 counts each), and the rest of them are all zeros.

> table(x)
x
   0   14 2956 
 628    1    1 

I put this information to d1fun and tried to find Fisher's alpha using uniroot function. It returned the error below.

> sol <- uniroot(d1fun, c(1, 50), extendInt = "upX", S = S, N = N)
Error in uniroot(d1fun, c(1, 50), extendInt = "upX", S = S, N = N) : 
  did not succeed extending the interval endpoints for f(lower) * f(upper) <= 0
In addition: Warning message:
In log(1 + N/x) : NaNs produced

So I guess that Fisher's alpha, in this case, is not included in this interval [1,50], and then I changed the argument extendInt = "yes" to allow the interval to be extended on both sides, not only one side. Then, I got Fisher's alpha 0.2091857 less than 1 without any error messages.

> sol <- uniroot(d1fun, c(1, 50), extendInt = "yes", S = S, N = N)
Warning message:
In log(1 + N/x) : NaNs produced
> sol
$root
[1] 0.2091857

$f.root
[1] 8.012779e-06

$iter
[1] 13

$init.it
[1] 8

$estim.prec
[1] 6.103516e-05

So my question would be,

  1. Is there any specific reason that you set extendInt = "upX" instead of extendInt = "yes" in uniroot function?
  2. If my explanation makes sense, is there any possibility that you change that argument and update the package?

Thank you for reading my comments. If there is something unclear from the comments or if I missed some points, please let me know.

The basic problem is that in some extreme cases data do not follow Fisher model, and then we get weird results.

The issue of extendInt is tricky and long. It was last changed because the strategy "yes" failed in dependency tests with package microbial. Here are some relevant changes:

  • 3bce687 changed to strategy "upX".
  • cb77deb not changed extendInt but detects some cases that break it.
  • 69b59dc changed to strategy "yes".
  • 4b16788 detect cases with 0 or 1 species which are doomed to fail.
  • b508a58 changed to strategy "upX".
  • 9469eac start to use extendInt with strategy "yes".

Quick second look: you have a community with 2 species and 2970 individuals. This really is a case where Fisher abundance distribution hardly is meaningful. However, it is a kind of case that appears especially with genomic data. If you only have two species in 3000 individuals, the diversity is indeed very low.

Please note that the extendInt does not define the direction of extending the argument: extendInt="upX" can extend to either direction. Choice "upX" only tells that the function is inreasing at the *root" (solution), and that information is passed to the function. This means that the limits should bracket the root, and with "upX" we know that the function should be negative at the lower limit. When it is positive, uniroot knows that only the lower limit should be extended, but there is no need to extend the upper limit if it is positive.

We can set trace in uniroot and in your example uniroot expands the limits like this:

> uniroot(d1fun, c(1,50), extendInt = "upX", S=2, N=2970, trace=2)
search in [1,50]
 .. modified lower: 0.99
 .. modified lower: 0.97
 .. modified lower: 0.93
 .. modified lower: 0.85
 .. modified lower: 0.69
 .. modified lower: 0.37
 .. modified lower: -0.27

Only the lower limit is extended, but the function takes too long steps and at estimate –0.27 function fails.

With strategy "yes" we do not supply the information that function is increasing at the root, and so both limits are extended in search of negative value of the function:

> uniroot(d1fun, c(1,50), extendInt = "yes", S=2, N=2970, trace=2)
search in [1,50]
 .. modified lower,upper: (           0.99,           50.5)
 .. modified lower,upper: (           0.97,           51.5)
 .. modified lower,upper: (           0.93,           53.5)
 .. modified lower,upper: (           0.85,           57.5)
 .. modified lower,upper: (           0.69,           65.5)
 .. modified lower,upper: (           0.37,           81.5)
 .. modified lower,upper: (           0.37,          113.5)
 .. modified lower,upper: (           0.05,          177.5)

Now both limits are extended, but function takes shorter steps below and the trial root does not stray to negative values and the estimation works as (0.05, 177.5) bracket the zero point (root).

So it is correct to have extendInt = "upX" because function is increasing at the root. However, the simplest idea is to reduce the lower limit so that the limits initially bracket the root:

> uniroot(d1fun, c(0.1,50), extendInt = "upX", S=2, N=2970, trace=2)

I'll have a look at this. First tests indicate that this does not impact "normal" cases. I need to test with extreme cases. You had one extreme case: large sample size, but only a couple of species. The opposite extreme case is to have low sample size with high number of species, in the most extreme cases all species being singletons.

Alright. I really appreciated your clarification. I needed some time to understand your comments, and to summarize,

  1. Extreme cases normally do not follow the Fisher model, so the function does not need to calculate every extreme case in detail.
  2. Changing the argument to extendInt="yes" fails other functions in other packages.
  3. Once setting the interval, the function should be negative at the lower limit and positive at the upper limit because the function is increasing at the root, and the signs should be different to have a root in the interval. extendInt="upX" assumes increasing at the root, but extendInt="yes" doesn't, which actually fails.