Issues in fisherfit function
Closed this issue · 3 comments
Hey, there. I am using vegan::fisher.alpha()
function to calculate Fisher's alpha diversity indices. I checked out a similar issue here that you handled 2 years ago because I faced the same error. I spent a few hours checking this out. I think I found the reason from fisherfit
function in vegan
package, and I am writing this comment to share my humble opinion.
Below is the source code of fisherfit
function.
function (x, ...)
{
nr <- as.fisher(x)
S <- sum(nr) # number of species
N <- sum(x) # number of individuals
d1fun <- function(x, S, N) x * log(1 + N/x) - S
if (S > 0) {
sol <- uniroot(d1fun, c(1, 50), extendInt = "upX", S = S,
N = N, ...)
if (S == N)
warning("all species singletons: alpha arbitrarily high")
}
else {
sol <- list(root = 0, iter = 0, estim.prec = NA)
}
nuisance <- N/(N + sol$root)
out <- list(estimate = sol$root, hessian = NA, iterations = sol$iter,
df.residual = NA, nuisance = nuisance, fisher = nr,
estim.prec = sol$estim.prec, code = 2 * is.na(sol$estim.prec) +
1)
class(out) <- "fisherfit"
out
}
where S is the number of species, N is the number of samples (counts), d1fun
is the formula uniroot
function above, right? To see the error message, I set x as the community counts data that I used for the calculation. Rows are taxa (OTUs), and the column is the sample.
> head(x, 10)
OTU Table: [10 taxa and 1 samples]
taxa are rows
p002_Gop_R
meta_mOTU_v2_6946 0
meta_mOTU_v2_7279 0
ref_mOTU_v2_2557 0
ref_mOTU_v2_4459 0
ref_mOTU_v2_0138 0
ref_mOTU_v2_1395 0
ref_mOTU_v2_0643 0
ref_mOTU_v2_4724 0
meta_mOTU_v2_6237 0
ref_mOTU_v2_4236 0
In this data, N = 2970 and S = 2, meaning that there are 2970 individuals total from two species (14, and 2956 counts each), and the rest of them are all zeros.
> table(x)
x
0 14 2956
628 1 1
I put this information to d1fun
and tried to find Fisher's alpha using uniroot
function. It returned the error below.
> sol <- uniroot(d1fun, c(1, 50), extendInt = "upX", S = S, N = N)
Error in uniroot(d1fun, c(1, 50), extendInt = "upX", S = S, N = N) :
did not succeed extending the interval endpoints for f(lower) * f(upper) <= 0
In addition: Warning message:
In log(1 + N/x) : NaNs produced
So I guess that Fisher's alpha, in this case, is not included in this interval [1,50], and then I changed the argument extendInt = "yes"
to allow the interval to be extended on both sides, not only one side. Then, I got Fisher's alpha 0.2091857 less than 1 without any error messages.
> sol <- uniroot(d1fun, c(1, 50), extendInt = "yes", S = S, N = N)
Warning message:
In log(1 + N/x) : NaNs produced
> sol
$root
[1] 0.2091857
$f.root
[1] 8.012779e-06
$iter
[1] 13
$init.it
[1] 8
$estim.prec
[1] 6.103516e-05
So my question would be,
- Is there any specific reason that you set
extendInt = "upX"
instead ofextendInt = "yes"
inuniroot
function? - If my explanation makes sense, is there any possibility that you change that argument and update the package?
Thank you for reading my comments. If there is something unclear from the comments or if I missed some points, please let me know.
The basic problem is that in some extreme cases data do not follow Fisher model, and then we get weird results.
The issue of extendInt
is tricky and long. It was last changed because the strategy "yes" failed in dependency tests with package microbial. Here are some relevant changes:
Quick second look: you have a community with 2 species and 2970 individuals. This really is a case where Fisher abundance distribution hardly is meaningful. However, it is a kind of case that appears especially with genomic data. If you only have two species in 3000 individuals, the diversity is indeed very low.
Please note that the extendInt
does not define the direction of extending the argument: extendInt="upX"
can extend to either direction. Choice "upX"
only tells that the function is inreasing at the *root" (solution), and that information is passed to the function. This means that the limits should bracket the root, and with "upX"
we know that the function should be negative at the lower limit. When it is positive, uniroot
knows that only the lower limit should be extended, but there is no need to extend the upper limit if it is positive.
We can set trace
in uniroot and in your example uniroot
expands the limits like this:
> uniroot(d1fun, c(1,50), extendInt = "upX", S=2, N=2970, trace=2)
search in [1,50]
.. modified lower: 0.99
.. modified lower: 0.97
.. modified lower: 0.93
.. modified lower: 0.85
.. modified lower: 0.69
.. modified lower: 0.37
.. modified lower: -0.27
Only the lower limit is extended, but the function takes too long steps and at estimate –0.27 function fails.
With strategy "yes"
we do not supply the information that function is increasing at the root, and so both limits are extended in search of negative value of the function:
> uniroot(d1fun, c(1,50), extendInt = "yes", S=2, N=2970, trace=2)
search in [1,50]
.. modified lower,upper: ( 0.99, 50.5)
.. modified lower,upper: ( 0.97, 51.5)
.. modified lower,upper: ( 0.93, 53.5)
.. modified lower,upper: ( 0.85, 57.5)
.. modified lower,upper: ( 0.69, 65.5)
.. modified lower,upper: ( 0.37, 81.5)
.. modified lower,upper: ( 0.37, 113.5)
.. modified lower,upper: ( 0.05, 177.5)
Now both limits are extended, but function takes shorter steps below and the trial root does not stray to negative values and the estimation works as (0.05, 177.5) bracket the zero point (root).
So it is correct to have extendInt = "upX"
because function is increasing at the root. However, the simplest idea is to reduce the lower limit so that the limits initially bracket the root:
> uniroot(d1fun, c(0.1,50), extendInt = "upX", S=2, N=2970, trace=2)
I'll have a look at this. First tests indicate that this does not impact "normal" cases. I need to test with extreme cases. You had one extreme case: large sample size, but only a couple of species. The opposite extreme case is to have low sample size with high number of species, in the most extreme cases all species being singletons.
Alright. I really appreciated your clarification. I needed some time to understand your comments, and to summarize,
- Extreme cases normally do not follow the Fisher model, so the function does not need to calculate every extreme case in detail.
- Changing the argument to
extendInt="yes"
fails other functions in other packages. - Once setting the interval, the function should be negative at the lower limit and positive at the upper limit because the function is increasing at the root, and the signs should be different to have a root in the interval.
extendInt="upX"
assumes increasing at the root, butextendInt="yes"
doesn't, which actually fails.