Russel88/DOC

Fns values differ from matlab

Closed this issue · 6 comments

Hi,

I was working with your functions trying to find why Fns value from DOC function differ from matlab. In my case, matlab gives a fns = 0.90494, while a summary of fns from DOC function
Fns Data
Min. :0.0004263 Real:200
1st Qu.:0.0622336
Median :0.0901535
Mean :0.0933056
3rd Qu.:0.1154092
Max. :0.7046036

I think that the mistake is in the DOC.boot function at ## Detect negative slope section. If I change point <- which(slope>0) to point <- which.max(slope>0) it almost mimic the matlab, because I obtain a fns = 0.9407. This change tries to do the same that the find(LocalSlope>0,1,'last')), which, if I am not wrong, should pick the last value (or max in R notation).
I run R with a set.seed(100), and run doc<-DOC(otus,R=200), and then:
xs <- seq(0,1,by=0.001)

Lowess

LOW <- loess(rJSD~Overlap,data=doc$DO,span=0.2,degree=1,family="symmetric",iterations=4,surface="interpolate")
LOW.P <- data.frame(predict(LOW,xs))

## Detect negative slope
# Smooth prediction
ma <- function(x,n=5){filter(x,rep(1/n,n), sides=2)}
low.ma <- ma(LOW.P[,1])
slope <- diff(low.ma)/diff(xs)
point <- which.max(slope>0) ### modified version
neg.slope <- xs[point[length(point)]]
    # Fns
Fns <- sum(doc$DO$Overlap>neg.slope,na.rm=T)/length(doc$DO$Overlap)

otus.txt

Hi,

This seems odd. I checked different datasets to make sure results were consistent. Have you checked whether your change provides the same results for the dataset the authors provide?

They write in their method section that the change point (what I call neg.slope) is a point where at all Overlap values higher than this the slope is negative. In my code I find all Overlap values at which the slope is positive, and then pick the last one, which I think is consistent with what they write in their method section. With your change the first point with a positive slope is the change point.

Hi,
I have run doc from matlab with the attached dataset and matlab functions and scripts and found that
Fns=0.47837, change_point=0.5938 and xc=0.5999. xc is used to calculate Fns within matlab as Fns=sum(Overlap_Vec>xc)/length(Overlap_Vec).
With the same dataset (set.seed(100)) and run DOC function and found that the nm value (median of negative slope) is 0.7205, and a summary of FNS values gave
Fns Data
Min. :0.0008163 Real:100
1st Qu.:0.0183673
Median :0.1620408
Mean :0.1985388
3rd Qu.:0.3500000
Max. :0.8448980

If I run with the output of DOC function this code:
xs <- seq(0,1,by=0.001)
LOW <- loess(rJSD~Overlap,data=doc$DO,span=0.2,degree=1,family="symmetric",iterations=4,surface="interpolate")
LOW.P <- data.frame(predict(LOW,xs))
ma <- function(x,n=5){filter(x,rep(1/n,n), sides=2)}
low.ma <- ma(LOW.P[,1])
slope <- diff(low.ma)/diff(xs)
point <- which.max(slope>0)
change_point_real <- xs[point[length(point)]] ### is 0.57
Fns_real<- round(sum(doc$DO$Overlap>change_point_real,na.rm=T)/length(doc$DO$Overlap),digits=5)

I found that Fns_real=0.55673 and change_point_real = 0.577, which it is relatively closer to matlab output. However, if I run point<-which(slope>0) I get Fns = 0.43347 and change_point=0.613.
Curiously, this do not occur with my dataset (otus.txt, attached in my first post). Anyway, I do not know why FNS is not giving anything similar to matlab or the calculations written above.

Regards
Manuel

matlab_functions.zip

OTUtable.txt

When I run your dataset through the code without bootstrap, I get the following:
FNS: 0.433
Neg.slope: 0.613
Which is relatively close to the matlab output.
It appears to be the bootstraps that changes the results

So, what can be the solution?. I could go without bootstraps to get the FNS and neg.slope and then go with bootstraps (real and null data) to get the P-value as in the matlab code? How can I go without bootstraps? I have tried using R=0 or R=1 or with subr=1000 or subr=900, but the function did not work in both cases. I am using the otus.txt file.

Hi,

Sorry about the late reply.

Yeah, that seems like a fine approach.

The subr argument specifies how many samples that are randomly sampled. It's an alternative to bootstrap and cannot be more than the number of samples you have.

You can trick the function to do it without bootstrap to set subr to the number of samples in the otu table, and then set R=2 (Every repetition is similar, because it's not doing bootstraps, but you cannot set R=1 because the function cannot handle this edge case)

Thank you very much for your help.