lamho86/phylolm

Sign problem in phyloglmstep and phyloglm results

Closed this issue · 5 comments

Hello,

When I looked at the summary information of phyloglmstep and phyloglm (method = "logistic_IG10"), I found that the plus and minus signs before the Estimate were opposite, while values and other information stayed the same. Since I checked the two functions on the same data, the results should be the same. After checking the residuals and fitted values, I think the result of phyloglmstep is correct. The residuals of phyloglm were strange and didn't conform to the calculation of residuals. This problem wasn't found when using phylostep and phylolm. Hope you can check this problem.

Best,
Yuxin

Hello,

I couldn't replicate the error you mentioned. Could you please give a concrete example?

Thanks,
Lam

Hi Lam,

For example, I first do the stepwise for the trait "Sociality" with 14 predictors, than did the 'phyloglm' use the model coming from 'phyloglmstep'. The estimates of coefficients are opposite. And I think the results from phyloglmstep is the right one.

modelstep <- phyloglmstep(formula, starting.formula = NULL, data = glmData, phy=modelTree, 
                              method= "logistic_IG10", direction = "forward", trace = 2, 
                              btol = 20, log.alpha.bound = 4, start.beta=NULL, 
                              start.alpha=NULL, boot = 0, full.matrix = TRUE, k=2)
  print(summary(modelstep))

Step 5
Current model: x ~ 1 + Aspect + bio14 + bio15 + bio19
AIC(k=2): 134.376549065067
---END

Call:
phyloglm(formula = create.formula(plm), data = data, phy = phy, 
    method = method, btol = btol, log.alpha.bound = log.alpha.bound, 
    start.beta = start.beta, start.alpha = start.alpha, boot = boot, 
    full.matrix = full.matrix)
       AIC     logLik Pen.logLik 
    134.38     -61.19     -55.97 

Method: logistic_IG10
Mean tip height: 87.3
Parameter estimate(s):
alpha: 0.1095054 

Coefficients:
            Estimate   StdErr z.value   p.value    
(Intercept) -1.25739  0.29717 -4.2312 2.325e-05 ***
Aspect       1.01095  0.65364  1.5466  0.121949    
bio14        1.40107  0.53159  2.6356  0.008399 **
bio15        0.90962  0.39728  2.2896  0.022045 *
bio19        0.30048  0.30419  0.9878  0.323237    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Note: Wald-type p-values for coefficients, conditional on alpha=0.1095054


phyloGLM <- phyloglm(formula = f, data = glmData, phy = modelTree, 
                         method = "logistic_IG10", btol = 36, log.alpha.bound = 4)
  print(summary(phyloGLM))

Call:
phyloglm(formula = f, data = glmData, phy = modelTree, method = "logistic_IG10", 
    btol = 36, log.alpha.bound = 4)
       AIC     logLik Pen.logLik 
    134.38     -61.19     -55.97 

Method: logistic_IG10
Mean tip height: 87.3
Parameter estimate(s):
alpha: 0.1095054 

Coefficients:
            Estimate   StdErr z.value   p.value    
(Intercept)  1.25739  0.29717  4.2312 2.325e-05 ***
Aspect      -1.01095  0.65364 -1.5466  0.121949    
bio14       -1.40107  0.53159 -2.6356  0.008399 **
bio15       -0.90962  0.39728 -2.2896  0.022045 *
bio19       -0.30048  0.30419 -0.9878  0.323237    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Note: Wald-type p-values for coefficients, conditional on alpha=0.1095054

When we looked at the residuals and fitted values, it also reflected maybe the same problem. I check the residuals and fitted values by manual calculation using the coefficients from 'phyloglmstep', it seems that 'phyloglm' treats 0 as 1, and 1 as 0. So in the current situation, if we want the real residuals and fitted values we should do like this:

trait <- phyloGLM$residuals + phyloGLM$fitted.values # The order of residuals and fitted values does not match the original order. 
# In fact, in the previous version, the species name of residuals was indicated, 
# but in the latest version there's only data.
glmdatamrg <- data.frame(redisuals = phyloGLM$residuals, fitted = phyloGLM$fitted.values, 
                           trait = trait)
# Correct the residuals and fitted values (phylolm 2.6.2)
glmdatamrg$redisuals <- 0 - glmdatamrg$redisuals
glmdatamrg$trait <- 1 - glmdatamrg$trait
glmdatamrg$fitted <- glmdatamrg$trait - glmdatamrg$redisuals

The above problems came from method 'logistic_IG10'. Regarding the residuals and fitted values, problem was also found in method 'poisson_GEE'. After checking manually, I think it should be like this:

glmdatamrg$fitted = exp(phyloGLM$linear.predictors)
glmdatamrg$res <- lmdatamrg$X6.1_DietBreadth - lmdatamrg$fitted  # X6.1_DietBreadth is the dependent variable

I tried with the following simple example but couldn't replicate the problem. It may be data specific problem. Is your x a numeric variable or a categorical variable?

set.seed(123456)
tre = rcoal(60)
taxa = sort(tre$tip.label)
b0=0; b1=1;
x1 = rTrait(phy=tre,model="BM",
parameters=list(ancestral.state=0,sigma2=10))
x2 = rTrait(phy=tre,model="BM",
parameters=list(ancestral.state=0,sigma2=10))
x3 = rTrait(phy=tre,model="BM",
parameters=list(ancestral.state=0,sigma2=10))
X = cbind(rep(1,60), x1)
y = rbinTrait(n=1,phy=tre, beta=c(-1,0.5), alpha=1 ,X=X)
dat = data.frame(trait=y[taxa],pred1=x1[taxa],pred2=x2[taxa],pred3=x3[taxa])
fit = phyloglmstep(trait ~ pred1+pred2+pred3,data=dat,phy=tre,method="logistic_IG10",direction="both")
summary(fit)
fit0 = phyloglm(trait ~ pred1,data=dat,phy=tre,method="logistic_IG10")
summary(fit0)

x variables were normalized numeric data about ecology, like PETSeasonality, bio2, ....

I mean the response, which is x in this formula: Current model: x ~ 1 + Aspect + bio14 + bio15 + bio19

Did you use it as a numeric 0/1 value or as a factor? Would it be possible for you to provide a reproducible example?