Phyloglm produces different results with normalized and raw predictor variables
Hi,
I have one variable (`raw`) and its rescaled version from 0 to 10 (`normalized`), transformed through the min-max normalization formula: `normalized = (raw_i - min(raw)) / (max(raw) - min(raw))`.
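A minimal sketch of that rescaling, using a made-up vector rather than the linked data (`raw_toy`, `min_max`, and the values are hypothetical). It only illustrates that min-max normalization is an intercept-plus-slope transformation of the original variable. As written, the formula maps onto [0, 1]; a 0 to 10 range would presumably need an extra multiplication by 10, which is still linear.

```r
# Toy values standing in for the raw predictor (hypothetical, not the real data)
raw_toy <- c(3.2, 7.5, 1.1, 9.8, 4.4)

# Min-max normalization as written in the formula above: maps onto [0, 1]
min_max <- function(x) (x - min(x)) / (max(x) - min(x))
normalized_toy <- min_max(raw_toy)

# The same transformation written as intercept + slope * raw
slope <- 1 / (max(raw_toy) - min(raw_toy))
intercept <- -min(raw_toy) * slope
stopifnot(isTRUE(all.equal(normalized_toy, intercept + slope * raw_toy)))

# Smallest and largest rescaled values
range(normalized_toy)
```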
Since `raw` and `normalized` are linear re-parameterizations of each other, I expected the same AIC from fitting a binary phylogenetic logistic regression through `phyloglm` with each of them as the predictor and `presence` as the response:
raw <- phyloglm(presence ~ raw, data = git, phy = git.tree, method = "logistic_MPLE", btol = 30)
normalized <- phyloglm(presence ~ normalized, data = git, phy = git.tree, method = "logistic_MPLE", btol = 30)
However, the two models have very different AICs:
raw$aic: 1168.742
normalized$aic: 1112.437
As a comparison, if I run a non-phylogenetic regression with `glm`, I get the same AIC (1158) for both predictors:
glm(presence~raw, data = git, family = "binomial")
glm(presence~normalized, data = git, family = "binomial")
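The `glm` behaviour can be checked on simulated data with base R alone. The sketch below is an illustration under assumptions: the data frame `sim`, the sample size, and the coefficients are invented and unrelated to the linked dataset. It fits the same logistic regression on a predictor and on its min-max rescaled version, then compares the AICs.

```r
set.seed(1)

# Simulated stand-ins for the real columns (hypothetical values)
n <- 200
raw_sim <- runif(n, min = 5, max = 500)
normalized_sim <- (raw_sim - min(raw_sim)) / (max(raw_sim) - min(raw_sim))
presence_sim <- rbinom(n, size = 1, prob = plogis(-1 + 0.005 * raw_sim))
sim <- data.frame(presence = presence_sim,
                  raw = raw_sim,
                  normalized = normalized_sim)

# Ordinary (non-phylogenetic) logistic regressions on each version
fit_raw <- glm(presence ~ raw, data = sim, family = "binomial")
fit_norm <- glm(presence ~ normalized, data = sim, family = "binomial")

# Absolute AIC difference; expected to be zero up to numerical error
aic_diff <- abs(AIC(fit_raw) - AIC(fit_norm))

# The slope changes only by the scaling factor (max - min)
slope_ratio <- unname(coef(fit_norm)["normalized"] / coef(fit_raw)["raw"])

print(c(aic_raw = AIC(fit_raw), aic_normalized = AIC(fit_norm)))
```

For plain maximum likelihood, the fitted probabilities do not depend on the scale of the predictor, so the two AICs should agree up to numerical error. This is the invariance that the `phyloglm` fits above do not show.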
This difference in AIC between the two models (`raw` and `normalized`) is problematic when I try to compare `raw` and `normalized` with some other predictor (let's call it `other`), since I get the odd situation in which, for example, `raw` is a better predictor than `other` but `normalized` is worse (when, as far as I can understand, they should have the same performance).
I don't know what I am missing, but if someone wants to explore the data, they are available on my drive at this link:
https://drive.google.com/file/d/192lPTVECtIZZHkc7hhwyvXYDBwL5kVvP/view?usp=drive_link
Thank you so much,