R-squared for Dirichlet regression (`r2`)
MarcRieraDominguez opened this issue · 1 comments
Hi! First of all, thank you for creating and maintaining this package!
I have come across an unexpected behaviour when applying `r2()` to Dirichlet regression models fitted with the `DirichletReg` package. In short, Dirichlet regression extends beta regression to C categories: responses bounded in (0, 1) across more than 2 categories. The regression comes in two parametrizations: common (a separate model is fitted to each of the C categories) and alternative (a separate model is fitted to C-1 categories, and precision is modelled separately). Each model can use a different set of explanatory variables, separated by pipes (`|`).
`r2()` appears to return Nagelkerke's R2, but the value is very high for models with the alternative parametrization: for instance, a value close to 0.9 when the squared correlation between fitted and observed values is no higher than 0.75 for any category. The value for a model with the common parametrization is more sensible (i.e. in line with the correlations between fitted and observed values). I suspect this has to do with how the null model is declared, based on comparisons with `MuMIn::r.squaredLR()`. A reproducible example is available in an issue over at the `DirichletReg` package.
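For reference, a likelihood-ratio R2 of this kind is usually written as below. This is the textbook Cox and Snell / Nagelkerke form, not a confirmed description of what any of the packages compute internally. The symbols are introduced here for illustration: $\ell_1$ is the log-likelihood of the fitted model, $\ell_0$ that of the null model, and $n$ the number of observations.

$$
R^2_{\mathrm{CS}} = 1 - \exp\!\left(-\frac{2}{n}\,(\ell_1 - \ell_0)\right),
\qquad
R^2_{\mathrm{N}} = \frac{R^2_{\mathrm{CS}}}{1 - \exp\!\left(\frac{2}{n}\,\ell_0\right)}
$$

Both quantities depend on $\ell_0$, so the choice of null model changes the result (for example, whether the precision part keeps its covariates in the alternative parametrization). For a continuous density such as the Dirichlet, $\ell_0$ can also be positive, in which case the Nagelkerke denominator becomes negative and the rescaling loses its usual interpretation.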
I am not an expert, so perhaps the r2 values actually make sense. The analysis of proportions across categories is quite interesting, and given a recent review (https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.13234) its popularity might increase in ecology and evolution. If `performance` can work with such models it would be a very useful extension!
Thank you!
I agree this will be a super interesting area to follow in the coming years. Worth noting that it's also possible to fit these models with `brms` and the `dirichlet` family. Would that still necessitate a new `r2` method?
Using `brms` also allows specifying random effects (as far as I can tell). That would allow testing e.g. the consistency of time budgets for individuals, if `icc()` or `variance_ratio()` can handle those cases too. I haven't tested this, so I'm not sure whether it already works.