easystats/performance

R-squared for Dirichlet regression (`r2`)

MarcRieraDominguez opened this issue · 1 comments

Hi! First of all, thank you for creating and maintaining this package!

I have come across unexpected behaviour when applying `r2()` to Dirichlet regressions fitted with the DirichletReg package. In short, Dirichlet regression extends beta regression to C > 2 categories: each response is a proportion in (0, 1), and the proportions sum to 1 across categories. The model comes in two parametrizations: common (a separate model is fitted to each of the C categories) and alternative (a separate model is fitted to C-1 categories, and precision is modelled separately). Each of these sub-models can use a different set of explanatory variables, separated by pipes (`|`) in the formula.
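To make the difference between the two parametrizations concrete, here is a minimal sketch in plain Python (not DirichletReg code) of how parameters from the alternative parametrization map onto the Dirichlet alphas used in the common one. It assumes a multinomial-logit link for the means, with the first category as the reference, and a log link for precision. The function names and the input values are made up for illustration.

```python
import math

def means_from_linear_predictors(etas):
    """Multinomial-logit link: the reference category has linear predictor 0."""
    exps = [1.0] + [math.exp(eta) for eta in etas]
    total = sum(exps)
    return [e / total for e in exps]

def alternative_to_common(etas, log_phi):
    """Convert (means, precision) to Dirichlet alphas: alpha_c = mu_c * phi."""
    mu = means_from_linear_predictors(etas)
    phi = math.exp(log_phi)  # log link keeps the precision positive
    return [m * phi for m in mu]

# Hypothetical linear predictors for C = 3 categories (C - 1 = 2 are modelled)
etas = [0.5, -0.2]
mu = means_from_linear_predictors(etas)
alpha = alternative_to_common(etas, log_phi=2.0)
print(mu, alpha)
```

The means sum to 1 and the alphas sum to the precision, so in this sketch the C-1 mean models plus one precision model carry the same information as C alpha models for a single observation.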

`r2()` appears to return Nagelkerke's R2, but the value is very high for models with the alternative parametrization: for instance, close to 0.9 when the squared correlation between fitted and observed values is no higher than 0.75 for any category. The value for a model with the common parametrization is more sensible (i.e. in line with the correlations between fitted and observed values). Based on comparisons with `MuMIn::r.squaredLR()`, I suspect this has to do with how the null model is defined. A reproducible example is available in an issue over at the DirichletReg repository:

maiermarco/DirichletReg#12
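For reference, the likelihood-ratio R2 and its Nagelkerke rescaling can be sketched as below. This is only the textbook arithmetic, not the code used by `performance` or MuMIn. The sample size and log-likelihoods are hypothetical, chosen to show how much the result depends on the log-likelihood of the null model; they are not taken from the reproducible example.

```python
import math

def r2_likelihood_ratio(loglik_model, loglik_null, n):
    """Cox-Snell-type R2: 1 - exp(-2/n * (logLik(model) - logLik(null)))."""
    return 1.0 - math.exp(-2.0 / n * (loglik_model - loglik_null))

def r2_nagelkerke(loglik_model, loglik_null, n):
    """Rescale the likelihood-ratio R2 by its maximum, 1 - exp(2/n * logLik(null))."""
    r2_max = 1.0 - math.exp(2.0 / n * loglik_null)
    return r2_likelihood_ratio(loglik_model, loglik_null, n) / r2_max

# Hypothetical values: the same fitted model compared against two null models
n = 40
loglik_model = -50.0
r2_weak_null = r2_nagelkerke(loglik_model, loglik_null=-90.0, n=n)
r2_strong_null = r2_nagelkerke(loglik_model, loglik_null=-70.0, n=n)
print(r2_weak_null, r2_strong_null)
```

The same fitted model gets a noticeably higher R2 against the null model with the lower log-likelihood, which is why the definition of the null model matters here. Note also that for continuous responses such as proportions the log-likelihood can be positive, in which case the rescaling term falls outside (0, 1); the negative values above just keep the example simple.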

I am not an expert, so perhaps the R2 values actually make sense. The analysis of proportions across categories is quite interesting and, given a recent review (https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.13234), its popularity might increase in ecology and evolution. If `performance` can work with such models, it would be a very useful extension!

Thank you!

I agree this will be a super interesting area to follow in the coming years. Worth noting that these models can also be fitted with brms using the `dirichlet` family. Would that still necessitate a new `r2()` method?

Using brms also allows specifying random effects (as far as I can tell). That would allow testing, e.g., the consistency of time budgets for individuals, if `icc()` or `variance_ratio()` can handle those cases too. I haven't tested this, so I'm not sure whether that is already the case.