This is an overview of R packages and functions for fitting different types of regression models. For each type of model, the upper cell in the “R package or function” column refers to “simple” models, while the lower cell refers to the mixed-model counterpart (if available and known).
This overview makes no claim to completeness. Rather, it shows commonly used packages, but there are plenty of other packages as well, some of which may even perform better at the tasks mentioned. If you’re aware of such packages or think that an important package or function is missing, please file an issue.
| Nature of Response | Example | Type of Regression | R package or function | Example Webpage | Bayesian with brms |
|---|---|---|---|---|---|
| Continuous | Quality of Life, linear scales | linear | lm() | | brm(family = gaussian()) |
| | | | lmer(), glmmTMB() | | |
| Binary | Success yes/no | binary logistic | glm(family = binomial) | UCLA | brm(family = bernoulli()) |
| | | | glmer(\*), glmmTMB(\*) | | |
| Binary, weighted | Success yes/no, with weights | quasi-binary logistic | glm(family = quasibinomial) | | |
| | | | glmmPQL(family = "quasibinomial") | | |
| Trials (or proportions of counts) | 20 successes out of 30 trials | logistic | glm(cbind(successes, failures), family = binomial) | Hadley’s notes | brm(successes \| trials(total), family = binomial()) |
| | | | glmer(\*), glmmTMB(\*) | | |
| Count data | Number of uses, counts of events | Poisson | glm(family = poisson) | UCLA | brm(family = poisson()) |
| | | | glmer(\*), glmmTMB(\*) | | |
| Count data, with excess zeros or overdispersion | Number of uses, counts of events (with higher variance than mean of response) | negative binomial | glm.nb() | UCLA | brm(family = negbinomial()) |
| | | | glmer.nb(), glmmTMB(family = nbinom2) | | |
| Count data with very many zeros (inflation) | see count data, but response is modelled as a mixture of Bernoulli and Poisson distributions (two sources of zeros) | zero-inflated | zeroinfl() | UCLA | brm(family = zero_inflated_poisson()) |
| | | | glmmTMB(ziformula, family = poisson) | | |
| Count data, with very many zeros (inflation) and overdispersion | Number of uses, counts of events (with higher variance than mean of response) | zero-inflated negative binomial | zeroinfl(dist = "negbin") | UCLA | brm(family = zero_inflated_negbinomial()) |
| | | | glmmTMB(ziformula, family = nbinom2) | | |
| Count data, zero-truncated | see count data, but only for positive counts (hurdle component models zero counts) | hurdle (Poisson) | hurdle() | UCLA | brm(family = hurdle_poisson()) |
| | | | glmmTMB(family = truncated_poisson) | | |
| Count data, zero-truncated and overdispersion | see “Count data, zero-truncated”, but with higher variance than mean of response | hurdle (neg. binomial) | vglm(family = posnegbinomial) | UCLA | brm(family = hurdle_negbinomial()) |
| | | | glmmTMB(family = truncated_nbinom2) | | |
| Proportion / Ratio (without zero and one) | Percentages, proportions of continuous data | Beta (see note below) | betareg() | ouR data generation | brm(family = Beta()) |
| | | | glmmTMB(family = beta_family) | | |
| Proportion / Ratio (including zero and one) | Percentages, proportions of continuous data | Beta-Binomial, zero-inflated Beta, ordered Beta (see note below) | BBreg(), betabin(), vglm(family = betabinomial), ordbetareg() | ouR data generation | brm(family = zero_one_inflated_beta()) |
| | | | glmmTMB(ziformula, family = beta_family), glmmTMB(ziformula, family = betabinomial), glmmTMB(ziformula, family = ordbeta), ordbetareg() | | |
| Ordinal | Likert scale, worse/ok/better | ordinal, proportional odds, cumulative | polr(), clm(), bracl() | UCLA | brm(family = cumulative()) |
| | | | clmm(), mixor(), MCMCglmm(family = "ordinal") | | |
| Multinomial | No natural order of categories, like red/green/blue | multinomial | multinom(), brmultinom() | UCLA | brm(family = categorical()) |
| | | | MCMCglmm(family = "multinomial") | | |
| Continuous, right-skewed | Financial data, reaction times | Gamma | glm(family = Gamma) | Sean Anderson | brm(family = Gamma()), but see also Reaction time distributions in brms |
| | | | glmer(\*), glmmTMB(\*) | | |
| (Semi-)Continuous, (right) skewed, probably with spike at zero (zero-inflated) | Financial data, probably exponential dispersion of variance | Tweedie | glm(family = tweedie), cpglm() | Revolutions | |
| | | | cpglmm(), glmmTMB(\*) | | |
| (Semi-)Continuous, (right) skewed, probably with spike at zero (zero-inflated) | Normal distribution, but negative values are censored and stacked on zero | Tobit | tobit(), censReg() | | brm(y \| cens(), family = gaussian()) |
| | | | semLme() | | |
| Continuous, but truncated or outliers | | truncated | censReg(), tobit(), vglm(family = tobit) | UCLA-1, UCLA-2 | brm(y \| trunc(), family = gaussian()) |
| Continuous, but exponential growth | | log-transformed, non-linear | glm(family = gaussian("log")), nls() | Some useful equations, linear vs. non-linear regression | |
| | | | glmmTMB(\*), nlmer(), nlme() | | |
| Proportion / Ratio with more than 2 categories | Biomass partitioning in plants (ratio of leaf, stem and root mass) | Dirichlet | DirichReg() | | brm(family = dirichlet()) |
| Time-to-Event | Survival analysis, time until event/death occurs | Cox (proportional hazards) | coxph() | UCLA | brm(family = cox()) |
| | | | coxme() | | |
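A minimal sketch of how several base-R entries in the table differ only in the family argument of glm(), assuming simulated data (the variable names and true coefficients below are made up for illustration). Each estimated slope should land close to the value used to simulate the data.

```r
set.seed(123)
n <- 5000
x <- rnorm(n)

# Continuous response: linear regression, true slope 0.5
y_lin <- 1 + 0.5 * x + rnorm(n)
m_lin <- lm(y_lin ~ x)

# Binary response: logistic regression, true slope 1 on the logit scale
y_bin <- rbinom(n, size = 1, prob = plogis(-0.5 + 1 * x))
m_bin <- glm(y_bin ~ x, family = binomial)

# Counts: Poisson regression, true slope 0.3 on the log scale
y_pois <- rpois(n, lambda = exp(0.2 + 0.3 * x))
m_pois <- glm(y_pois ~ x, family = poisson)

# Right-skewed positive response: Gamma regression with log link, true slope 0.5
# (shape = 2; rate chosen so that the mean equals exp(1 + 0.5 * x))
mu <- exp(1 + 0.5 * x)
y_gam <- rgamma(n, shape = 2, rate = 2 / mu)
m_gam <- glm(y_gam ~ x, family = Gamma(link = "log"))

# Collect the estimated slopes to compare with the true values used above
slopes <- c(
  linear   = unname(coef(m_lin)["x"]),
  logistic = unname(coef(m_bin)["x"]),
  poisson  = unname(coef(m_pois)["x"]),
  gamma    = unname(coef(m_gam)["x"])
)
print(round(slopes, 2))
```

Swapping family (and, where needed, its link) is all it takes to move between these rows; the mixed-model functions marked with an asterisk follow the same pattern.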
- An asterisk, as in glmer(\*) or glmmTMB(\*), indicates that the mixed-model function should use the same response type and family as its glm() counterpart.
- Note that ratios or proportions from count data, like cbind(successes, failures), are modelled as logistic regression with glm(cbind(successes, failures), family = binomial()), while ratios from continuous data (where the response ranges from zero to one) are modelled using beta regression.
- Usually, zero-inflated models are used when 0 or 1 come from a separate process or category. However, when the 0/1 values are more consistent with censoring than with a separate category or process (i.e., 0 means “below detection”, not “something qualitatively different happened”), ordered beta regression is probably a better choice (Source: https://twitter.com/bolkerb/status/1577755600808775680).
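The note on proportions from count data can be illustrated with a small sketch, assuming simulated binomial data (all names and values are hypothetical). It fits the cbind(successes, failures) form and, for comparison, the equivalent form that uses the observed proportion as response with the number of trials as prior weights; both yield the same estimates.

```r
set.seed(1)
n_obs <- 200
x <- runif(n_obs, -2, 2)
trials <- sample(10:40, n_obs, replace = TRUE)           # number of trials per row
successes <- rbinom(n_obs, size = trials, prob = plogis(0.3 + 0.8 * x))
failures <- trials - successes

# Response as a two-column matrix of successes and failures
m_cbind <- glm(cbind(successes, failures) ~ x, family = binomial)

# Equivalent form: observed proportion as response, trials as prior weights
prop <- successes / trials
m_weights <- glm(prop ~ x, family = binomial, weights = trials)

print(coef(m_cbind))
print(coef(m_weights))
```

A proportion computed from continuous measurements has no number of trials to use as weights, which is why such data fall under beta regression instead.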
Packages and functions for simple models:
- Base R: lm(), glm()
- AER: tobit()
- aod: betabin()
- betareg: betareg()
- brglm2: bracl(), brmultinom()
- censReg: censReg()
- cplm: cpglm()
- survival: coxph()
- DirichletReg: DirichReg()
- HRQoL: BBreg()
- MASS: glm.nb(), polr()
- nnet: multinom()
- ordbetareg: ordbetareg()
- ordinal: clm(), clm2()
- pscl: zeroinfl(), hurdle()
- statmod: tweedie()
- VGAM: vglm()

Packages and functions for mixed models:
- cplm: cpglmm()
- coxme: coxme()
- glmmTMB: glmmTMB()
- lme4: lmer(), glmer(), glmer.nb()
- MASS: glmmPQL()
- MCMCglmm: MCMCglmm()
- mixor: mixor()
- ordbetareg: ordbetareg()
- ordinal: clmm(), clmm2()
- smicd: semLme()

Bayesian models:
- brms: brm()
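A minimal sketch of several listed functions that ship with R’s recommended packages (MASS, nnet, survival and nlme), so it runs without installing anything extra: glm.nb() for overdispersed counts, polr() for ordinal outcomes, multinom() for unordered categories, coxph() for time-to-event data and glmmPQL() as a simple mixed-model example. The data are simulated and all variable names and true parameter values are assumptions made for illustration; lme4, glmmTMB, brms and the other listed packages are not shown.

```r
library(MASS)     # glm.nb(), polr(), glmmPQL()
library(nnet)     # multinom()
library(survival) # coxph(), Surv()
library(nlme)     # fixef() for the lme-based glmmPQL() fit

set.seed(2024)
n <- 2000
x <- rnorm(n)

# Overdispersed counts: negative binomial, true log-scale slope 0.4, theta = 2
y_nb <- rnbinom(n, mu = exp(0.5 + 0.4 * x), size = 2)
m_nb <- glm.nb(y_nb ~ x)

# Ordinal outcome: a latent logistic variable cut into three ordered categories
latent <- 1.0 * x + rlogis(n)
y_ord <- cut(latent, breaks = c(-Inf, -0.5, 0.5, Inf),
             labels = c("worse", "ok", "better"), ordered_result = TRUE)
m_ord <- polr(y_ord ~ x, method = "logistic")

# Unordered categories: multinomial logit (categories drawn independently of x)
y_cat <- factor(sample(c("red", "green", "blue"), n, replace = TRUE))
m_cat <- multinom(y_cat ~ x, trace = FALSE)

# Time-to-event: exponential event times, true log hazard ratio 0.7 per unit of x,
# administratively censored at time 2
event_time <- rexp(n, rate = 0.5 * exp(0.7 * x))
time <- pmin(event_time, 2)
status <- as.integer(event_time <= 2)
m_cox <- coxph(Surv(time, status) ~ x)

# Mixed model: binary outcome with a random intercept per group, fitted by PQL
group <- factor(rep(1:50, each = 40))   # 50 groups x 40 observations = n
u <- rnorm(50, sd = 0.5)                 # group-level intercept deviations
y_mix <- rbinom(n, size = 1, prob = plogis(-0.5 + 1 * x + u[as.integer(group)]))
d_mix <- data.frame(y_mix, x, group)
m_pql <- glmmPQL(y_mix ~ x, random = ~ 1 | group, family = binomial,
                 data = d_mix, verbose = FALSE)

print(coef(m_nb)); print(m_nb$theta)
print(coef(m_ord))
print(coef(m_cat))
print(coef(m_cox))
print(fixef(m_pql))
```

Each estimate should land near the value used in the simulation. PQL is known to be biased for binary outcomes when clusters are small or the random-effect variance is large, so treat glmmPQL() as a lightweight option rather than a default.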
A handout of this overview is available in PDF format.