
Projection predictive variable selection


projpred


An R package for projection predictive variable selection in generalized linear models. It is compatible with rstanarm and brms, but other reference models can also be used.

The method is described in detail in Piironen et al. (2018) and evaluated in comparison to many other methods in Piironen and Vehtari (2017).
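As a rough illustration of what "projection" means in the Gaussian case, the sketch below fits a submodel to the predictions of a reference model rather than to the observed data. It uses base R only and is not projpred's implementation: the ridge fit is a stand-in for a Bayesian reference model, a single point estimate replaces posterior draws, and all names (`beta_ref`, `mu_ref`, `sigma2_proj`, the penalty `lambda`) are assumptions made for this example.

```r
# Toy Gaussian projection (illustrative only, NOT projpred's implementation).
set.seed(1)
n <- 100
x <- matrix(rnorm(n * 5), n, 5)
colnames(x) <- paste0("x", 1:5)
y <- as.numeric(2 * x[, 1] - x[, 2] + rnorm(n))

# Stand-in reference model: ridge regression on all five predictors
# (intercept left unpenalized). The value of lambda is arbitrary.
X <- cbind(1, x)
lambda <- 10
penalty <- diag(c(0, rep(lambda, 5)))
beta_ref <- solve(crossprod(X) + penalty, crossprod(X, y))
mu_ref <- as.numeric(X %*% beta_ref)
sigma2_ref <- mean((y - mu_ref)^2)  # stand-in for the reference noise variance

# Projection onto the submodel {x1, x2}: least squares with the
# reference predictions (not y) as the target.
x_sub <- x[, c("x1", "x2")]
proj <- lm(mu_ref ~ x_sub)
mu_proj <- as.numeric(fitted(proj))

# Projected noise variance: reference variance plus the mean squared
# discrepancy between reference and projected predictions.
sigma2_proj <- sigma2_ref + mean((mu_ref - mu_proj)^2)

print(coef(proj))
print(c(reference = sigma2_ref, projected = sigma2_proj))
```

The projected variance can never be smaller than the reference variance, which reflects the predictive accuracy lost by dropping variables.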

Currently, the supported families (family objects in R) are Gaussian, binomial and Poisson. See the quickstart vignette for examples.
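For readers unfamiliar with family objects, the following base R sketch constructs the three families named above and prints their default link functions. It does not call projpred; the variable names `families` and `links` are chosen for illustration.

```r
# The three supported families as base R family objects
families <- list(
  gaussian = gaussian(),
  binomial = binomial(),
  poisson  = poisson()
)

# Each family object records its name and default link function
links <- vapply(families, function(f) f$link, character(1))
print(links)
```

A family object like these is what gets passed as the `family` argument when fitting the reference model, as in the `stan_glm()` call in the example below.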


Installation

  • Install the latest release from CRAN:
install.packages('projpred')
  • Install the latest development version from GitHub (requires the devtools package):
# install devtools only if it is missing; it does not need to be attached
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github('stan-dev/projpred', build_vignettes = TRUE)
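After either installation route, a quick check along the following lines can confirm that the package is available and show which vignettes were built. This is a minimal sketch using only base R and utils functions; the variable names `pkg` and `installed` are illustrative.

```r
pkg <- "projpred"

# TRUE if the package can be loaded, FALSE otherwise (nothing is attached)
installed <- requireNamespace(pkg, quietly = TRUE)

if (installed) {
  message(pkg, " version: ", as.character(utils::packageVersion(pkg)))
  # List the vignettes shipped with the installed package
  vigs <- utils::vignette(package = pkg)$results
  print(vigs[, c("Item", "Title"), drop = FALSE])
} else {
  message(pkg, " is not installed; see the installation steps above.")
}
```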

Example

rm(list=ls())
library(projpred)
library(rstanarm)
options(mc.cores = parallel::detectCores())
set.seed(1)

# Gaussian and Binomial examples from the glmnet-package
data('df_gaussian', package = 'projpred')
#data('df_binom', package = 'projpred')

# fit the full model with a sparsifying prior
fit <- stan_glm(y ~ x, family = gaussian(), data = df_gaussian,
                prior = hs(df = 1, global_scale=0.01), iter = 500, seed = 1)
#fit <- stan_glm(y ~ x, family = binomial(), data = df_binom,
#                prior = hs(df = 1, global_scale=0.01), iter = 500, seed = 1)


# perform the variable selection
vs <- varsel(fit)

# print the results
varsel_stats(vs)

# project the parameters onto submodels with nv = 3 and nv = 5 variables
projs <- project(vs, nv = c(3, 5))

# predict using only the 5 most relevant variables
pred <- proj_linpred(vs, xnew = df_gaussian$x, nv = 5, integrated = TRUE)

# perform cross-validation for the variable selection
cvs <- cv_varsel(fit, cv_method='LOO')

# plot the validation results 
varsel_plot(cvs)
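To run the example on your own data, the data frame needs the same shape as the one used above. The sketch below assumes that `df_gaussian` stores all predictors in a single matrix column named `x`, which is what the formula `y ~ x` and the argument `xnew = df_gaussian$x` suggest. It uses simulated data and base R only; `my_df` and the dimensions are hypothetical.

```r
set.seed(1)
n <- 50
p <- 4

# Simulated predictors and response (placeholders for real data)
x <- matrix(rnorm(n * p), n, p)
y <- as.numeric(x[, 1] + rnorm(n))

# Store the whole predictor matrix as one column of the data frame
my_df <- data.frame(y = y)
my_df$x <- x

# The formula y ~ x then expands to an intercept plus one term per column of x
mm <- model.matrix(y ~ x, data = my_df)
print(colnames(mm))
```

A data frame built this way could then take the place of `df_gaussian` in the `stan_glm()` call, with `my_df$x` supplied as `xnew` when predicting.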

References

Dupuis, J. A. and Robert, C. P. (2003). Variable selection in qualitative models via an entropic explanatory power. Journal of Statistical Planning and Inference, 111(1-2):77–94.

Goutis, C. and Robert, C. P. (1998). Model choice in generalised linear models: a Bayesian approach via Kullback–Leibler projections. Biometrika, 85(1):29–37.

Piironen, J. and Vehtari, A. (2017). Comparison of Bayesian predictive methods for model selection. Statistics and Computing, 27(3):711–735. doi:10.1007/s11222-016-9649-y.

Piironen, J., Paasiniemi, M. and Vehtari, A. (2018). Projective inference in high-dimensional problems: prediction and feature selection. Preprint.