/BayesModels

Easy Bayesian models.

Primary LanguageJulia

BayesModels.jl

BayesModels.jl is a package that uses Turing for Bayesian inference and StatsModels for its easy-to-use formula API.

Currently only Bayesian linear regression is supported, but I plan at some point to extend it to additional model types.

How to

You can run a Bayesian linear regression using blm(formula, dataframe).

# Import packages.
using BayesModels
using RDatasets
using DataFrames

# Load the wages dataframe. 
# I'm thinning out the dataframe to make it quicker,
# Bayesian methods don't scale as well with bigger data.
df = RDatasets.dataset("Ecdat", "Wages")[1:10:end, :]

# Estimate the model;
# Defaults to 1000 MCMC samples using NUTS
chain = blm(@formula(LWage ~ 1 + Married + Ed + Married*Ed), df)

# Display the results.
display(chain)

Custom priors for the intercept α and coefficients β, as well as any likelihood specific parameters, can be optionally passed as keyword arguments. Priors can be specified as either a Distribution or a function of the data (X,y) -> Distribution:

# use higher prior variance on intercept and coefficients
α_prior = Normal(0,2)
β_prior = (X,y) -> MvNormal(zeros(size(X,2)),2)
# use truncated Normal for observation variance
σ_prior = truncated(Normal(0,5),0,Inf)
chain = blm(@formula(LWage ~ 1 + Married + Ed + Married*Ed), df; α_prior=α_prior, β_prior=β_prior, σ_prior=σ_prior)

You can customize inference by explicitly setting the InferenceStrategy. The MCMC strategy takes a sampler algorithm as its first argument and passes all other arguments directly to sample:

df = RDatasets.dataset("Ecdat", "Wages")[1:10:end, :]
# Draw 500 samples from 4 chains in parallel using HMC instead of NUTS.
chain = blm(@formula(LWage ~ 1 + Married + Ed + Married*Ed), df, MCMC(HMC(0.05,10),MCMCThreads(),500,n_chains=4))

blm also supports Bernoulli, Exponential, and Poisson likelihoods for non-Normal response variables. By default, blm chooses from one of Normal, Bernoulli, or Poisson depending on whether the response variable is, respectively, real, binary-categorical, or integer typed:

df = dataset("Ecdat","HI")[sample(1:22265,500),:]
# will use Bernoulli likelihood with logistic link function by default (i.e. logistic regression)
# because HHI is a binary variable.
chain = blm(@formula(HHI ~ Whi + Education + Race + Region), df, MCMC(HMC(0.05,10),500))

You can also specify the likelihood type manually to override this behavior:

df = RDatasets.dataset("Ecdat","Airline")
# Note that this is for the sake of demonstration only... this model doesn't necessarily make sense.
chain = blm(Exponential, @formula(Output ~ 1 + Cost + LF*PF), df)

blm by default uses the canonical link functions for Normal, Poisson, and Bernoulli (i.e. identity, log, and logit) and a non-canonical absolute inverse abs(η)^-1 for Exponential. You can provide your own choice of link function by simply passing it as an additional argument in the trailing modelargs...:

# Uses softplus as the link function for the Exponential model;
# Need to add a small epsilon to ensure that the rate parameter > 0
g(x) = 1e-8 + softplus(x)
chain = blm(Exponential, @formula(Output ~ 1 + Cost + LF*PF), df, MCMC(NUTS(),1000), g)

If you find that the easy-bake, two-stage model from blm is too restrictive (and you probably will!), you can write your own Turing model and pass it directly to bayesreg:

# Simple 3-stage hierarchical regression model
@model function hlm(X,y,priors)
    N,K = size(X)
    σ_β ~ filldist(InverseGamma(3,2),K)
    σ_α ~ InverseGamma(3,2)
    σ ~ InverseGamma(2,3)
    α ~ Normal(0,σ_α)
    β ~ MvNormal(zeros(K),σ_β) # diagonal pooled variance
    η = α .+ x*β
    y ~ MvNormal(η, σ)
end
df = RDatasets.dataset("Ecdat", "Wages")[1:10:end, :]
chain = bayesreg(@formula(LWage ~ 1 + Married + Ed + Married*Ed), df, MCMC(NUTS(0.65),500), hlm)

The model must follow the form model(X,y,priors,args...) where args can be replaced with any number of arbitrary additional parameters. User-specified prior distributions can be passed in as keyword arguments and will show up in the priors named tuple parameter. Additional arguments to the model are passed directly through bayesreg:

myarg = 42
chain = bayesreg(@formula(LWage ~ 1 + Married + Ed + Married*Ed), df, MCMC(NUTS(0.65),500), myfancymodel, myarg)