Reproducing the analysis in Richard McElreath's Statistical Rethinking with CmdStanR.

rethinking-cmdstanr

This is an R project for reproducing and exploring the model fitting and analysis code shown in Richard McElreath's book Statistical Rethinking using CmdStanR and the tidyverse.

Setting up the R environment from scratch

This section shows how to set up the R environment from scratch so that it is easy to restore. The current version of renv appears to have issues with third-party package repositories, which is how cmdstanr is normally installed. The easiest solution for now is to install cmdstanr from its GitHub repo, as renv::install seems to track GitHub packages without any issues.

Create a bare R environment.

renv::init(bare = TRUE)

Install the tidyverse packages and here.

renv::install("dplyr")
renv::install("ggplot2")
renv::install("purrr")
renv::install("readr")
renv::install("tidyr")
renv::install("here")

Install cmdstanr.

renv::install("stan-dev/cmdstanr")

Install rethinking.

renv::install("rmcelreath/rethinking")

Snapshot the library.

renv::snapshot()
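
The steps above install the cmdstanr R package, but cmdstanr is an interface to CmdStan, which has to be installed separately. The following is a minimal sketch of that step, not part of the original setup. The function name setup_cmdstan is hypothetical; it wraps cmdstanr's own helpers and assumes a C++ toolchain is already available. As far as I know, CmdStan is installed outside the renv library, so it is not recorded in the lockfile and may need to be reinstalled after a restore on a new machine.

```r
# Hypothetical helper: install CmdStan once cmdstanr itself is available.
setup_cmdstan <- function(cores = 2) {
  # Check that a suitable C++ toolchain is present before building.
  cmdstanr::check_cmdstan_toolchain()
  # Download and build CmdStan, using `cores` parallel build jobs.
  cmdstanr::install_cmdstan(cores = cores)
  # Return the version of CmdStan that cmdstanr will now use.
  cmdstanr::cmdstan_version()
}
```

Calling setup_cmdstan() once after the snapshot should be enough on a given machine.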

Restoring the R environment from the lockfile

Activate the environment.

renv::activate()

Restore the environment.

renv::restore()

How this project is organised

Code files

For each section in the book that explores a particular analysis, there is an R script in the R directory and Stan model definitions in the stan directory. The textbook uses a hierarchical numbering system for sections, and the code files are numbered with the smallest encompassing section number that covers the relevant material. Models are given meaningful but short names that summarise their goal.
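
To make the layout concrete, here is a small sketch of how a section number and a model name could map to file paths. The exact naming scheme is an assumption made for illustration (dots replaced by underscores, and the section number used as a prefix for the Stan file); the helper code_file_paths and the example names are hypothetical and may not match the files in this project.

```r
# Hypothetical helper: build the script and Stan file paths for a section.
# The "<section>_<model name>" scheme is an assumed convention, not the
# project's confirmed one.
code_file_paths <- function(section, model_name, root = ".") {
  # Turn a hierarchical section number such as "4.3" into "4_3".
  prefix <- gsub(".", "_", section, fixed = TRUE)
  list(
    script = file.path(root, "R", paste0(prefix, ".R")),
    stan = file.path(root, "stan", paste0(prefix, "_", model_name, ".stan"))
  )
}

paths <- code_file_paths("4.3", "height")
paths$script  # "./R/4_3.R"
paths$stan    # "./stan/4_3_height.stan"
```

Inside the project, root could be set to here::here() so that the paths resolve from the project directory regardless of the working directory.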

Stan model definitions

There are more Stan model definitions in this project than there are models that appear in the book. For some examples in the book, the same model is defined in several different ways in Stan. This is so that I can explore different aspects of Stan.

In some cases I have defined the same model in two or more ways that are equivalent, but that offer different interfaces to the R code. In other cases I am using Stan to generate additional output that I need in the analysis. In the first half of his book, McElreath uses his own implementation of quadratic approximation to fit Bayesian models and to sample and simulate from the posterior distribution. Stan can do this sampling and simulation for you, so rather than trying to emulate the book's handling of posterior samples in R, it sometimes makes more sense to use these features of Stan. Where a model has been augmented to generate additional data for use in the analysis, the model name has a suffix that contains gen.
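
As an illustration of a model augmented in this way, the sketch below uses the book's globe-tossing example (6 water observations in 9 tosses) and adds a generated quantities block so that Stan itself simulates a replicated count for each posterior draw. The names globe_gen_code, fit_globe_gen and w_rep are hypothetical, and the model is a minimal stand-in rather than one of the definitions in this project. Running the function assumes that cmdstanr and CmdStan are installed.

```r
# Stan program held as a string; the generated quantities block is what
# makes this a "gen" style model.
globe_gen_code <- "
data {
  int<lower=0> n;            // number of tosses
  int<lower=0, upper=n> w;   // number of water observations
}
parameters {
  real<lower=0, upper=1> p;  // proportion of water
}
model {
  p ~ uniform(0, 1);
  w ~ binomial(n, p);
}
generated quantities {
  // One simulated water count per posterior draw of p.
  int w_rep = binomial_rng(n, p);
}
"

# Hypothetical helper: compile the model, sample, and return the draws.
fit_globe_gen <- function(w = 6, n = 9, seed = 1) {
  stan_file <- cmdstanr::write_stan_file(globe_gen_code)
  model <- cmdstanr::cmdstan_model(stan_file)
  fit <- model$sample(
    data = list(w = w, n = n),
    seed = seed,
    chains = 4,
    refresh = 0
  )
  # Data frame with one row per draw and columns for p and w_rep.
  fit$draws(variables = c("p", "w_rep"), format = "df")
}
```

The w_rep column returned by fit_globe_gen() can then be summarised or plotted directly in R, without simulating from the posterior by hand.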

Links

Statistical Rethinking

CmdStanR

Stan

Bayesian R Packages