Andrew Heiss, PhD • Brigham Young University
May 14, 2019 • Utah County R Users Group
Download the slides from today’s talk
- The National Academies of Science’s recommendations for reproducibility
- Karl Broman’s tutorials on initial steps toward reproducible research
- Karl Broman’s talk “Steps toward reproducible research”
- Kirstie Whitaker’s “A how to guide to reproducible research”
Examples of each of these are included in this repository. Click on the big green “Clone or download” button at the top of the GitHub page and download the .zip file to follow along.
All code is included in a single, fairly well-commented file. Data is not included; it must either be downloaded separately or obtained from the authors.
Real life examples:
- Most R code out in the wild :)
How to use:
- Download and open analysis.R
- Try to run it
- Install missing packages as needed
- Hope packages are the correct version
- Track down data if/when missing
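The install-as-you-go step can be made a little less painful by checking for missing packages up front. This is a minimal sketch: the package names in required_packages are placeholders, not the actual dependencies of analysis.R, and nothing here guarantees that the installed versions match the ones the author used.

```r
# Hypothetical list of packages; replace with whatever analysis.R loads
required_packages <- c("stats", "utils")

# Return the subset of `pkgs` that is not installed in the current library
find_missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing_packages <- find_missing_packages(required_packages)

if (length(missing_packages) > 0) {
  message("Missing packages: ", paste(missing_packages, collapse = ", "))
  # Uncomment to install them (this grabs the latest CRAN versions,
  # which may not be the versions the original author used)
  # install.packages(missing_packages)
} else {
  message("All required packages are installed")
}
```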
All code is included in multiple well-commented files. Data is included in data/. The folder is structured as an RStudio project.
Real life examples:
How to use:
- Download directory
- Open 02_code-data.Rproj to open a new RStudio instance
- Run process-data.R, then figure-1.R, then models_figure-2.R (following the instructions in the project’s README file)
- Install missing packages as needed (and hope they’re the right version)
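Running the three scripts by hand works, but the order can also be written down in code. The sketch below uses a hypothetical helper, run_in_order(), that sources scripts one after another and stops if one is missing. It assumes your working directory is the project root.

```r
# Source a set of R scripts in the order given, stopping at the first
# script that cannot be found
run_in_order <- function(scripts, dir = ".") {
  for (script in scripts) {
    path <- file.path(dir, script)
    if (!file.exists(path)) {
      stop("Could not find ", path, call. = FALSE)
    }
    message("Running ", script)
    source(path)
  }
  invisible(scripts)
}

# Script names from this example; only runs if all three are present
pipeline <- c("process-data.R", "figure-1.R", "models_figure-2.R")
if (all(file.exists(pipeline))) {
  run_in_order(pipeline)
}
```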
Same as the previous example, except now everything is automated with a Makefile that runs the three R scripts in the correct order.
Real life examples:
Writing Makefiles goes beyond the scope of this little demonstration, but Karl Broman has some excellent resources and tutorials about how to use them.
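The core idea behind make is to rerun a step only when its output is missing or older than its inputs. The sketch below imitates that rule in plain R. It illustrates the concept and is not a replacement for the Makefile in this example; needs_rebuild() and build_if_stale() are hypothetical helper names.

```r
# TRUE if `target` is missing or older than any of its dependencies.
# Missing dependencies are ignored here; make would stop with an error.
needs_rebuild <- function(target, dependencies) {
  if (!file.exists(target)) {
    return(TRUE)
  }
  existing <- dependencies[file.exists(dependencies)]
  any(file.mtime(existing) > file.mtime(target))
}

# Run `script` only when `target` is out of date
build_if_stale <- function(target, dependencies, script) {
  if (needs_rebuild(target, dependencies)) {
    message("Rebuilding ", target)
    source(script)
    return(invisible(TRUE))
  }
  message(target, " is up to date")
  invisible(FALSE)
}
```

A call would name the output file, the files it depends on (data and the script itself), and the script that produces it. A Makefile expresses the same three pieces as a target, its prerequisites, and a recipe.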
How to use:
- Make sure you have access to GNU make. On macOS, open Terminal (found in /Applications/Utilities/) and run xcode-select --install. If you use Windows, check out this Stat 545 page
- Download directory
- Open 03_code-data_makefile.Rproj to open a new RStudio instance
- Open the terminal panel in RStudio and type make output (go to Tools > Terminal > New terminal if you don’t have a terminal panel available already)
- Install missing packages as needed (and hope they’re the right version)
Here we use a single R Markdown file to conduct the analysis. This literate programming approach lets you mix prose and code and creates a notebook for your analysis.
Real life examples:
How to use:
- Download directory
- Open 04_Rmd-report.Rproj to open a new RStudio instance
- Open provo-weather.Rmd and click on the “Knit” button near the top of the source editor; wait for R to generate an HTML file
- Install missing packages as needed (and hope they’re the right version)
Here we use R Markdown’s built-in website capabilities to generate a static website from a collection of .Rmd files. This allows you to have a more complicated notebook with subpages that you can upload anywhere online (your own private server, GitHub pages, etc.), or keep locally on your computer.
See the R Markdown Websites documentation for complete details of this approach. Here’s the tl;dr version:
- Click on the “Build website” button in the Build panel in RStudio to build the website
- The generated site will be in _site/. Put this somewhere online if you want.
- _site.yml controls what goes in the navigation bar and other site generation settings
- index.Rmd is the home page (it is required)
- R will knit all .Rmd files in the root directory in alphabetical order. To ensure the order they’re knit in (i.e. if one depends on another), prefix them with numbers.
- By default all the .Rmd files will share the same environment (i.e. if one file runs library(tidyverse), tidyverse functions will be available in the next file). If you don’t want this to happen (you don’t), make sure new_session: true is set in _site.yml, which makes each .Rmd use a clean environment.
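For reference, a minimal _site.yml along these lines can be written out from R. The site name, navigation titles, and page file names below are placeholders, not the settings used in this example. The sketch writes to a temporary folder so it won’t overwrite an existing configuration.

```r
# Minimal _site.yml contents; the names and pages are hypothetical
site_config <- c(
  'name: "my-analysis"',
  'output_dir: "_site"',
  "new_session: true  # knit each .Rmd in a clean R session",
  "navbar:",
  '  title: "My analysis"',
  "  left:",
  '    - text: "Home"',
  "      href: index.html",
  '    - text: "Clean data"',
  "      href: 01_clean-data.html"
)

# Write the file somewhere harmless and print it back to check it
site_dir <- file.path(tempdir(), "example-site")
dir.create(site_dir, showWarnings = FALSE)
config_path <- file.path(site_dir, "_site.yml")
writeLines(site_config, config_path)

cat(readLines(config_path), sep = "\n")
```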
Real life examples:
- The Power of Ranking: The Ease of Doing Business Indicator as a Form of Social Pressure (website; GitHub)
- NGO Crackdowns and Philanthropy (website; GitHub)
- Why Donors Donate (website; GitHub)
- Are Donors Really Responding? Analyzing the Impact of Global Restrictions on NGOs (website; GitHub)
How to use:
- Download directory
- Open 05_Rmd-website.Rproj to open a new RStudio instance
- Click on “Build website” in the “Build” panel
- Navigate the preview that appears in RStudio or open _site/index.html in your browser
- Install missing packages as needed (and hope they’re the right version)
Ben Marwick’s rrtools package allows you to create a “research compendium,” or a self-contained R package that includes your analysis, data, R functions, and final paper that users can install with devtools::install() (or devtools::install_github() if you have your project hosted at GitHub).
Because the project is structured as a package, R will handle package dependencies for you automatically. You can also put your commonly used custom functions in the package, letting you use things like library(myreproducibleproject) or myreproducibleproject::custom_function() in your project.
Real life examples:
- Are Donors Really Responding? Analyzing the Impact of Global Restrictions on NGOs (website; GitHub)
- Why Donors Donate (website; GitHub)
To create your own compendium follow the instructions at the README. Here’s the tl;dr version:
- Run library(rrtools)
- Run create_compendium("nameofyourpackage")
- Open the new RStudio project that rrtools created
- Put your analysis in analysis/; put your data in analysis/data/; put your paper in analysis/paper/
- Put custom functions in R/ and use roxygen2 to document them
- Use library(nameofyourpackage) to access your custom functions
- Build your package by clicking on “Install and Restart” in the Build panel
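As an illustration of what a documented custom function in R/ might look like, here is a small sketch. The function fahrenheit_to_celsius() is hypothetical and not part of this example’s package; the comment lines starting with #' are roxygen2 tags.

```r
#' Convert temperatures from Fahrenheit to Celsius
#'
#' @param temp_f Numeric vector of temperatures in degrees Fahrenheit
#' @return Numeric vector of temperatures in degrees Celsius
#' @export
#' @examples
#' fahrenheit_to_celsius(c(32, 212))
fahrenheit_to_celsius <- function(temp_f) {
  stopifnot(is.numeric(temp_f))
  (temp_f - 32) * 5 / 9
}
```

Saved in a file inside R/, a function like this becomes available after you rebuild the package. Running devtools::document() turns the roxygen2 comments into a help page.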
In this example, I’ve put an R Markdown website in the analysis/ folder. Since this project is already a package, the Build panel in RStudio is configured to build a package, not a website. In order to build the website, you’ll need to run rmarkdown::render_site(). I’ve included this in a Makefile in analysis/, so you’ll need to open a terminal panel and type cd analysis, then make html to generate the site.
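If you would rather stay in the R console than switch to a terminal, the site can also be built by calling rmarkdown::render_site() directly. This sketch assumes the working directory is the project root and that the site lives in analysis/. The helper build_site() is a hypothetical name, and the Makefile’s html target may do more than this one call.

```r
# Build the R Markdown website in `site_dir`; returns FALSE (invisibly)
# without doing anything if the folder or rmarkdown itself is missing
build_site <- function(site_dir = "analysis") {
  if (!dir.exists(site_dir)) {
    message("No folder named ", site_dir, "; nothing to build")
    return(invisible(FALSE))
  }
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    message("The rmarkdown package is not installed")
    return(invisible(FALSE))
  }
  rmarkdown::render_site(input = site_dir)
  invisible(TRUE)
}

build_site("analysis")
```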
How to use this example:
- Download directory
- Open rrtools.Rproj to open a new RStudio instance
- Run devtools::install(".", dependencies = TRUE) to install the package and all its dependencies
- Click on the Terminal panel in RStudio and type cd analysis
- Type make html
- Open analysis/_site/index.html in your browser
RStudio’s new (and still in-development) renv package lets you maintain a local project-specific library of packages, similar to Python’s virtualenv and pyenv. The README for renv and the introduction vignette explain how it all works and how to get started. Here’s the tl;dr version:
- The renv.lock file contains a list of all the packages your project uses, with version numbers and hashes. Don’t edit this manually; renv has functions that generate and update this for you
- The renv/activate.R file is a script that tells R to use renv/library/* when you run library(blah).
- renv/library/ contains a local package structure
- .Rprofile has a new line in it that runs renv/activate.R when you start a new R session.
- If you use version control (or if you’re distributing this project to others), you only need to track/include renv.lock, .Rprofile, and renv/activate.R. Don’t include the contents of renv/library/, since that is platform-specific and R will install packages there as needed.
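The functions that generate and use renv.lock can be wrapped as below. This is a sketch: renv::snapshot() and renv::restore() are the names used in released versions of renv, and the in-development version described here may differ. The wrapper names are hypothetical.

```r
# Record the packages the project currently uses in renv.lock
save_project_library <- function(project = ".") {
  if (!requireNamespace("renv", quietly = TRUE)) {
    message("renv is not installed")
    return(invisible(FALSE))
  }
  renv::snapshot(project = project, prompt = FALSE)
  invisible(TRUE)
}

# Reinstall the package versions listed in renv.lock into renv/library/
restore_project_library <- function(project = ".") {
  if (!requireNamespace("renv", quietly = TRUE)) {
    message("renv is not installed")
    return(invisible(FALSE))
  }
  renv::restore(project = project, prompt = FALSE)
  invisible(TRUE)
}
```

Call save_project_library() after adding or updating packages. Someone who has just downloaded the project would call restore_project_library() instead.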
How to use this example:
- Download directory
- Open 07_renv.Rproj to open a new RStudio instance
- Install renv with devtools::install_github("rstudio/renv")
- Restart your R session (to make .Rprofile run renv/activate.R)
- Wait as all dependencies are installed automatically
- Click on “Build website” in the “Build” panel
- Navigate the preview that appears in RStudio or open _site/index.html in your browser
Docker allows you to create virtual machines (or containers) and run stuff in them. Containers are essentially miniature Linux computers with different pieces of software pre-installed. They’re great for spinning up computers with exact versions of R and packages. You can access R within the containers through your browser: open a URL like http://localhost:8787 to get to an RStudio instance within the container.
Installing Docker and creating Dockerfiles goes beyond the scope of this little demonstration, but there are a ton of resources out there to get you started:
- Colin Fay’s “An Introduction to Docker for R Users”
- “A Docker tutorial for reproducible research”
- My “Super basic practical guide to Docker and RStudio”
The main advantage of creating reproducible Docker containers is that it essentially lets other users download and install a complete standalone computer that is configured exactly how it was when you ran your code. It’s like the gold standard of reproducibility.
The Rocker team has made this even more gold standardy for R projects too. They maintain base Docker images for each R version (3.5.1, 3.6.0, etc.), and these images are set up to install packages from MRAN (Microsoft’s snapshot-based mirror of CRAN). This means that if you use R 3.6.0, any packages you install will be at the version they were when R was released.
Real life examples:
- Are Donors Really Responding? Analyzing the Impact of Global Restrictions on NGOs (website; GitHub; Dockerfile)
How to use this example:
- Install Docker Desktop for Mac or Docker Desktop for Windows
- Install Kitematic if you want a GUI for managing Docker containers (you do)
- Download directory
- Navigate to the directory in a terminal and type docker build -t myproject . to build all the required pieces
- Wait while everything gets downloaded
- Run docker run -e PASSWORD=blah -p 8787:8787 myproject to start the container
- In your browser, go to http://localhost:8787. Log in using “rstudio” as the user name and “blah” as the password.
- Open provo-weather.Rmd and knit it
Binder is a more user-friendly version of the Docker approach to reproducibility. Instead of requiring users to install Docker and build the container image locally, Binder handles all the hosting and provides access to a specific version of R and RStudio in a browser.
It also provides a simpler way to install and configure packages—there’s no need for complicated Dockerfiles (you can still use them, but they’re not recommended). You need two extra files for this to work:
- runtime.txt, which contains the date for the MRAN snapshot that you want to use for package installation (formatted as r-YYYY-MM-DD)
- install.R, which contains R code for installing packages
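To make the contents of these two files concrete, the sketch below generates both from R. The snapshot date and package list are placeholders rather than the values used in the Binder example, and write_binder_files() is a hypothetical helper. It writes to a temporary folder instead of a real project.

```r
# Write the two Binder configuration files into `project_dir`
write_binder_files <- function(project_dir, snapshot_date, packages) {
  # runtime.txt holds the MRAN snapshot date as r-YYYY-MM-DD
  runtime <- paste0("r-", format(as.Date(snapshot_date), "%Y-%m-%d"))
  writeLines(runtime, file.path(project_dir, "runtime.txt"))

  # install.R holds ordinary R code that installs the packages
  install_call <- paste0(
    "install.packages(c(",
    paste0('"', packages, '"', collapse = ", "),
    "))"
  )
  writeLines(install_call, file.path(project_dir, "install.R"))

  invisible(c(runtime = runtime, install = install_call))
}

# Hypothetical date and package list, written to a temporary folder
demo_dir <- file.path(tempdir(), "binder-demo")
dir.create(demo_dir, showWarnings = FALSE)
write_binder_files(demo_dir, "2019-05-14", c("tidyverse", "rmarkdown"))
```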
Instructions and examples are here. Here’s the tl;dr version:
- Make sure your project is in its own public repository either at GitHub or GitLab
- Create a runtime.txt file and install.R file in the root of your project
- Go to Binder, paste your repository’s URL into the form, and click on “Launch”
- Wait for a looooong time (the Binder container will rebuild every time you commit to the repository, which will also take a long time; if there are no commits, the container should open fairly quickly)
- Binder will give you a URL when it’s done. If you open the URL as is, Binder will try to load your R files in a Jupyter notebook, which won’t work. Append ?urlpath=rstudio to the URL to open the project in an RStudio instance (e.g. https://mybinder.org/v2/gh/andrewheiss/binder-example/master?urlpath=rstudio)
That’s all!
How to use this example (this example actually lives in a separate GitHub repository so that it can work with Binder):
- Go to https://mybinder.org/v2/gh/andrewheiss/binder-example/master?urlpath=rstudio
- Open provo-weather.Rmd and knit it