A Reproducible Data Analysis Workflow with R Markdown, Git, Make, and Docker

Primary language: TeX. License: Creative Commons Attribution 4.0 International (CC-BY-4.0).

This is the accompanying GitHub repository to a work-in-progress paper by Aaron Peikert and Andreas M. Brandmaier.

Abstract

In this tutorial, we describe a workflow to ensure long-term reproducibility of R-based data analyses. The workflow leverages established tools and practices from software engineering. It combines the benefits of various open-source software tools including R Markdown, Git, Make, and Docker, whose interplay ensures seamless integration of version management, dynamic report generation conforming to various journal styles and full cross-platform and long-term computational reproducibility. The workflow ensures meeting the primary goals that 1) the reporting of statistical results is consistent with the actual statistical results (dynamic report generation), 2) the analysis exactly reproduces at a later point in time even if the computing platform or software is changed (computational reproducibility), and 3) changes at any time (during development and post-publication) are tracked, tagged, and documented while earlier versions of both data and code remain accessible. While the research community increasingly recognizes dynamic document generation and version management as tools to ensure reproducibility, we demonstrate with practical examples that these alone are not sufficient to ensure long-term computational reproducibility. Combining containerization, dependence management, version management, and dynamic document generation, the proposed workflow increases scientific productivity by facilitating later reproducibility and reuse of code and data.

Resources

For each tool: how to install it and how to learn it.

Chocolatey (Windows only)
How to install: Visit chocolatey.org. Chocolatey installs software for you; it is installed and called from the terminal/command prompt. To open the command prompt, press Windows+X and then click on “Command Prompt” or “Command Prompt (Admin).”

Homebrew (OS X only)
How to install: Visit brew.sh. Homebrew installs software for you; it is installed and called from the terminal. To open the terminal, press Command+Space to open Spotlight, type “Terminal,” and double-click the top search result.

R
How to install:
Windows: Use Chocolatey (from the terminal).
choco install -y r.project
OS X: Use Homebrew (from the terminal).
brew install r
How to learn: Read R for Data Science.

RStudio
How to install:
Windows: Use Chocolatey (from the terminal).
choco install -y r.studio
OS X: Use Homebrew (from the terminal).
brew install --cask rstudio
How to learn: Skim the cheatsheet.

rmarkdown
How to install: Within RStudio, type into the R console:
install.packages("rmarkdown")
How to learn: Read the cheatsheet. Skim R Markdown: The Definitive Guide.

Git
How to install:
Windows: Use Chocolatey (from the terminal).
choco install -y git
OS X: Git gets installed with Homebrew. Nothing to do.
How to learn: Read Part IV Git fundamentals and skim the rest of Happy Git and GitHub for the useR.

GitHub
How to install: Create an account on github.com and apply for Student/Researcher Benefits.
How to learn: Read Part II Connect Git, GitHub, RStudio and Part III Early GitHub Wins.

Make
How to install:
Windows: Use Chocolatey (from the terminal).
choco install -y make
OS X: Make is preinstalled on OS X. Nothing to do.
How to learn: Read Minimal Make.

Docker
How to install:
Windows: Use Chocolatey (from the terminal).
choco install -y docker-desktop
OS X: Use Homebrew (from the terminal).
brew install --cask docker
Linux: Follow the steps described in: Post-installation steps for Linux.
How to learn: Read An Introduction to Rocker: Docker Containers for R.
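
After installing the tools listed above, a quick check from the terminal can confirm that each one is found on the PATH. The following is a minimal sketch for a POSIX shell (OS X, Linux, or Git Bash on Windows). The function name check_tools is made up for this illustration and is not part of the repository.

```shell
#!/bin/sh
# check_tools (hypothetical helper): report which of the given commands
# can be found on the PATH. Prints one line per tool and returns the
# number of missing tools as its exit status (0 means all were found).
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Example call (not executed here):
#   check_tools R git make docker
```

A non-zero exit status means that at least one tool still needs to be installed or added to the PATH.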

Compile

The following paragraphs describe how to obtain a copy of the source files of our manuscript on reproducible workflows and how to create the PDF. You can either go the ‘standard’ way, downloading a local copy of the repository and knitting the manuscript file in R, or follow the suggested reproducible workflow, using Make to create a container and build the final PDF in exactly the same virtual computational environment that we used to render it.

Standard Way

Requires: Git, RStudio, pandoc, pandoc-citeproc & rmarkdown.

Open RStudio -> File -> New Project -> Version Control -> Git

Insert:

https://github.com/aaronpeikert/reproducible-research.git

Open manuscript.Rmd and click on Knit.
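
The same steps can also be run from a terminal instead of the RStudio interface. The sketch below assumes that git, Rscript, and pandoc are on the PATH and that the rmarkdown package is installed. The helper names run and knit_manuscript and the DRY_RUN switch are illustrative and not part of the repository.

```shell
#!/bin/sh
# run (hypothetical helper): execute a command, or only print it
# when DRY_RUN=1 so the steps can be inspected first.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# knit_manuscript (hypothetical helper): clone the repository and render
# manuscript.Rmd, mirroring "New Project -> Version Control -> Git"
# followed by a click on "Knit".
knit_manuscript() {
    repo="https://github.com/aaronpeikert/reproducible-research.git"
    run git clone "$repo" &&
    run cd reproducible-research &&
    run Rscript -e 'rmarkdown::render("manuscript.Rmd")'
}

# Example calls (not executed here):
#   DRY_RUN=1; export DRY_RUN; knit_manuscript   # only print the steps
#   knit_manuscript                              # clone and render
```

Because this route uses whatever R and package versions happen to be installed locally, the rendered result may differ from ours.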

Using a Reproducible Workflow

Does not require R or RStudio, only Make and Docker (plus Git to clone the repository).

Execute in Terminal:

git clone https://github.com/aaronpeikert/reproducible-research.git
cd reproducible-research
make build
make all DOCKER=TRUE

Note: Windows users need to manually edit the Makefile, set current_path to the current directory, and use make all DOCKER=TRUE WINDOWS=TRUE. We hope that future releases of Docker for Windows will not require this workaround.
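
To give an idea of what running a build step inside a container involves, the sketch below prints a docker run command that mounts the project directory into the container, so that files created inside it appear on the host. This illustrates a common pattern and is not copied from the repository's Makefile. The function name container_cmd, the image name reproducible-research, and the mount point /home/rstudio are assumptions made for this example.

```shell
#!/bin/sh
# IMAGE and MOUNT are assumed values for this sketch, not taken from the Makefile.
IMAGE="${IMAGE:-reproducible-research}"
MOUNT="${MOUNT:-/home/rstudio}"

# container_cmd (hypothetical helper): print the docker invocation that
# would run the given command inside a container.
#   --rm  removes the container after the command finishes
#   -v    mounts the host project directory at $MOUNT
#   -w    makes $MOUNT the working directory inside the container
container_cmd() {
    project_dir="$1"
    shift
    echo docker run --rm -v "$project_dir:$MOUNT" -w "$MOUNT" "$IMAGE" "$@"
}

# Example call (not executed here):
#   container_cmd "$PWD" make all
```

The host path given to -v is the part that differs between platforms, which may be why current_path has to be set by hand on Windows.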

Rebuild Everything

If you experience unexpected behavior with this workflow, check that you have the most recent version (git pull), rebuild the Docker image (make rebuild), and force the rebuild of all targets (make -B DOCKER=TRUE).

git pull && make rebuild && make -B DOCKER=TRUE
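
The -B flag (long form --always-make in GNU Make) tells Make to treat every target as out of date. The toy example below, which is unrelated to the repository's Makefile, shows the difference: a plain make skips a target that is already up to date, whereas make -B runs its recipe again. The function name demo_always_make and the file names are made up for this illustration, and GNU Make is assumed.

```shell
#!/bin/sh
# demo_always_make (hypothetical helper): build a tiny project in a
# temporary directory three times to contrast `make` with `make -B`.
demo_always_make() {
    command -v make >/dev/null 2>&1 || { echo "make not found"; return 1; }
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        echo "data" > in.txt
        # One-line rule syntax ("target: prerequisite ; recipe") needs no tab.
        printf 'out.txt: in.txt ; cp in.txt out.txt\n' > Makefile
        make       # first run: out.txt is missing, so the recipe runs
        make       # second run: out.txt is up to date, nothing is done
        make -B    # forced run: the recipe runs again regardless
    )
    status=$?
    rm -rf "$dir"
    return "$status"
}

# Example call (not executed here):
#   demo_always_make
```

In the combined command above, the && operators additionally ensure that each step only runs if the previous one succeeded.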

Session Info

sessioninfo::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 3.6.1 (2019-07-05)
##  os       Debian GNU/Linux 9 (stretch)
##  system   x86_64, linux-gnu           
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       Etc/UTC                     
##  date     2020-05-21                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version date       lib source        
##  assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.1)
##  backports     1.1.5   2019-10-02 [1] CRAN (R 3.6.1)
##  cli           2.0.0   2019-12-09 [1] CRAN (R 3.6.1)
##  crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.1)
##  digest        0.6.23  2019-11-23 [1] CRAN (R 3.6.1)
##  evaluate      0.14    2019-05-28 [1] CRAN (R 3.6.1)
##  fansi         0.4.0   2018-10-05 [1] CRAN (R 3.6.1)
##  glue          1.3.1   2019-03-12 [1] CRAN (R 3.6.1)
##  here        * 0.1     2017-05-28 [1] CRAN (R 3.6.1)
##  hms           0.5.2   2019-10-30 [1] CRAN (R 3.6.1)
##  htmltools     0.4.0   2019-10-04 [1] CRAN (R 3.6.1)
##  knitr         1.26    2019-11-12 [1] CRAN (R 3.6.1)
##  magrittr      1.5     2014-11-22 [1] CRAN (R 3.6.1)
##  pander      * 0.6.3   2018-11-06 [1] CRAN (R 3.6.1)
##  pillar        1.4.3   2019-12-20 [1] CRAN (R 3.6.1)
##  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 3.6.1)
##  R6            2.4.1   2019-11-12 [1] CRAN (R 3.6.1)
##  Rcpp          1.0.3   2019-11-08 [1] CRAN (R 3.6.1)
##  readr       * 1.3.1   2018-12-21 [1] CRAN (R 3.6.1)
##  rlang         0.4.2   2019-11-23 [1] CRAN (R 3.6.1)
##  rmarkdown     2.0     2019-12-12 [1] CRAN (R 3.6.1)
##  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.6.1)
##  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.1)
##  stringi       1.4.3   2019-03-12 [1] CRAN (R 3.6.1)
##  stringr       1.4.0   2019-02-10 [1] CRAN (R 3.6.1)
##  tibble        2.1.3   2019-06-06 [1] CRAN (R 3.6.1)
##  vctrs         0.2.1   2019-12-17 [1] CRAN (R 3.6.1)
##  withr         2.1.2   2018-03-15 [1] CRAN (R 3.6.1)
##  xfun          0.11    2019-11-12 [1] CRAN (R 3.6.1)
##  yaml          2.2.0   2018-07-25 [1] CRAN (R 3.6.1)
##  zeallot       0.1.0   2018-01-28 [1] CRAN (R 3.6.1)
## 
## [1] /usr/local/lib/R/site-library
## [2] /usr/local/lib/R/library