A Reproducible Data Analysis Workflow with R Markdown, Git, Make, and Docker

Primary language: TeX. License: Creative Commons Attribution 4.0 International (CC-BY-4.0).

This is the GitHub repository accompanying a work-in-progress paper by Aaron Peikert and Andreas Brandmaier.

Abstract

In this tutorial, we describe a workflow to ensure long-term reproducibility of R-based data analyses. The workflow leverages established tools and practices from software engineering. It combines the benefits of various open-source software tools including R Markdown, Git, Make, and Docker, whose interplay ensures seamless integration of version management, dynamic report generation conforming to various journal styles and full cross-platform and long-term computational reproducibility. The workflow ensures meeting the primary goals that 1) the reporting of statistical results is consistent with the actual statistical results (dynamic report generation), 2) the analysis exactly reproduces at a later time even if the computing platform or software is changed (computational reproducibility), and 3) changes at any time (during development and post-publication) are tracked, tagged, and documented while earlier versions of both data and code remain accessible. While the research community increasingly recognizes dynamic document generation and version management as tools to ensure reproducibility, we demonstrate with practical examples that these alone are not sufficient to ensure long-term computational reproducibility. Leveraging containerization, dependence management, version management, and literate programming, the workflow increases scientific productivity by facilitating later reproducibility and reuse of code and data.

Compile

Usual Way

Requires: Git, RStudio, pandoc, pandoc-citeproc, and the rmarkdown R package.
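A quick way to see whether the command-line requirements are in place is to look for them on the PATH. The following is a minimal sketch, not part of the repository: the helper name `missing_tools` is hypothetical, R is detected through its `Rscript` launcher (an assumption about your installation), and RStudio is left out because its launcher name differs between platforms.

```shell
# Print the name of every given command that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Command names below are assumptions about a typical installation.
missing=$(missing_tools git pandoc pandoc-citeproc Rscript)
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names are joined on one line.
  echo "Missing tools:" $missing
else
  echo "All required command-line tools found."
fi
```

The rmarkdown package itself lives inside R, so it is not covered by this check.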

In RStudio, choose File -> New Project -> Version Control -> Git.

Enter the repository URL:

https://github.com/aaronpeikert/reproducible-research.git

Open manuscript.Rmd and click Knit.
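If you prefer a terminal over the RStudio interface, the Knit step can be approximated by calling `rmarkdown::render()` through `Rscript`. This is a sketch under assumptions: the wrapper name `knit` is hypothetical, R with the rmarkdown package is assumed to be installed, and the Knit button may apply additional settings that this call does not.

```shell
# Render an R Markdown file from the shell.
# Returns 127 if Rscript is not available on PATH.
knit() {
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found; install R first." >&2
    return 127
  fi
  # The file name is passed as a trailing argument to avoid quoting issues.
  Rscript -e 'rmarkdown::render(commandArgs(trailingOnly = TRUE)[1])' "$1"
}

# Usage, from the project directory:
#   knit manuscript.Rmd
```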

Using Workflow

Does not require R or RStudio, only Make and Docker (plus Git to clone the repository).

Run in a terminal:

git clone https://github.com/aaronpeikert/reproducible-research.git
cd reproducible-research
make build
make all DOCKER=TRUE
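The Makefile is not reproduced here, so the following only illustrates the general pattern of running a Make target inside a container; it is not the repository's actual rule. The image tag `reproducible-research`, the mount point `/home/rstudio`, and the helper name `in_docker` are all hypothetical. The function prints the command instead of executing it, so you can inspect it without Docker installed.

```shell
# Hypothetical names, chosen for illustration only.
IMAGE=reproducible-research
MOUNT=/home/rstudio

# Print a docker command that would run `make <targets>` in the container,
# with the current directory mounted as the working directory.
in_docker() {
  echo docker run --rm -v "$PWD:$MOUNT" -w "$MOUNT" "$IMAGE" make "$@"
}

in_docker all
```

Mounting the project directory means files produced inside the container end up in the local checkout, which is the usual reason for this design.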

Note: Windows users need to manually edit the Makefile, set current_path to the current directory, and run make all DOCKER=TRUE WINDOWS=TRUE. We hope that future releases of Docker for Windows will not require this workaround.

Rebuild Everything

If you experience unexpected behavior with this workflow, check that you have the most recent version (git pull), rebuild the Docker image (make build), and force a rebuild of all targets (make -B DOCKER=TRUE):

git pull && make build && make -B DOCKER=TRUE

Session Info

sessioninfo::session_info()
## ─ Session info ──────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 3.6.1 (2019-07-05)
##  os       Debian GNU/Linux 9 (stretch)
##  system   x86_64, linux-gnu           
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       Etc/UTC                     
##  date     2019-11-12                  
## 
## ─ Packages ──────────────────────────────────────────────────────────────
##  package     * version date       lib source        
##  assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.1)
##  backports     1.1.5   2019-10-02 [1] CRAN (R 3.6.1)
##  cli           1.1.0   2019-03-19 [1] CRAN (R 3.6.1)
##  crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.1)
##  digest        0.6.22  2019-10-21 [1] CRAN (R 3.6.1)
##  evaluate      0.14    2019-05-28 [1] CRAN (R 3.6.1)
##  here        * 0.1     2017-05-28 [1] CRAN (R 3.6.1)
##  htmltools     0.4.0   2019-10-04 [1] CRAN (R 3.6.1)
##  knitr         1.25    2019-09-18 [1] CRAN (R 3.6.1)
##  magrittr      1.5     2014-11-22 [1] CRAN (R 3.6.1)
##  Rcpp          1.0.3   2019-11-08 [1] CRAN (R 3.6.1)
##  rlang         0.4.1   2019-10-24 [1] CRAN (R 3.6.1)
##  rmarkdown     1.16    2019-10-01 [1] CRAN (R 3.6.1)
##  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.6.1)
##  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.1)
##  stringi       1.4.3   2019-03-12 [1] CRAN (R 3.6.1)
##  stringr       1.4.0   2019-02-10 [1] CRAN (R 3.6.1)
##  withr         2.1.2   2018-03-15 [1] CRAN (R 3.6.1)
##  xfun          0.10    2019-10-01 [1] CRAN (R 3.6.1)
##  yaml          2.2.0   2018-07-25 [1] CRAN (R 3.6.1)
## 
## [1] /usr/local/lib/R/site-library
## [2] /usr/local/lib/R/library
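The listing above can be turned into a compact "package version" list, for example to compare two builds. The sketch below assumes the exact `##`-prefixed layout shown above; the function name `extract_versions` is hypothetical, and the two sample rows are copied from the listing.

```shell
# Read sessioninfo-style output on stdin and print "package version" pairs.
extract_versions() {
  awk '
    /Packages/ { in_pkgs = 1; next }           # rows start after this rule
    in_pkgs && $2 == "package" { next }         # skip the column header
    in_pkgs && NF >= 4 && $1 == "##" && $2 !~ /^\[/ {
      ver = ($3 == "*") ? $4 : $3               # "*" marks attached packages
      print $2, ver
    }
  '
}

extract_versions <<'EOF'
## ─ Packages ──────────────────────────────────────────────────────────────
##  package     * version date       lib source
##  here        * 0.1     2017-05-28 [1] CRAN (R 3.6.1)
##  knitr         1.25    2019-09-18 [1] CRAN (R 3.6.1)
##
## [1] /usr/local/lib/R/site-library
EOF
# prints:
# here 0.1
# knitr 1.25
```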