This repository collects resources on workflow and tooling choices that promote reproducibility and best practice in data analysis and data science projects.
The resources are organised into the following sections:
- R Packages
- Books
- Papers
- Blog Posts
- Talks and Videos
If you would like to make a contribution, I would be glad to include it. Please file an issue, submit a PR, or email me at deanmarchiori@gmail.com.
Package | About | Available on |
---|---|---|
drake | An R-focused pipeline toolkit for reproducibility and high-performance computing | CRAN |
ProjectTemplate | A system for automating the thoughtless parts of a data analysis project | CRAN |
workflowr | A Framework for Reproducible and Collaborative Data Science | CRAN |
rrtools | Tools for Writing Reproducible Research in R | GitHub |
orderly | Lightweight Reproducible Reporting for R | CRAN |
fnmate | A function definition generator | GitHub |
dflow | Automatically set up a drake project | GitHub |
represtools | Basic utility functions to support reproducible research | CRAN |
starters | R package for initializing projects for various R activities | GitHub |
targets | Function-oriented Make-like declarative workflows for R | GitHub |
Title | Authors | Year |
---|---|---|
Agile Data Science with R - A workflow | Edwin Thoen | 2020 |
What They Forgot to Teach You About R | Jennifer Bryan, Jim Hester | 2020 |
The Turing Way: A Handbook for Reproducible Data Science | Becky Arnold, Louise Bowler, Sarah Gibson, Patricia Herterich, Rosie Higman, Kirstie Whitaker | 2019 |
Title | Citation |
---|---|
Packaging Data Analytical Work Reproducibly Using R (and Friends) | Ben Marwick, Carl Boettiger & Lincoln Mullen (2018) Packaging Data Analytical Work Reproducibly Using R (and Friends), The American Statistician, 72:1, 80-88, DOI: 10.1080/00031305.2017.1375986 |
Opinionated analysis development | Parker H. 2017. Opinionated analysis development. PeerJ Preprints 5:e3210v1 https://doi.org/10.7287/peerj.preprints.3210v1 |
- Benefits of a function-based diet (The {drake} post) - Miles McBain
- Structuring R Projects
- Using {drake} for Machine Learning
- That Feeling of Workflowing - Miles McBain
- Community Call - Reproducible Research with R
- RMarkdown Driven Development - Emily Riederer
- Community Call: Reproducible workflows at scale with drake
- Opinionated Analysis Development
- Reproducible Computation at Scale in R with Targets - Will Landau (New York Open Statistical Programming Meetup, December 2020)
- How reproducible am I? A retrospective on a year of commercial data science projects in R - Dean Marchiori