A curated list of reproducible research case studies, projects, tutorials, and media
- Case studies
- Ad-hoc reproductions
- Theory papers
- Tool reviews
- Courses
- Development Resources
- User tools
- Books
- Data Repositories
- Examples and exemplars
- Journals
- Ontologies
- Organizations
- Awesome Lists
The term "case studies" is used here in a general sense to describe any study of reproducibility. A reproduction is an attempt to arrive at comparable results with identical data using the computational methods described in a paper. A refactor restructures existing code to follow frameworks and other reproducibility best practices while preserving the original data. A replication generates new data and applies existing methods to achieve comparable results. A robustness test applies various protocols, workflows, statistical models, or parameters to a given data set to study their effect on results, either as a follow-up to an existing study or as a "bake-off". A census is a high-level tabulation conducted by a third party. A survey is a questionnaire sent to practitioners. A case narrative is an in-depth first-person account. An independent discussion uses a second, independent author to interpret the results of a study as a means of improving inferential reproducibility.
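The four data-and-method approaches defined above can be read as a small lookup on two questions: was new data generated, and how were the methods handled? The following is a minimal sketch of that reading, not part of the original list. The `Approach` enum, the `classify` function, and the `"original"`, `"refactored"`, and `"varied"` labels are all hypothetical names chosen for illustration. Census, survey, case narrative, and independent discussion describe how a study is conducted or reported rather than a data/method combination, so they are left out.

```python
from enum import Enum
from typing import Optional


class Approach(Enum):
    REPRODUCTION = "reproduction"
    REFACTOR = "refactor"
    REPLICATION = "replication"
    ROBUSTNESS_TEST = "robustness test"


# Keys are (new_data, methods). The methods labels are illustrative assumptions:
#   "original"   - methods as described in the paper
#   "refactored" - same analysis, code restructured into a framework
#   "varied"     - alternative protocols, workflows, models, or parameters
_APPROACHES = {
    (False, "original"): Approach.REPRODUCTION,
    (False, "refactored"): Approach.REFACTOR,
    (True, "original"): Approach.REPLICATION,
    (False, "varied"): Approach.ROBUSTNESS_TEST,
}


def classify(new_data: bool, methods: str) -> Optional[Approach]:
    """Return the matching approach, or None if the definitions do not cover it."""
    return _APPROACHES.get((new_data, methods))


if __name__ == "__main__":
    print(classify(new_data=False, methods="original").value)
    print(classify(new_data=True, methods="original").value)
```

Combinations the definitions do not name, such as new data analysed with varied methods, return `None` rather than being forced into a category.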
| Study | Field | Approach | Size |
| --- | --- | --- | --- |
| | Medicine | Census | 80 studies |
| | Cancer biology | Refactor | 8 studies |
| | Biostatistics | Census | 56 studies |
| | Genetics | Reproduction | 18 studies |
| | Software engineering | Replication | 4 companies |
| | Signal processing | Census | 134 papers |
| | Biomedical sciences | Survey | 23 PIs |
| | Bioinformatics | Census | 100 studies |
| | Cancer biology | Replication | 53 studies |
| | Computer science | Census | 613 papers |
| | Psychology | Replication | 100 studies |
| | Biomedical sciences | Census | 100 papers |
| | Epidemiology | Robustness test | 417 variables |
| | Science | Survey | 1,576 researchers |
| | NLP | Replication | 3 studies |
| | Cancer biology | Replication | 9 studies |
| | Biomedical sciences | Census | 318 journals |
| | Science | Case narrative | 31 PIs |
| | Biological sciences | Survey | 704 PIs |
| | Bioinformatics | Refactor | 1 study |
| | Economics | Replication | 18 studies |
| | Machine learning | Census | 30 studies |
| | Archaeology | Case narrative | 1 survey |
| | Comparative toxicogenomics | Census | 51,292 claims in 3,363 papers |
| | Artificial intelligence | Census | 400 papers |
| | Economics | Census | 203 papers |
| | Computational science | Reproduction | 204 articles, 180 authors |
| | Genomics | Case narrative | 1 study |
| | Social sciences | Replication | 21 papers |
| | Psychology | Robustness test | One data set, 29 analyst teams |
| | Medicine and health sciences | Census | 30 papers |
| | Microbiome immuno oncology | Replication | 1 paper |
| | Bioinformatics | Refactor and test of robustness | 1 paper |
| | Biomedical Sciences | Census | 149 papers |
| | Bioinformatics | Synthetic replication & refactor | 1 paper |
| | Geosciences | Survey, Reproduction | 146 scientists, 41 papers |
| | Reinforcement Learning | Reproduction, case narrative | 1 paper |
| | Science & Engineering | Survey | 215 participants |
| | Nephrology | Robustness test | 1 paper |
| | Social sciences & other | Census | 810 Dataverse studies |
| | Geosciences | Survey | 360 papers |
| | Deep learning | Robustness test | 1 analysis |
| | Genomics | Case narrative | 1 analysis |
| | Pharmacogenomics | Case narrative | 2 analyses |
| | Biomedical sciences and Psychology | Census | 127 registered reports |
| | All | Census | 1,159,166 Jupyter notebooks |
| | Virology | Census | 236 papers |
| | Anaesthesia | Independent discussion | 1 study |
| | Psychology | Replication | 1 paper |
| | Cell pharmacology | Robustness test | 5 labs |
| | Machine learning | Reproduction | 18 conference papers |
| | Experimental archaeology | Replication | 1 theory |
| | Neurology | Census | 202 papers |
| | Psychology | Replication | 2 experiments |
| | Ecology and Evolution | Census | 163 papers |
| | Neuroimaging | Robustness test | 1 data set, 70 teams |
| | Psychology | Replication | 1 experiment, 21 labs, 2,220 participants |
| | Psychology | Census | 62 papers |
| | Bioinformatics | Robustness test | 1 data set |
| | Neurobiology | Census | 41 papers |
| | Genetics | Census | 1,799 papers |
These are one-off unpublished attempts to reproduce individual studies.

| Reproduction | Original study |
| --- | --- |
| https://rdoodles.rbind.io/2019/06/reanalyzing-data-from-human-gut-microbiota-from-autism-spectrum-disorder-promote-behavioral-symptoms-in-mice/ and https://notstatschat.rbind.io/2019/06/16/analysing-the-mouse-autism-data/ | Sharon, G. et al. Human Gut Microbiota from Autism Spectrum Disorder Promote Behavioral Symptoms in Mice. Cell 2019, 177 (6), 1600–1618.e17. |
| | Wei, X.; Nielsen, R. CCR5-∆32 Is Deleterious in the Homozygous State in Humans. Nat. Med. 2019 DOI: 10.1038/s41591-019-0459-6. (retracted) |
| Authors/Date | Title | Field | Type |
| --- | --- | --- | --- |
| | Why most published research findings are false | Science | Statistical reproducibility |
| | A Quick Guide to Organizing Computational Biology Projects | Bioinformatics | Best practices |
| | Ten Simple Rules for Reproducible Computational Research | Computational science | Best practices |
| | The Generalizability Crisis | Psychology | Statistical reproducibility |
| | Unreproducible Research is Reproducible | Machine Learning | Methodology |
| | Trustworthy data underpin reproducible research | Physics | Scientific philosophy |
| | Scientific discovery in a model-centric framework: Reproducibility, innovation, and epistemic diversity | Science | Statistical reproducibility |
| | A Realistic Guide to Making Data Available Alongside Code to Improve Reproducibility | Science | Best practices |
| | The importance of transparency and reproducibility in artificial intelligence research | Artificial Intelligence | Critique |
| Authors/Date | Title | Tools |
| --- | --- | --- |
| | Out-of-the-box Reproducibility: A Survey of Machine Learning Platforms | MLflow, Polyaxon, StudioML, Kubeflow, CometML, Sagemaker, GCPML, AzureML, Floydhub, BEAT, Codalab, Kaggle |
- MOOCs
  - Coursera Reproducible Research - Roger Peng et al., JHU. Very popular course.
  - edX Principles, Statistical and Computational Tools for Reproducible Science - John Quackenbush et al., Harvard
- Online course content
  - Tools for Reproducible Research - Karl Broman, UW, includes resources page
  - R for Reproducible Scientific Analysis - Software Carpentry workshop primer using Gapminder data
  - R-DAVIS - Student-developed computer literacy and data course in R
  - AMIA2019 - Pragmatic RR for Analysis, Dissemination and Publication
- R
  - CRAN Task View - Reproducible Research - packages relevant to RCR in R
  - liftr - persistent reproducible reporting through containerized R Markdown documents
  - repo - provenance framework package
- Open With Binder for Chrome or Firefox - open the GitHub repository you are visiting using MyBinder.org
- DVC - DVC tracks machine learning models and data sets
- Reproducible Research with R and R Studio 2013
- Implementing Reproducible Research 2014 - Describes projects: Sumatra, Vistrails, CDE, SOLE, JUMBO, CML, knitr. Content available on OSF.
- The Practice of Reproducible Research 2017 - 31 first-person case narratives and intro chapters
- Dynamic Documents with R and knitr 2015
- The Turing Way: A Handbook for Reproducible Data Science 2020
All these repositories assign Digital Object Identifiers (DOIs) to data
- DataCite - 12M+ DOIs registered for 46 allocators. Offers APIs and a metadata schema.
- Data Dryad - curated, metadata-centric, focused on data associated with published articles, $120 submission fee (various waivers available)
- Figshare - 20 GB of free private space, unlimited public space, >2M articles, >5k projects
- OSF - Project-oriented system with access control and integration with popular tools. Unlimited storage for projects, but individual files are limited to 5 gigabytes (GB) each.
- Zenodo - Supports embargoed and restricted access, plus metadata. 50 GB limit.
- Jupyter Gallery - Gallery of interesting Jupyter notebooks
- Papers With Code - ML papers with code
- NARPS - Code related to Neuroimaging Analysis Replication and Prediction Study
- ReScience - Journal dedicated to in silico reproductions and tests of robustness, lives on GitHub.
- ReplicationWiki - Replication in the social sciences, particularly economics
- FAIRsharing - standards, databases, and policies
- BioPortal - 660 biomedical ontologies
- ResearchObject.org - RO specifications and publications
- BioCompute - BCO specs
- rOpenSci - Tools, conferences, and education
- Open Science Framework - Open source project management
- pyOpenSci - Promotes open and reproducible research through peer-review of scientific Python packages
- Replication Network - Furthering the practice of replication in economics. Econ replication database.
- Awesome Pipeline - So many pipeline frameworks
- Awesome Docker - Everything related to the Docker containerization system
- Awesome R - Section on RR tools
- Awesome Reproducible R - RRR tools
- Awesome Jupyter - Jupyter projects, libraries and resources
- Awesome Bioinformatics Benchmarks - Benchmarks are a related aspect of robustness testing
- Awesome Open Science - Resources, data, tools, and scholarship
- Awesome Public Datasets - A topic-centric list of HQ open datasets
- Awesome Semantic Web - Semantic web and linked data resources.
Contributions welcome! Read the contribution guidelines first.
To the extent possible under law, Jeremy Leipzig has waived all copyright and related or neighboring rights to this work.