Reproducible Data Science

Project repository for the course Reproducible Data Science (BST270) at Harvard University.

This course starts from a fundamental premise: the scientific method requires that work can be verified or falsified by others. For this reason, data analysis should be reproducible (i.e., given the same input data and computational tools, one should be able to rerun the same methods and obtain the same result) and preferably replicable (i.e., the study can be repeated with the same procedures on new data).
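As a rough illustration of the reproducibility definition above, here is a minimal Python sketch that reruns the same method on the same input and compares the results. It is not taken from the course material: the function names `run_analysis` and `fingerprint`, the toy data, and the seed value are all hypothetical, and the bootstrap mean only stands in for a real analysis.

```python
import hashlib
import random
import statistics


def run_analysis(data, seed):
    """Hypothetical analysis step: a bootstrap estimate of the mean."""
    # A local, seeded generator keeps global random state out of the result.
    rng = random.Random(seed)
    boot_means = [
        statistics.fmean(rng.choices(data, k=len(data)))
        for _ in range(200)
    ]
    return round(statistics.fmean(boot_means), 10)


def fingerprint(result):
    """Hash the result so two runs can be compared without inspecting values."""
    return hashlib.sha256(repr(result).encode("utf-8")).hexdigest()


# Toy input data (made up for this sketch).
data = [4.1, 5.3, 2.8, 6.0, 3.7]

# Same data, same method, same seed: the two fingerprints should match.
first = fingerprint(run_analysis(data, seed=270))
second = fingerprint(run_analysis(data, seed=270))
is_reproducible = first == second
print(is_reproducible)
```

Fixing the seed and comparing fingerprints only checks reproducibility in the sense used here. Replicability would require running the same procedure on newly collected data, which a check like this cannot cover.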

First, two links:

I wrote a non-comprehensive summary of the lectures. Check it out here.
This is the official repository of the course.

Structure of this repository

As part of the course, we attempted to reproduce the results from two different works:

This paper about the relationship between optimism and lipids. Our reproducibility analysis can be found in the midus folder. Check out the README for more information.
COVID-19 tables and visualizations from the New York Times. Our reproducibility analysis can be found in the covid-19 folder. Check out the README for more information.

Appreciation

Special thanks to Viola Fanfani for inviting me to join the class and for the interesting discussions.
