Introduction to creating Reproduction Packages with R, Git, and GitHub.

About this tutorial

This work was originally created by Mike Croucher from RSE-Sheffield under a Creative Commons Attribution-ShareAlike 4.0 International license. It was subsequently adapted by Malika Ihle from Reproducible Research Oxford and Anne-Kathrin Kleine from LMU Munich.

Prerequisites

  • You should have an understanding of how to use R, Git, and GitHub

Helpful resources

Tutorial Overview

In this self-paced tutorial, you will learn how to create Reproduction Packages. We will cover the following topics:

1. Introduction to reproducibility and the role of Reproduction Packages

  • What is data reproducibility?
  • The role of Reproduction Packages for reproducibility of results in the Social Sciences
  • The core elements of Reproduction Packages

2. Software and tools for creating Reproduction Packages

  • R and relevant R packages
  • Git and GitHub
  • Remote websites for data and code publication/hosting

3. Data preparation

  • The principles of data management
  • De-identifying confidential or sensitive information

4. Data and code documentation

  • Annotating data and creating a codebook
  • Annotating data cleaning scripts and data analysis code

5. Data and code publication

6. Data and code licensing

7. Time and resource planning

  • How much time and funding should I allocate to creating Reproduction Packages?
  • Example time plan

8. A Reproduction Package - step by step

  • Creating a Reproduction Package step by step

Step-by-step tutorial

The material is self-paced and includes a worked example at the end. Please work through the sections in order.