Materials for the 2023 GESIS Training workshop "Workflows for Reproducible Research with R & Git"

Workflows for Reproducible Research with R & Git

Materials for the 2023 GESIS workshop "Workflows for Reproducible Research with R & Git"

by Johannes Breuer, Bernd Weiß, and Arnim Bleier

Please link to the workshop GitHub repository


Workshop description

The workshop focuses on reproducible research in the quantitative social and behavioral sciences. In the context of this workshop, reproducibility means that other researchers can fully understand and rerun your data preparation and statistical analyses. The workflows and tools covered here will also make your own work easier, as they allow you to automate and track analysis and reporting tasks. In addition to a conceptual introduction to the methods and key terms around reproducible research, the workshop covers procedures for maximizing the reproducibility of data analyses using R. After discussing essential definitions and dimensions of reproducibility, we will cover some computer literacy and project organization basics that are helpful for conducting reproducible research (e.g., folder structures, naming schemes, and command-line interfaces). After that, we will focus on version control, dependency management, and computational reproducibility. The tools we will use for that include Git and GitHub, R packages for dependency management, and Binder, a tool to package and share reproducible and interactive analysis environments.
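
To give a rough idea of what "folder structures" and "naming schemes" can look like in practice, here is a minimal sketch in base R. The folder layout, the project name `my-project`, and the numbered script names are illustrative assumptions, not the structure prescribed in the workshop slides.

```r
# Hypothetical project root; a temporary directory keeps the sketch free of side effects.
project_root <- file.path(tempdir(), "my-project")

# Assumed folder layout (one common convention, not the workshop's official one).
folders <- c("data/raw", "data/processed", "scripts", "output/figures", "docs")

for (folder in folders) {
  dir.create(file.path(project_root, folder), recursive = TRUE, showWarnings = FALSE)
}

# Assumed naming scheme: zero-padded step number plus a short lowercase description,
# so that scripts sort in the order in which they should be run.
script_names <- sprintf("%02d_%s.R", 1:3, c("import", "clean", "analyze"))
file.create(file.path(project_root, "scripts", script_names))

# List the files that were created, relative to the project root.
list.files(project_root, recursive = TRUE)
```

Keeping raw and processed data in separate folders and numbering scripts by execution order makes it easier for others to see which steps to rerun and in which sequence.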

Target group

The workshop is targeted at participants who have (at least some) experience with R and want to learn (more) about workflows and tools for making the results of their research reproducible.

Learning objectives

By the end of the course, participants should:

  • have gained important insights into key concepts of reproducible research and recommended best practices
  • be able to work with frameworks and tools that can be used for maximizing reproducibility, such as Git, R packages for dependency management, or Binder
  • be able to publish reproducible computational analysis pipelines with R
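
As a small illustration of the idea behind dependency management, the following sketch records the R version and the versions of a few packages using only base R. The helper name `record_versions` and the output file name are hypothetical, and this is not one of the dedicated dependency management packages covered in the workshop; it only shows the kind of information such tools keep track of.

```r
# Hypothetical helper: collect the R version and the installed versions
# of the named packages in a data frame.
record_versions <- function(pkgs) {
  versions <- vapply(
    pkgs,
    function(p) as.character(utils::packageVersion(p)),
    character(1)
  )
  data.frame(
    package = c("R", pkgs),
    version = c(as.character(getRversion()), unname(versions)),
    stringsAsFactors = FALSE
  )
}

# The packages listed here ship with R; replace them with the ones your analysis uses.
deps <- record_versions(c("stats", "utils"))

# Assumed output location; a temporary directory is used so nothing is overwritten.
versions_file <- file.path(tempdir(), "package-versions.csv")
write.csv(deps, versions_file, row.names = FALSE)
```

Storing such a file next to the analysis scripts and tracking it with Git documents which software versions produced a given set of results.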

Prerequisites

Participants should have some basic knowledge of R and RStudio (e.g., installing and loading packages, importing different data types, and basic data wrangling and analyses). To make it easier to apply the methods covered in the course to their own work, we recommend that participants install all necessary software on their computers before the course starts.

Timetable & content

Day 1

| Time | Topic | Slides | Exercises | Solutions |
| --- | --- | --- | --- | --- |
| 09:30 - 10:45 | Introduction | HTML, PDF | - | - |
| 10:45 - 11:00 | Coffee Break | - | - | - |
| 11:00 - 12:00 | Computer literacy | HTML, PDF | see slides | see slides |
| 12:00 - 13:00 | Lunch Break | - | - | - |
| 13:00 - 15:00 | Git & GitHub - Part I | HTML, PDF (also contain Part II) | see slides | see slides |
| 15:00 - 15:15 | Coffee Break | - | - | - |
| 15:15 - 16:30 | Git & GitHub - Part II | HTML, PDF | HTML | HTML |
| 16:30 - 17:00 | Q&A | - | - | - |

Day 2

| Time | Topic | Slides | Exercises | Solutions |
| --- | --- | --- | --- | --- |
| 09:00 - 09:30 | Recap Day 1 | HTML, PDF | - | - |
| 09:30 - 11:00 | Dependency management | HTML, PDF | HTML | HTML |
| 11:00 - 11:15 | Coffee Break | - | - | - |
| 11:15 - 12:00 | Binder & Notebooks | PDF | see slides | - |
| 12:00 - 13:00 | Lunch Break | - | - | - |
| 13:00 - 14:30 | Build your own Binder | PDF | see slides | Project |
| 14:30 - 14:45 | Coffee Break | - | - | - |
| 14:45 - 16:00 | Saving computational environments | PDF | Project | - |
| 16:00 - 17:00 | Recap & Outlook | HTML, PDF | - | - |

Acknowledgments

The R Markdown parts of this workshop were created using the R packages xaringan, unilur, and woRkshoptools. The materials are based on an earlier version of this workshop and a similar course by Frederik Aust and Johannes Breuer.