/intro-tidyverse-dev

Primary language: HTML. License: Creative Commons Attribution Share Alike 4.0 International (CC-BY-SA-4.0).

Introduction to Data Science with R and Tidyverse

This repository contains all materials for the course Introduction to Data Science with R and Tidyverse, offered for GRADE Brain and other GRADE Centers at Goethe University in January 2023. It also hosts the course website for students, which you can access here.

Course Objective

Most academic fields require proficiency in at least one data-centered analysis tool. For many, the R programming language has become the tool of choice.

However, the first steps in coding can be intimidating and discouraging, especially if you have never worked with a programming language like R before. This course provides a results-oriented, applied, and hands-on introduction to the most critical parts of a Data Science project in R. We will introduce the libraries and frameworks necessary for your analysis and focus on teaching you how to implement and apply those tools with small examples that you can work on yourself.

Our goal is to show you the scope of possibilities within R and leave you confident that you can implement your own empirical projects in R. We will focus on the Tidyverse ecosystem, a consistent and intuitive framework for building your data analysis from start to finish. After completing this course, you will know how to apply the essential Tidyverse tools for everyday Data Science tasks in R: primarily data wrangling, data visualization, and communicating results.

Course Description

This course is aimed at beginners who are completely new to R as a programming language and/or want to learn about the Tidyverse ecosystem. We structured the course in the following way:

Introduction to the Tidyverse

  • Reading data into tibbles with readr and a short primer on data types.
  • Plotting with ggplot2: aesthetics, geoms, and the grammar of graphics.
  • Data wrangling with dplyr: mutate(), select(), filter(), group_by(), summarize(), …_join(), and the pipe operator %>%.
  • Communicating your analyses with RMarkdown in a reproducible way.
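As a rough illustration of how the readr, dplyr, and ggplot2 steps listed above fit together, here is a minimal sketch. The data set, column names, and object names (`csv_text`, `scores`, `group_summary`, `score_plot`) are hypothetical and are not taken from the course materials. The CSV is inlined as text so the example runs without any external file.

```r
library(readr)
library(dplyr)
library(ggplot2)

# Hypothetical example data, inlined as CSV text (not from the course materials)
csv_text <- "student,group,score
A,control,61
B,control,70
C,treatment,75
D,treatment,83
E,treatment,NA
"

# readr: read the CSV text into a tibble; I() marks the string as literal data
scores <- read_csv(I(csv_text), show_col_types = FALSE)

# dplyr: chain wrangling steps with the pipe operator %>%
group_summary <- scores %>%
  filter(!is.na(score)) %>%              # drop rows with a missing score
  mutate(score_pct = score / 100) %>%    # add a derived column
  select(group, score, score_pct) %>%    # keep only the columns we need
  group_by(group) %>%                    # one group per experimental arm
  summarize(mean_score = mean(score), n = n())

# ggplot2: map columns to aesthetics, then add a geom
score_plot <- ggplot(scores, aes(x = group, y = score)) +
  geom_point()
```

Printing `group_summary` shows one row per group, and printing `score_plot` draws the chart. In an RMarkdown document, the same code would sit inside a code chunk so that the table and figure are regenerated whenever the document is knitted.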

Short primer on modeling with R

  • Univariate and multivariate linear regression with lm().
  • Visualizing regressions with ggplot2.
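The two modeling topics above can be sketched as follows, assuming the built-in `mtcars` data set as a stand-in for course data (the course may well use different data). The object names `fit_one`, `fit_two`, and `fit_plot` are illustrative only. The first model uses a single predictor, the second adds another one.

```r
library(ggplot2)

# Regression with one predictor: fuel economy explained by weight
fit_one <- lm(mpg ~ wt, data = mtcars)

# Regression with two predictors: weight and horsepower
fit_two <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficient table, standard errors, and R-squared for the larger model
summary(fit_two)

# Visualize the single-predictor fit: scatter plot plus fitted line
# with a confidence band drawn by geom_smooth()
fit_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE)
```

Here `geom_smooth(method = "lm")` refits the same simple regression internally for plotting, so the line it draws corresponds to `fit_one`, not to `fit_two`.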

Next steps on your journey with R

This course will not cover deeper statistical or theoretical concepts, as the focus is on applied coding.

Methods

The course will alternate between short introductions to a concept or method and small do-it-yourself coding exercises. Between the three sessions, you are encouraged to work on the provided exercises to deepen your understanding.

Conditions

  • No prior coding experience is needed. This course is beginner-friendly. You are also more than welcome to participate if you have experience in R but want to learn more about the Tidyverse.
  • A Posit Cloud (formerly RStudio Cloud) account. Since we do not want to spend precious course time on technical setup, we will use Posit Cloud as a simple and readily available development environment. We will send out detailed instructions and an invitation link in advance.

Trainers

In the last three years, your trainers have developed and taught TechAcademy's Data Science with R program at Goethe University. They use data science methods and R daily in their academic and non-academic jobs.

  • Lukas Jürgensmeier, M.Sc., PhD Student in Quantitative Marketing and Member of the Executive Board at TechAcademy e.V.
  • Lara Zaremba, M.Sc., Trainee Data Science at the European Central Bank (ECB) and former R Teacher and Course Designer at TechAcademy e.V.
  • Karlo Lukic, M.Sc., PhD Student in Quantitative Marketing, former R Teacher and Course Designer at TechAcademy e.V.