This is a curriculum of open-source data science exercises, intended to take a student from zero coding experience to basic data science literacy. These exercises are heavily inspired by the (discontinued) Data Challenge Lab at Stanford University and rely on the Tidyverse.
- (Setup) Complete this exercise to install RStudio.
- (Setup) Download and unzip this archive to obtain the curriculum materials.
- (Setup) Open the folder you unzipped as a project in RStudio (`File > Open Project...`).
- (Learn) Work through the exercise files in the `exercises_sequenced/` folder at your own pace to learn data science skills.
- (Learn) Use the challenges in the `challenges/` folder to put your new skills to use.
Suggested order: The exercise filenames start with a numerical `dXY` prefix that denotes their suggested day order. This ordering interleaves topics and provides about an hour of work per day. I recommend working on the exercises 5 days a week and taking weekends off!
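As a minimal sketch of how the `dXY` prefix gives a day order, the base R snippet below sorts a few filenames by the number after the leading `d`. The filenames and the `.Rmd` extension are hypothetical placeholders, not the actual files in the curriculum.

```r
# Hypothetical filenames for illustration only; the real exercise files
# live in the exercises_sequenced/ folder and will have different names.
files <- c("d10-example-c.Rmd", "d02-example-b.Rmd", "d01-example-a.Rmd")

# Pull out the two digits after the leading "d" and convert them to a day number
day <- as.integer(sub("^d([0-9]{2}).*$", "\\1", files))

# Reorder the filenames from the earliest suggested day to the latest
ordered <- files[order(day)]
print(ordered)
```

To try this on the real materials, `files` could instead be filled with `list.files("exercises_sequenced")`, assuming the RStudio project is open at the top of the unzipped folder.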
- Curriculum contains the desired learning outcomes of this material
- Exercises contains the exercises, which provide a first introduction to using the Tidyverse to do Data Science
- Challenges contains more open-ended data challenges, which will test and build upon your skills from the exercises
- Content visualization script helps sequence course content and visualize topics
- Sequencing script helps assign exercise and challenge due dates
Data Science is a powerful toolkit for extracting usable insights from data. In this class, you will learn software tools and gain the understanding needed to use them well. You will use these tools to liberate data from published images and tables, wrangle messy datasets into machine learning (ML)-ready form, fit and interpret ML models, and visualize data to extract meaning. You will also speak the language of uncertainty (statistics) to avoid getting fooled by models. You will criticize published findings and ask what is, and what is not, in the data. Assignments will include regular practice exercises, progressively puzzling real-data challenges, and a final project of your choice where you obtain, wrangle, and understand a dataset.
I welcome suggestions and contributions! If you want to contribute, please see Contributing.