
This repository will contain all shared materials (slides, documents, assignments, ...)


Programming Data Science

This repository contains all student materials (assignments, data, ...) for the Programming Data Science course in summer 2019.

If you participate in the course, please send an email to Jörn Grahl and ask for access.

I grant students read-only access as collaborators. Please fetch updates regularly.

Here is the tentative schedule for summer 2019:


| # | Date | Topic |
|---|------|-------|
| 1 | 04.04.2019 | Fundamentals: Organization, dates, groups, project leaders. Tools and accounts. Source control. Reproducibility. Coding and reporting. Method chaining. Folders. |
| 2 | 11.04.2019 | What can computers learn from data? Mapping questions to model classes, statements, and tests. The ladder of causality. |
| 3 | 25.04.2019 | Visualizations (plots): Grammar of graphics |
| 4 | 02.05.2019 | Coding 1: Data wrangling 1/2 & method chaining |
| 5 | 09.05.2019 | Coding 2: Data wrangling 2/2 |
| 6 | 16.05.2019 | Flexible tables for descriptive statistics (and everything else), regression results |
| 7 | 23.05.2019 | Comparisons: basic inference, tests, p-values, multiple comparisons |
| 8 | 06.06.2019 | Choosing the best model: Overfitting, bias-variance tradeoff, resampling schemes, choosing the best ML model, cross-validation, paired t-tests, training and evaluation errors |
| 9 | 27.06.2019 | In-depth linear regression |
| 10 | 04.07.2019 | In-depth decision trees |
| 11 | 11.07.2019 | Buffer |