ATTENTION: as of the academic year 2021, this page is no longer maintained. For the current slides and code, please see the Microsoft Teams channel.
- On GitHub: example Notebooks, slides, extra material, exercises (in slides), data sets
- On Canvas: student manual, assignments, link to webinar recordings, overview of content per week
Example code can be found in the Examples folder. During class, exercises will be shown on the slides; the data sets for these exercises can be found in the corresponding example folder. You can complete these exercises during the lesson to test your knowledge. You don't need to submit them.
- Short cheatsheet
- Extensive Python cheatsheet with examples
- A more minimal cheatsheet
- Datacamp Python basics
- Datacamp Python for data science
- Markdown cheatsheet (for text in Notebooks)
Using tools from data science and machine learning would not make much sense without some understanding of mathematics and statistics. However, the focus of this course is on the application of data science rather than its mathematical foundations. Where I use formulas, I will not dwell on the technical details but explain conceptually what they do. If you need to catch up on math, you can use these links to the Khan Academy:
- Basic algebra
- Equations and variables
- Squares and roots
- The coordinate plane and linear equations
- Exponents and logarithms
- Basic probability theory
- Week 1: video on mean, median and mode
- Week 1: video on distributions
- Week 1: video on plotting distributions in Seaborn
- Week 2: blog with an overview of different types of bar charts and when to use them
- Week 2: blog on count plots in Seaborn
- Week 2: blog on scatterplot matrix in Seaborn
- Week 2: blog on correlation in Pandas and Seaborn
- Week 3: blog on linear regression in sklearn
- Week 3: another blog on linear regression, including a residual plot
- Week 4: blog with introduction to machine learning
- Week 4: blog on k-NN
- Week 5: blog on decision trees (includes some math, which you can safely skip)
- Week 5: blog on Random Forest (begin here)
- Week 5: another blog on Random Forest, including parameter optimization
- Week 6: blog on text classification using a bag-of-words model and Naive Bayes
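To give a flavour of the kind of technique the weekly links above cover, here is a minimal sketch of the k-NN idea from week 4. It uses only the Python standard library and made-up toy points (the data, labels, and the `knn_predict` helper are illustrative, not from the course materials); the linked blog shows the sklearn version used in class.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # train is a list of ((x, y), label) pairs; distance is plain Euclidean
    neighbors = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy data: two clusters, labelled "a" and "b"
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]

print(knn_predict(train, (0.5, 0.5)))  # near the "a" cluster, so prints "a"
print(knn_predict(train, (5.5, 5.5)))  # near the "b" cluster, so prints "b"
```

The same workflow (fit on labelled points, predict a label for a new point) carries over to the sklearn estimators discussed in weeks 3–6.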