Autumn 2022 Lecturers: Mark van der Wilk, Yingzhen Li
Lecture materials for the Imperial College London course "Mathematics for Machine Learning". Some material is based on an earlier version of the course by Marc Deisenroth. Lecture recordings are available on Panopto.
We welcome any suggestions and fixes from students. For those who make contributions, there will be cake.
If you want to make a contribution, please fork the repo and make a pull request.
Below is a syllabus of the course, together with a rough plan for the term. This is subject to change.
- Pre-course exercises
- Basic probability, events, mutual exclusivity, independence, random variables (RVs)
- Probability densities, multivariate densities, multivariate integrals
- Simple maximum likelihood
- Statistical terminology (estimator, statistic, ...)
- Lecture 1: A (Hopefully (Reasonably)) Familiar Collection of Maths (MvdW)
- Introduction (MvdW)
- What is this course about?
- Who is it for?
- Setting the scene for Probability
- Understanding the world using probability, models (shaking desk)
- Notation for vector probability densities (vectors as groupings of variables)
- Maximum Likelihood (Recap from P&S)
- Linear regression from a loss function and as maximum likelihood (see the sketch after this lecture's topics)
- Multivariate differentiation
- Differentiation with respect to scalars and vectors
- Multivariate calculus (revision)
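The sketch referenced above: under a Gaussian noise model, maximising the likelihood of linear regression reduces to least squares. This is illustrative code with our own variable names, not course code:

```python
import numpy as np

# Illustrative data: y = X w + Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)

# Under Gaussian noise, maximising the likelihood is equivalent to
# minimising squared error, with closed form w_ml = (X^T X)^{-1} X^T y.
w_ml = np.linalg.solve(X.T @ X, X.T @ y)
print(w_ml)  # close to w_true
```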
- Lecture 2: Differentiation & Autodiff (MvdW)
- Differentiation of vectors, general array differentiation
- Index notation for multivariate calculus
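As a reference point for the index notation used here (our summary, not necessarily the course's exact notation), the Jacobian of a vector-valued function can be written element-wise:

```latex
% Jacobian of f : R^n -> R^m, written element-wise in index notation
J = \frac{\partial \mathbf{f}}{\partial \mathbf{x}} \in \mathbb{R}^{m \times n},
\qquad J_{ij} = \frac{\partial f_i}{\partial x_j}
```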
- Lecture 3: Automatic differentiation (MvdW)
- Computational Graph
- Forward-mode autodiff (see the dual-number sketch after this lecture's topics)
- Reverse-mode (backward) autodiff
- Computational complexity guarantees (see this)
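As a taster for Lecture 3, here is a minimal forward-mode autodiff sketch using dual numbers. This is illustrative code, not the course's implementation:

```python
from dataclasses import dataclass
import math

@dataclass
class Dual:
    """A dual number: a value together with its derivative (eps^2 = 0)."""
    val: float
    dot: float  # derivative with respect to the seeded input

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def sin(x: Dual) -> Dual:
    # Chain rule: (sin u)' = cos(u) * u'
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

# d/dx [x * sin(x) + x] at x = 2, seeded with dot = 1
x = Dual(2.0, 1.0)
y = x * sin(x) + x
print(y.val, y.dot)  # f(2) and f'(2) = sin(2) + 2*cos(2) + 1
```

Forward mode computes one directional derivative per pass; reverse mode instead propagates sensitivities backwards through the computational graph, which is why it is cheaper for functions with many inputs and one output.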
- Lecture 4: Gradient Descent (YL)
- Lecture 5: Gradient descent applications: Linear & Logistic Regression (YL)
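A minimal sketch of gradient descent applied to logistic regression, as in Lectures 4 and 5. The data and names are illustrative, not course code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative binary-classification data.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X @ np.array([2.0, -1.0]) > 0).astype(float)

w = np.zeros(2)
lr = 0.1  # step size
for _ in range(500):
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) / len(y)  # gradient of the mean negative log-likelihood
    w -= lr * grad                 # gradient descent step

print(w)
```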
- Lecture 6 & 7: Multivariate Probability (YL)
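For reference, the density at the centre of these lectures, for x in R^D:

```latex
\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})
  = (2\pi)^{-D/2} \, |\boldsymbol{\Sigma}|^{-1/2}
    \exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^\top
    \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)
```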
- Lecture 8 & 9: Validation & Cross-validation (MvdW; a k-fold sketch follows the topics below)
- Overfitting
- Estimating error on the training set: danger, this underestimates the generalisation error!
- Unbiased estimate (recap from P&S)
- High-probability bounds on generalisation error
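A minimal k-fold cross-validation sketch; the `fit` and `loss` callables are placeholders of our own, not course code:

```python
import numpy as np

def k_fold_cv(X, y, fit, loss, k=5, seed=0):
    """Estimate generalisation error by averaging held-out losses over k folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])          # train on k-1 folds
        errors.append(loss(model, X[val], y[val]))  # evaluate on the held-out fold
    return float(np.mean(errors))
```

Averaging over held-out folds avoids the optimistic bias of evaluating on the training set.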
- Lecture 10 & 11: Bayesian inference (MvdW)
- Principle of Bayesian inference
- Gaussian conditioning from completing the square
- Gaussian conditioning from a big joint distribution (the resulting identity is summarised below)
- Bayesian Linear Regression
- Exercises: Simple Bayesian inference (coins), derive BLR.
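For reference, the Gaussian conditioning identity that both derivations arrive at:

```latex
\begin{pmatrix} \mathbf{x} \\ \mathbf{y} \end{pmatrix}
  \sim \mathcal{N}\!\left(
    \begin{pmatrix} \boldsymbol{\mu}_x \\ \boldsymbol{\mu}_y \end{pmatrix},
    \begin{pmatrix} \boldsymbol{\Sigma}_{xx} & \boldsymbol{\Sigma}_{xy} \\
                    \boldsymbol{\Sigma}_{yx} & \boldsymbol{\Sigma}_{yy} \end{pmatrix}
  \right)
\;\Longrightarrow\;
\mathbf{x} \mid \mathbf{y} \sim \mathcal{N}\!\left(
  \boldsymbol{\mu}_x + \boldsymbol{\Sigma}_{xy} \boldsymbol{\Sigma}_{yy}^{-1} (\mathbf{y} - \boldsymbol{\mu}_y),\;
  \boldsymbol{\Sigma}_{xx} - \boldsymbol{\Sigma}_{xy} \boldsymbol{\Sigma}_{yy}^{-1} \boldsymbol{\Sigma}_{yx}
\right)
```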
- Lecture 12: Bias-variance trade-off (YL)
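The decomposition at the heart of Lecture 12, for data y = f(x) + noise with noise variance sigma^2:

```latex
\mathbb{E}\big[ (y - \hat{f}(x))^2 \big]
  = \underbrace{\big( \mathbb{E}[\hat{f}(x)] - f(x) \big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[ (\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2 \big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```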
- Lecture 13 & 14: Dimensionality Reduction & PCA (YL, 2 lectures)
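A minimal PCA sketch via the eigendecomposition of the sample covariance; illustrative code, not the course's implementation:

```python
import numpy as np

def pca(X, k):
    """Project X onto its top-k principal components (eigenvectors of the covariance)."""
    Xc = X - X.mean(axis=0)                 # centre the data
    cov = Xc.T @ Xc / (len(X) - 1)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    W = eigvecs[:, ::-1][:, :k]             # top-k eigenvectors
    return Xc @ W                           # k-dimensional projection

# Example: reduce 5-dimensional data to 2 dimensions.
X = np.random.default_rng(2).normal(size=(100, 5))
Z = pca(X, 2)
print(Z.shape)  # (100, 2)
```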