mlcourse

Machine learning course materials.

Notable Changes Since 2018

  • Added a note on retraining SVMs with just the support vectors (see the sketch after this list)
  • Added a note on a moment-matching interpretation of fitting logistic regression and more general softmax-style linear conditional probability models (see the first-order condition after this list).
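
A minimal sketch of the support-vector point (my illustration, not part of the course notes): if you refit an SVM on only its support vectors, with the same kernel and regularization parameter, the decision function is unchanged up to solver tolerance. The synthetic data and parameter choices below are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic two-class data (arbitrary choice, just for illustration).
X, y = make_blobs(n_samples=200, centers=2, random_state=0)

# Fit an SVM, then refit on only its support vectors with the same settings.
svm_full = SVC(kernel="linear", C=1.0).fit(X, y)
sv = svm_full.support_  # indices of the support vectors
svm_sv = SVC(kernel="linear", C=1.0).fit(X[sv], y[sv])

# In exact arithmetic the two solutions coincide; numerically they agree
# up to solver tolerance.
print(np.allclose(svm_full.decision_function(X),
                  svm_sv.decision_function(X), atol=1e-4))
```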
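
For the moment-matching point, one standard way to state it (generic notation, not necessarily that of the note): setting the gradient of the logistic regression log-likelihood to zero forces the model's expected sufficient statistics to equal their empirical counterparts.

```latex
% With p(y = 1 \mid x; w) = \sigma(w^\top x) and y_i \in \{0, 1\}, the stationarity
% condition of the log-likelihood is
\sum_{i=1}^{n} \bigl(y_i - \sigma(w^\top x_i)\bigr)\, x_i = 0
\quad\Longleftrightarrow\quad
\sum_{i=1}^{n} y_i\, x_i \;=\; \sum_{i=1}^{n} \mathbb{E}_{y \sim p(\cdot \mid x_i; w)}[y]\; x_i .
```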

Notable Changes from 2017FOML to 2018

  • Elaborated on the case against sparsity in the lecture on elastic net, to complement the reasons for sparsity on the slide "Lasso Gives Feature Sparsity: So What?".
  • Added a note on conditional expectations, since many students find the notation confusing.
  • Added a note on the correlated features theorem for elastic net, which was basically a translation of Zou and Hastie's 2005 paper "Regularization and variable selection via the elastic net" into the notation of our class, dropping an unnecessary centering condition and using a more standard definition of correlation.
  • Changes to the EM algorithm presentation: added several diagrams (slides 10-14) to give the general idea of a variational method, and made explicit that the marginal log-likelihood is exactly the pointwise supremum over the variational lower bounds (slides 31 and 32); see the statement after this list.
  • Treatment of the representer theorem now comes well before any mention of kernels, and it is described as an interesting consequence of basic linear algebra: "Look how the solution always lies in the subspace spanned by the data. That's interesting (and obvious with enough practice). We can now constrain our optimization problem to this subspace..." (see the statement after this list).
  • The kernel methods lecture was rewritten to significantly reduce references to the feature map. When we're just talking about kernelization, it seems like unneeded extra notation.
  • Replaced the 1-hour crash course in Lagrangian duality with a 10-minute summary, which I never actually presented and instead left as optional reading.
  • Added a brief note on Thompson sampling for Bernoulli bandits as a fun application for our unit on Bayesian statistics (see the sketch after this list).
  • Significant improvement of the programming problem for lasso regression in Homework #2.
  • New written and programming problems on logistic regression in Homework #5 (showing the equivalence of the ERM and the conditional probability model formulations, as well as implementing regularized logistic regression).
  • New homework on backpropagation (Homework #7), with Philipp Meerkamp and Pierre Garapon.
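
For the EM point, the statement in question can be written generically as follows (my notation, not necessarily the slides'): for any distribution q over the latent variable z, Jensen's inequality gives a lower bound on the marginal log-likelihood, the bound is tight at q(z) = p(z | x; θ), and hence the marginal log-likelihood is the pointwise supremum of the bounds.

```latex
\log p(x;\theta)
  \;=\; \log \sum_{z} q(z)\,\frac{p(x,z;\theta)}{q(z)}
  \;\ge\; \sum_{z} q(z)\,\log \frac{p(x,z;\theta)}{q(z)}
  \;=:\; \mathcal{L}(q,\theta),
\qquad
\log p(x;\theta) \;=\; \sup_{q}\, \mathcal{L}(q,\theta).
```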
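
The representer theorem observation can likewise be stated without any reference to kernels (a generic statement, not the slides' exact formulation): if the objective depends on w only through inner products with the training points plus a nondecreasing penalty on the norm, then some minimizer lies in the span of the data.

```latex
% If J(w) = L\bigl(\langle w, x_1\rangle, \dots, \langle w, x_n\rangle\bigr) + R(\lVert w \rVert)
% with R nondecreasing, then J has a minimizer of the form
w^{*} \;=\; \sum_{i=1}^{n} \alpha_i\, x_i ,
% because projecting any w onto \operatorname{span}\{x_1, \dots, x_n\} leaves every
% \langle w, x_i\rangle unchanged and can only shrink \lVert w \rVert.
```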
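
And a minimal sketch of Thompson sampling for Bernoulli bandits (my own illustration with arbitrary arm means, not the course's note): keep a Beta posterior per arm, sample a mean from each posterior, pull the arm with the largest sample, and update that arm's posterior with the observed reward.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.7])  # unknown to the algorithm (arbitrary example)
n_arms, n_rounds = len(true_means), 2000

# Beta(1, 1) prior on each arm's success probability.
alpha = np.ones(n_arms)
beta = np.ones(n_arms)
pulls = np.zeros(n_arms, dtype=int)

for _ in range(n_rounds):
    theta = rng.beta(alpha, beta)   # one posterior sample per arm
    arm = int(np.argmax(theta))     # pull the arm whose sampled mean is largest
    reward = rng.random() < true_means[arm]
    alpha[arm] += reward            # conjugate update for a Bernoulli likelihood
    beta[arm] += 1 - reward
    pulls[arm] += 1

print(pulls)  # pulls should concentrate on the best arm over time
```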

Notable Changes from 2017 to 2017FOML

Notable Changes from 2016 to 2017

  • New lecture on geometric approach to SVMs (Brett)
  • New lecture on principal component analysis (Brett)
  • Added slide on k-means++ (Brett)
  • Added slides on an explicit feature vector for the 1-dim RBF kernel (see the expansion after this list)
  • Created notebook to regenerate the buggy lasso/elastic net plots from Hastie's book (Vlad)
  • L2 constraint for linear models gives Lipschitz continuity of the prediction function (Thanks to Brian Dalessandro for pointing this out to me); see the inequality after this list.
  • Expanded discussion of L1/L2/ElasticNet with correlated random variables (Thanks Brett for the figures)
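
For the 1-dim RBF kernel, one standard version of the explicit expansion is the following (written with bandwidth σ; the slides may use a different parametrization):

```latex
% For k(x, x') = \exp\!\bigl(-(x - x')^{2} / (2\sigma^{2})\bigr) with x, x' \in \mathbb{R},
k(x, x')
  \;=\; e^{-x^{2}/(2\sigma^{2})}\, e^{-x'^{2}/(2\sigma^{2})}
        \sum_{n=0}^{\infty} \frac{1}{n!}\Bigl(\frac{x x'}{\sigma^{2}}\Bigr)^{\! n}
  \;=\; \langle \varphi(x), \varphi(x') \rangle,
\qquad
\varphi_{n}(x) \;=\; \frac{x^{n}}{\sigma^{n}\sqrt{n!}}\; e^{-x^{2}/(2\sigma^{2})},
\quad n = 0, 1, 2, \dots
```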
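
The Lipschitz continuity point is just Cauchy-Schwarz (my wording of the observation):

```latex
% If f(x) = w^\top x and the L2 constraint \lVert w \rVert_2 \le r holds, then
\lvert f(x) - f(x') \rvert
  \;=\; \lvert w^\top (x - x') \rvert
  \;\le\; \lVert w \rVert_2 \, \lVert x - x' \rVert_2
  \;\le\; r\, \lVert x - x' \rVert_2 ,
% so the prediction function is r-Lipschitz with respect to the Euclidean norm.
```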

Notable Changes from 2015 to 2016

Possible Future Topics

Basic Techniques

  • Gaussian processes
  • MCMC (or at least Gibbs sampling)
  • Importance sampling
  • Density ratio estimation (for covariate shift, anomaly detection, conditional probability modeling)
  • Local methods (knn, locally weighted regression, etc.)

Applications

  • Collaborative filtering / matrix factorization (building on this lecture on matrix factorization and Brett's lecture on PCA)
  • Learning to rank and associated concepts
  • Bandits / learning from logged data?
  • Generalized additive models for interpretable nonlinear fits (smoothing way, basis function way, and gradient boosting way)
  • Automated hyperparameter search (with GPs, random, hyperband,...)
  • Active learning
  • Domain shift / covariate shift adaptation
  • Reinforcement learning (minimal path to REINFORCE)

Latent Variable Models

  • PPCA / Factor Analysis and non-Gaussian generalizations
    • Personality types as example of factor analysis if we can get data?
  • Variational Autoencoders
  • Latent Dirichlet Allocation / topic models
  • Generative models for images and text (where we care about the human-perceived quality of what's generated rather than the likelihood given to test examples) (GANs and friends)

Bayesian Models

  • Relevance vector machines
  • BART
  • Gaussian process regression and conditional probability models

Technical Points

Other

  • Class imbalance
  • Black box feature importance measures (building on Ben's 2018 lecture)
  • Quantile regression and conditional prediction intervals (perhaps integrated into homework on loss functions)
  • More depth on basic neural networks: weight initialization, vanishing / exploding gradient, possibly batch normalization
  • Finish up 'structured prediction' with beam search / Viterbi
    • give probabilistic analogue with MEMMs/CRFs
  • Generative vs discriminative (Ng & Jordan's naive Bayes vs logistic regression comparison, plus new experiments including regularization)
  • Something about causality?
  • DART
  • LightGBM's and CatBoost's efficient handling of categorical features (i.e., handling categorical features in regression trees)

Citation Information

Machine Learning Course Materials by Various Authors is licensed under a Creative Commons Attribution 4.0 International License. The author of each document in this repository is considered the license holder for that document.