
Open source applied machine learning course

Primary language: TeX. License: GNU General Public License v3.0 (GPL-3.0).

COMPSCI 589: Open Source ML Course

Introduction

COMPSCI 589 is an open source applied machine learning course designed for senior undergraduate students and junior (masters-level) graduate students. The course materials have been developed by Prof. Benjamin M. Marlin at the College of Information and Computer Sciences, University of Massachusetts Amherst since fall 2014.

How To Use These Materials

The course slides were created in LaTeX using the Beamer package. Pre-compiled PDF slides are available in the slides directory, and pre-compiled PDF handouts (without animations) are available in the handouts directory. Most lectures also have accompanying Jupyter notebook demos, located in the demos/code directory.

The LaTeX source for the slides is available in the src directory. The title slide for each lecture can be customized with your course number, name, and affiliation by editing the src/config.tex file and recompiling the slides. Recompiling requires pdflatex with the Beamer package installed. Slides and handouts can be recompiled individually or all at once using the supplied compile_all_slides.sh bash script.

The demos require Python 2.7, Jupyter notebook, and a current version of scikit-learn. Some demos use additional packages including Theano and wxPython.

Course Topics and Readings

The course introduces core machine learning models and algorithms for classification, regression, clustering, and dimensionality reduction. On the theory side, the course focuses on understanding models and the relationships between them. On the applied side, it focuses on using machine learning methods effectively to solve real-world problems, with an emphasis on model selection, regularization, design of experiments, and presentation and interpretation of results. The course also explores the use of machine learning methods across different computing contexts, including desktop and cloud computing. Python, scikit-learn, and Apache Spark are the main toolkits used throughout.
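The emphasis on model selection and regularization can be made concrete with a small example: k-fold cross-validation used to choose a ridge regression penalty. This is a minimal sketch and is not taken from the course demos. It assumes Python 3 and NumPy only, whereas the demos target Python 2.7 and scikit-learn. The function names `ridge_fit`, `kfold_cv_mse`, and `select_lambda` and the synthetic data are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression without an intercept: w = (X^T X + lam*I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def kfold_cv_mse(X, y, lam, n_folds=5, seed=0):
    """Held-out mean squared error of ridge regression, averaged over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except fold i, then evaluate on fold i
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        w = ridge_fit(X[train_idx], y[train_idx], lam)
        errors.append(np.mean((X[test_idx] @ w - y[test_idx]) ** 2))
    return float(np.mean(errors))


def select_lambda(X, y, lambdas, n_folds=5):
    """Return the penalty with the lowest cross-validated error, plus every score."""
    # The fixed seed gives every candidate the same folds, so the scores are comparable
    scores = {lam: kfold_cv_mse(X, y, lam, n_folds) for lam in lambdas}
    return min(scores, key=scores.get), scores


# Hypothetical synthetic data: 10 features, only the first three are informative
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true + rng.normal(scale=0.5, size=100)

best, scores = select_lambda(X, y, [0.01, 0.1, 1.0, 10.0, 100.0])
for lam, mse in scores.items():
    print(f"lambda={lam:>6}: CV MSE={mse:.3f}")
print("selected lambda:", best)
```

scikit-learn's `RidgeCV` and `GridSearchCV` cover the same workflow. The hand-written loop simply makes the train/validation split explicit.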

The readings are taken from An Introduction to Statistical Learning [ISL], and The Elements of Statistical Learning, Second Edition [ESL], both of which are freely available.

Course Contents

Unit 1: Classification

  • Lecture 1: Course Overview - Supervised and Unsupervised Learning

    Materials: Slides | Handouts | latex

    Reading: ISL Section 1 (p.1-9), Section 2.1.4 (p27-29)

  • Lecture 2: KNN and Decision Trees

    Materials: Slides | Handouts | latex

    Reading: ESL Section 2.3.2 (p.14-16), ISL Section 8 (p.303, 311-314), ESL Section 2.5 (p.22-23)

  • Lecture 3: Naïve Bayes, LDA, and Logistic Regression

    Materials: Slides | Handouts | latex

    Reading: ESL Section 4 (p. 101-102, 106-110, 119-120, 127-132)

  • Lecture 4: Overfitting, Regularization, and Cross-Validation

    Materials: Slides | Handouts | latex

    Reading: ISL Section 2.2.3 (p.37), Section 5 (p.176-183, 184-186)

  • Lecture 5: Support Vector Machines, Basis Expansion, and Kernels

    Materials: Slides | Handouts | latex

    Reading: ISL Section 9.5 (p.356-359)

  • Lecture 6: Neural Networks and Deep Learning

    Materials: Slides | Handouts | latex

    Reading: ESL Section 11.3 (p.392-395, 397-409)

  • Lecture 7: Ensembles and Classification

    Materials: Slides | Handouts | latex

    Reading: ISL Section 8.2 (p.316-324)
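The KNN classifier from Lecture 2 can be implemented directly in NumPy. The method computes the distance from a query point to every training point, takes the k closest, and returns the majority label. This is an illustrative sketch, not the course's demo code. Several choices are assumptions: `knn_predict` is a hypothetical name, Euclidean distance is the assumed metric, and vote ties go to the smallest label.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each test point by majority vote among its k nearest training points."""
    # Squared Euclidean distances between every test and training point: (n_test, n_train)
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest training points for each test point
    nearest = np.argsort(d2, axis=1, kind="stable")[:, :k]
    preds = []
    for neighbor_labels in y_train[nearest]:
        labels, counts = np.unique(neighbor_labels, return_counts=True)
        # np.unique returns sorted labels, so vote ties go to the smallest label
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)


X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.05, 0.1], [1.0, 0.9]])
print(knn_predict(X_train, y_train, X_test, k=3))  # [0 1]
```

This version computes all pairwise distances, which is fine for small demos. scikit-learn's `KNeighborsClassifier` uses tree-based indexes for larger data.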

Unit 2: Regression

  • Lecture 8: Linear Regression, Ridge and the Lasso

    Materials: Slides | Handouts | latex

    Reading: ISL Section 3.1 (p.61-63), Section 3.2 (p.71-75), Section 6.2 (p.214-224), Section 3.3.2 (p.86-92)

  • Lecture 9: KNN, Regression Trees, and Feature Selection

    Materials: Slides | Handouts | latex

    Reading: ISL Section 3.5 (p.104-109), Section 8.1.1 (p.304-311), Section 6.1 (p.205-210)

  • Lecture 10: Support Vector and Neural Network Regression

    Materials: Slides | Handouts | latex

    Reading: ESL Section 11.3 (p.392-401), ESL Section 12.3.6 (p.434-438)

  • Lecture 11: KOLS and Gaussian Process Regression

    Materials: Slides | Handouts | latex

    Reading: Gaussian Processes in Machine Learning
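The sketch below applies the standard Gaussian process regression predictive equations for Lecture 11. It assumes a zero prior mean and Gaussian observation noise with variance `noise_var`. Under those assumptions, the predictive mean is `K_s^T (K + noise_var I)^{-1} y` and the predictive variance is `k(x_s, x_s) - K_s^T (K + noise_var I)^{-1} K_s`. This is a sketch, not the course's wxPython demo. Two further choices are assumptions: the squared-exponential kernel and its fixed hyperparameters, which would normally be tuned. The names `rbf_kernel` and `gp_posterior` are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel: signal_var * exp(-||a - b||^2 / (2 * length_scale^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return signal_var * np.exp(-0.5 * d2 / length_scale ** 2)


def gp_posterior(X_train, y_train, X_test, noise_var=0.1, **kernel_args):
    """Predictive mean and latent-function variance of a zero-mean GP with Gaussian noise."""
    K = rbf_kernel(X_train, X_train, **kernel_args) + noise_var * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, **kernel_args)            # shape (n_train, n_test)
    k_ss = np.diag(rbf_kernel(X_test, X_test, **kernel_args))   # prior variance at test points
    # Cholesky factor K = L L^T gives a numerically stable way to apply K^{-1}
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))   # K^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)                                 # L^{-1} K_s
    var = k_ss - (v ** 2).sum(axis=0)
    return mean, var


X_train = np.linspace(0.0, 5.0, 6)[:, None]
y_train = np.sin(X_train).ravel()
X_test = np.array([[2.5], [10.0]])
mean, var = gp_posterior(X_train, y_train, X_test, noise_var=1e-2)
print(mean, var)
```

Far from the training inputs, the mean reverts to the prior mean of zero and the variance approaches `signal_var`. To get the variance of a noisy new observation rather than the latent function, add `noise_var` to `var`.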

Unit 3: Large-Scale Learning

Unit 4: Clustering

  • Lecture 15: Hierarchical Clustering

    Materials: Slides | Handouts | latex

    Reading: ISL Section 10.3.2 (p.390-401)

  • Lecture 16: K-Means Clustering

    Materials: Slides | Handouts | latex

    Reading: ISL Section 10.3.1 (p.386-390), ESL Section 6.8 (p.214-216), Section 8.5 (p.272-276)

  • Lecture 17: Mixture Models

    Materials: Slides | Handouts | latex

    Reading: ISL Section 10.3.1 (p.386-390), ESL Section 6.8 (p.214-216), Section 8.5 (p.272-276)
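K-means (Lecture 16) is usually fit with Lloyd's algorithm, which alternates two steps. The assignment step assigns each point to its nearest centroid, and the update step moves each centroid to the mean of its assigned points. The sketch below is a minimal NumPy version, not the course demo. Its choices are assumptions: random data points as initial centroids (rather than k-means++), Euclidean distance, and the hypothetical names `assign_clusters` and `kmeans`.

```python
import numpy as np


def assign_clusters(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row of X."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate the assignment step and the centroid-update step."""
    rng = np.random.default_rng(seed)
    # Initialize with k distinct data points chosen uniformly at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        labels = assign_clusters(X, centroids)
        # Move each centroid to the mean of its points; keep it in place if its cluster is empty
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Recompute labels so they always match the returned centroids
    return centroids, assign_clusters(X, centroids)


X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = kmeans(X, k=2)
print(centroids, labels)
```

Lloyd's algorithm only finds a local optimum, so the result can depend on initialization. A common remedy is to run several seeds and keep the solution with the lowest within-cluster sum of squares, which is what the `n_init` parameter of scikit-learn's `KMeans` does.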

Unit 5: Dimensionality Reduction

  • Lecture 18: Linear Dimensionality Reduction and SVD

    Materials: Slides | Handouts | latex

    Reading: ESL Section 14.5.1 (p.534-536)

  • Lecture 19: Principal Component Analysis

    Materials: Slides | Handouts | latex

    Reading: ISL Section 10.2 (p.374-385)

  • Lecture 20: Sparse Coding, Non-negative Matrix Factorization, and Independent Component Analysis

    Materials: Slides | Handouts | latex

    Reading: ESL Section 14.6 (p.553-557), Section 14.7 (p.557-570)

    Reading: Sparse Coding

  • Lecture 21: Kernel PCA and Spectral Clustering

    Materials: Slides | Handouts | latex

    Reading: ESL Section 14.5.3 (p.544-547), ESL Section 14.5.4 (p.547-550)

  • Lecture 22: Multidimensional Scaling and Isomap

    Materials: Slides | Handouts | latex

    Reading: ESL Sections 14.8-14.9 (p.570-576)
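PCA can be computed directly from the SVD of the centered data matrix, which links Lectures 18 and 19. The right singular vectors are the principal directions. The squared singular values divided by `n - 1` are the eigenvalues of the sample covariance. The sketch below is a minimal NumPy version under those standard facts. The names `pca` and `reconstruct` and the toy data are hypothetical, not taken from the course demos.

```python
import numpy as np


def pca(X, n_components):
    """PCA computed from the SVD of the mean-centered data matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, sorted by decreasing singular value
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T                            # coordinates in the new basis
    explained_var = s[:n_components] ** 2 / (len(X) - 1)  # sample-covariance eigenvalues
    return components, scores, explained_var, mean


def reconstruct(scores, components, mean):
    """Map low-dimensional scores back to the original feature space."""
    return scores @ components + mean


# Points on a line in 3-D: one component should capture all of the variance
t = np.linspace(-1.0, 1.0, 11)
direction = np.array([1.0, 2.0, 2.0]) / 3.0   # unit vector
X = np.outer(t, direction) + np.array([1.0, 0.0, -1.0])
components, scores, explained_var, mean = pca(X, n_components=1)
print(components, explained_var)
```

The sign of each principal direction is arbitrary, since both `v` and `-v` are valid singular vectors. Comparisons across runs or libraries should therefore allow for sign flips.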

List of Demos

  • Lecture01: Introduction to Python
  • Lecture02: KNN and Decision Trees
  • Lecture03: Naive Bayes, LDA and Logistic Regression
  • Lecture04: Model Complexity and Overfitting
  • Lecture05: SVMs, Basis Expansions and Kernels
  • Lecture06: Neural Network Classification (uses Theano)
  • Lecture11: Gaussian Processes (uses wxPython)
  • Lecture15: Hierarchical Clustering
  • Lecture16: KMeans Clustering
  • Lecture17: Mixture Models
  • Lecture18-20: Linear Dimensionality Reduction

Legal

Copyright 2016 Benjamin M. Marlin. These materials are provided under the GNU GENERAL PUBLIC LICENSE Version 3 (GPL 3). As permitted by GPL 3 Section 7(b), all attributions present in this work must be preserved in all copies and derived works.

Support

The development of these materials is supported by the National Science Foundation through award #IIS-1350522.