STAT406 - "Elements of Statistical Learning"

Public repository for STAT406 @ UBC - "Elements of Statistical Learning".

LICENSE

The notes in this repository are released under the "Creative Commons Attribution-ShareAlike 4.0 International" license. See the human-readable summary here and the full license text here.

Course outline

The course outline is available here.

(UPDATED) Tentative weekly schedule (including Quizzes and Midterms)

The tentative week-by-week schedule is here.

PIAZZA

You can register on the course's PIAZZA page via Canvas.

WebWork

In order to complete the WebWork quizzes you need to register via Canvas: go to the course Canvas page, click on Assignments, then on WebWork Link, and finally click on Load WebWork Link in a new window. This is a necessary step (don't shoot the messenger!), but you only need to do it once.

Weekly reading and other resources

This is a list of strongly recommended pre-class reading. [JWHT13] and [HTF09] indicate two of the reference books listed below.

  • Week 1 (L1): Review of Linear Regression
    • Sections 2.1, 2.1.1, 2.1.2, 2.1.3, 2.2, 2.2.1 from [JWHT13]
    • Sections 2.4 and 2.6 from [HTF09].
  • Week 2 (L2/3): Goodness of Fit vs Prediction error, Cross Validation
    • Sections 5.1, 5.1.1, 5.1.2, 5.1.3 from [JWHT13]
    • Sections 7.1, 7.2, 7.3, 7.10 from [HTF09].
  • Week 3 (L4/5): Correlated predictors, Feature selection, AIC
    • Sections 6.1, 6.1.1, 6.1.2, 6.1.3, 6.2 and 6.2.1 from [JWHT13]
    • Sections 7.4, 7.5 from [HTF09].
  • Week 4 (L6/MT1): Ridge regression, LASSO, Elastic Net
    • Section 6.2 (complete) from [JWHT13]
    • Sections 3.4, 3.8, 3.8.1, 3.8.2 from [HTF09]
  • Week 5 (L7/8): Elastic Net, Smoothers (Local regression, Splines)
    • Sections 7.1, 7.3, 7.4, 7.5, 7.6 from [JWHT13]
  • Week 6 (L9/10): Curse of dimensionality, Regression Trees
    • Sections 8.1, 8.1.1, 8.1.3, 8.1.4 from [JWHT13]
  • Week 7 (L11/MT2): Bagging
    • Sections 8.2, 8.2.1 from [JWHT13]
  • Week 8 (L12/13): Classification, LDA, QDA, Logistic Regression
    • Sections 4.1, 4.2, 4.3, 4.4, 2.2.3 from [JWHT13]
  • Week 9 (L14/15): Trees, Ensembles, Bagging
    • Sections 8.1.2, 8.2.1 and 8.2.2 from [JWHT13]
  • Week 10 (L16/MT3): Random Forests
    • Sections 8.2.1 and 8.2.2 from [JWHT13]
  • Week 11 (L17/18): Boosting, Neural Networks?
    • Section 8.2.3 from [JWHT13]
    • Sections 10.1 - 10.10 (except 10.7), 11.3 - 11.5, 11.7 from [HTF09]
  • Week 12 (L19/20): Unsupervised learning, K-means, model-based clustering
    • Section 10.3 from [JWHT13]
    • Sections 13.2, 14.3 from [HTF09]
  • Week 13 (L21/L22): Hierarchical clustering, Principal Components, Multidimensional Scaling
    • Sections 10.2, 10.3 from [JWHT13]
    • Sections 8.5, 14.3, 14.5.1, 14.8, 14.9 from [HTF09]

Reference books

  • [JWHT13]: James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013). An Introduction to Statistical Learning. Springer-Verlag, New York.

  • [HTF09]: Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning, Second Edition. Springer-Verlag, New York.

  • [MASS]: Venables, W.N. and Ripley, B.D. (2002). Modern Applied Statistics with S, Fourth Edition. Springer, New York.

Useful tools

  • R: This is the software we will use in the course. I will assume that you are familiar with it (in particular, that you know how to write your own functions and loops). If needed, there are plenty of resources online to learn R.
  • RStudio: The IDE (integrated development environment) of choice for R. Not necessary, but helpful.
  • Jupyter Notebooks. "The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text." You can use these to interactively run and play with the lecture notes and the code to reproduce all the examples I use in class. This is not necessary, but may be helpful. You can run the notebooks either locally on your own computer or on a remote server:
    1. Follow the instructions here to install Jupyter on your laptop. You will also need to follow these instructions to install the R kernel for Jupyter (a minimal sketch of this step is shown after this list).
    2. Alternatively, you can run the notebooks on the syzygy server. There are Julia, Python 2, Python 3, and R kernels available (although we will only use the R one). Sign in with your UBC CWL. Once you are logged in, use this link to clone this repository (STAT406) (including all notebooks) directly onto your syzygy home directory. You may need to do this regularly throughout the term.
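
If you go with option 1, here is a minimal sketch of the R-kernel step, assuming you already have R and Jupyter installed (the official instructions linked above are the authoritative source). It uses the IRkernel package from CRAN:

    # Minimal sketch, assuming R and Jupyter are already installed on your laptop.
    install.packages("IRkernel")   # install the R kernel package from CRAN
    IRkernel::installspec()        # register the R kernel with Jupyter for the current user

After this, "R" should appear as a kernel choice when you create a new notebook.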