Instructor
Yuchao Jiang, Assistant Professor, Department of Biostatistics, UNC Chapel Hill
Office: 4115D McGavran-Greenberg Hall
Phone: 919-843-3656
Email: yuchaoj@email.unc.edu (contact via Slack is preferred)
Course Information
- Description: This is an introductory course in machine learning and statistical learning and is required for MPH students with a Data Science concentration. While some technical details will be covered, the emphasis will be on understanding the models, their intuitions, and the strengths and weaknesses of the various approaches. The goal is to equip students with knowledge of existing tools for data analysis and to prepare them for more advanced courses in machine learning. The programming language will be R: students will learn how to use the free and powerful software R in connection with each of the methods presented in class. For deep learning, Keras/TensorFlow in Python will be introduced if time permits.
- Class Time & Location: Tuesdays and Thursdays, 9:30am - 10:45am, 228 Rosenau Hall.
- Office Hours: Thursdays, 10:45am - 11:45am (instructor; Calendly) and Mondays, 1:30pm - 2:30pm (TA; Calendly).
- Teaching Assistant: Jianqiao Wang (jianqiao@live.unc.edu).
- Graders: Sara Qi (xiaoyuqi@email.unc.edu) and Xinjie Qian (qianqxj@live.unc.edu).
Lecture Slides and R Markdown Files
- Lecture 1: Introduction (slides)
- Lecture 2: Curse of Dimensionality & Assessing Model Accuracy (slides)
- Lecture 3: Bias-Variance Tradeoff & K-Nearest Neighbor (slides, html)
- Lecture 4: Linear Regression (slides, html)
- Lecture 5: Logistic Regression (slides, html)
- Lecture 6: Linear/Quadratic Discriminant Analysis (slides, html)
- Lecture 7: Naive Bayes & ROC Curve (slides, html)
- Lecture 8: Nonlinearity (slides, html)
- Lecture 9: Cross Validation (slides, html)
- Lecture 10: Bootstrap (slides, html)
- Lecture 11: Subset/Stepwise Selection, AIC, BIC, Adjusted R-squared (slides, html)
- Lecture 12: Shrinkage Methods, Ridge and Lasso Regression (slides, html, data)
- Lecture 13: Principal Component Regression & Partial Least Squares (slides, html)
- Lecture 14: Midterm Review (slides)
- Lecture 15: Decision Trees (slides, html)
- Lecture 16: Bagging, Boosting & Random Forest (slides, html)
- Lecture 17: Project Guidelines (slides)
- Lecture 18: Support Vector Classifier & Kernel Methods (slides, html)
- Lecture 19: Support Vector Machine (slides, html)
- Lecture 20: Unsupervised Learning & Dimension Reduction (slides, html)
- Lecture 21: K-Means & Hierarchical Clustering (slides, html)
- Lecture 22: Gaussian Mixture Clustering & EM Algorithm (slides, html)
- Lecture 23: Gradient Descent & Forward/Backward Propagation (slides)
- Lecture 24: Deep Neural Network (slides)
Assignments
- Assignment 1: theory (pdf, LaTeX); coding (rmd, data); solutions (pdf, html).
- Assignment 2: theory (pdf, LaTeX); coding (rmd); solutions (pdf, html).
- Assignment 3: theory (pdf, LaTeX); coding (rmd); solutions (pdf, html).
Exams
- Midterm: March 1st, 9:30am, 228 Rosenau Hall.
- Final: May 3rd, 9:00am, 228 Rosenau Hall.
Potential projects
- Project 1: Predict the effect of genetic variants to enable personalized medicine
- Project 2: West Nile virus prediction
- Project 3: Predict Parkinson’s disease progression with smartphone data
- Project 4: Improve the algorithm that classifies drugs based on their biological activity
Other Resources
Machine Learning Textbooks:
- Bishop, Pattern Recognition and Machine Learning, Springer (more advanced)
- Efron and Hastie, Computer Age Statistical Inference, Cambridge University Press (recommended)
- Goodfellow, Bengio, and Courville, Deep Learning, MIT Press (more advanced)
- Hastie, Tibshirani, Friedman, The Elements of Statistical Learning, Springer (more advanced)
- James, Witten, Hastie, and Tibshirani, An Introduction to Statistical Learning, Springer (required)
- Murphy, Machine Learning: A Probabilistic Perspective, MIT Press (more advanced)
Other Machine Learning Resources:
- Introduction to Data Science: Data Analysis and Prediction Algorithms in R by Rafael Irizarry
- Machine Learning Lecture Notes by Andrew Ng
- R for Data Science by Garrett Grolemund and Hadley Wickham
- Statistical Learning MOOC by Trevor Hastie and Rob Tibshirani