Instructor: Niao He
Nearly every problem in machine learning and high-dimensional statistics can be formulated as the optimization of some function, possibly under constraints. In the wake of the Big Data era, there is an increasingly strong need for efficient and scalable algorithms that address optimization problems at large scale, including problems of inherently high dimension and problems involving truly massive data sets. The key aim of this course is to expose students to a wide range of modern algorithmic developments in optimization (mainly convex optimization) and to bring them near the frontier of research in large-scale optimization and machine learning.
Courtesy warning: The course is theory-oriented and emphasizes a deep understanding of the structure of optimization problems and the computational complexity of numerical algorithms. It does not cover software or tools for big data analytics and is not an application course.
Students are expected to have a strong background in linear algebra, real analysis, and probability theory. Some prior exposure to optimization at the graduate level is preferred.
No textbook is required, but the following references are highly recommended:
- Ben-Tal, A. and Nemirovski, A. Lectures on Modern Convex Optimization. SIAM, 2011.
- Nesterov, Y. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic, 2004.
- Sra, S., Nowozin, S., and Wright, S. J. (eds.). Optimization for Machine Learning. MIT Press, 2011.
- Bubeck, S. Convex Optimization: Algorithms and Complexity. Foundations and Trends in Machine Learning, 2015.
| Lecture | Topic | Resources |
| --- | --- | --- |
| 1 | Introduction to Course | [Scribe] |
| 2 | Convex Sets | [Scribe] |
| 3 | Convex Functions | [Scribe] |
| 4 | Convex Functions | [Scribe] |
| 5 | Convex Optimization | [Scribe] |
| 6 | Conic Programming | [Scribe] |
| Lecture | Topic | Resources |
| --- | --- | --- |
| 7 | Introduction to Optimization Algorithms | [Scribe] |
| 8 | Gradient Descent | [Scribe] |
| 9 | Gradient Descent and Its Acceleration | [Scribe] |
| 10 | Projected Gradient Descent | [Scribe] |
| 11 | Conditional Gradient (a.k.a. Frank-Wolfe Algorithm) | [Scribe] |
| 12 | Coordinate Descent Algorithms | [Scribe] |
| 13 | Case Study on Logistic Regression (see the sketch below) | [Scribe] |
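As a flavor of the methods in this block, here is a minimal sketch of plain gradient descent applied to l2-regularized logistic regression, in the spirit of Lectures 8 and 13. It is illustrative only, not course code: the synthetic data, the regularization weight `lam`, and the 1/L step size (via the standard smoothness bound L <= ||X||^2 / (4n) + lam for the logistic loss) are all assumptions.

```python
# Illustrative sketch (not course material): gradient descent with a 1/L step
# size on L2-regularized logistic regression; data and constants are made up.
import numpy as np
from scipy.special import expit  # numerically stable sigmoid

def grad(w, X, y, lam):
    """Gradient of (1/n) * sum_i log(1 + exp(-y_i <x_i, w>)) + (lam/2)||w||^2."""
    s = expit(-y * (X @ w))                # sigma(-y_i <x_i, w>)
    return -(X.T @ (y * s)) / len(y) + lam * w

def gradient_descent(X, y, lam=0.1, steps=500):
    n, d = X.shape
    # Smoothness bound for this objective: L <= ||X||_2^2 / (4n) + lam.
    L = np.linalg.norm(X, 2) ** 2 / (4 * n) + lam
    w = np.zeros(d)
    for _ in range(steps):
        w -= (1.0 / L) * grad(w, X, y, lam)  # w_{k+1} = w_k - (1/L) grad f(w_k)
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = np.sign(X @ rng.standard_normal(5))
w = gradient_descent(X, y)
print(np.linalg.norm(grad(w, X, y, 0.1)))    # gradient norm should be near zero
```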
| Lecture | Topic | Resources |
| --- | --- | --- |
| 14 | Subgradient Method | [Scribe] |
| 15 | From Subgradient Descent to Mirror Descent | [Scribe] |
| 16 | Smoothing Techniques I | [Scribe] |
| 17 | Smoothing Techniques II & Proximal Point Algorithm | [Scribe] |
| 18 | Mirror Prox for Saddle Point Problems | [Scribe] |
| 19 | (Accelerated) Proximal Gradient Method | [Scribe] |
| 20 | Splitting Algorithms | [Scribe] |
| Lecture | Topic | Resources |
| --- | --- | --- |
| 21 | Sample Average Approximation | [Scribe] |
| 22 | Stochastic Gradient Descent / Mirror Descent (see the sketch below) | [Scribe] |
| 23 | Incremental Gradient Methods for Finite-Sum Problems | [Scribe] |
| 24 | Summary and Outlook | [Slides] |
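To connect the stochastic block to code, below is a minimal sketch of stochastic gradient descent for the same l2-regularized logistic model used above, sampling one data point per iteration. The O(1/sqrt(t)) step-size schedule and iterate averaging are standard choices for stochastic convex problems; the constants `c`, `lam`, and `steps` are assumptions, not recommendations from the course.

```python
# Illustrative sketch (not course material): SGD on L2-regularized logistic
# regression, one uniformly sampled data point per iteration.
import numpy as np
from scipy.special import expit

def sgd(X, y, lam=0.1, steps=5000, c=1.0, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    w_avg = np.zeros(d)                        # running average of the iterates
    for t in range(1, steps + 1):
        i = rng.integers(n)                    # uniform sampling: E[g] equals the full gradient
        g = -y[i] * expit(-y[i] * (X[i] @ w)) * X[i] + lam * w
        w -= (c / np.sqrt(t)) * g              # step size eta_t = c / sqrt(t)
        w_avg += (w - w_avg) / t
    return w_avg
```

Returning the averaged iterate rather than the last one is the classical way to obtain the O(1/sqrt(t)) convergence guarantee for such step sizes.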