Optimization methods

Benchmarking optimization methods on convex problems.

This package contains implementations of various loss functions and optimization algorithms. Its main goal is to provide a unified, easy-to-use comparison of the algorithms' iteration complexities, so any wall-clock time comparison is only approximate. If you are interested in finding the best implementation of a solver for your problem, you may find the BenchOpt package more useful.

Structure

Currently, the methods are grouped into first-order, quasi-Newton, second-order, and stochastic first-order methods. A number of universal line-search procedures are also implemented.

First-order

Gradient-based algorithms:
Gradient Descent (GD), Polyak's Heavy-ball, Incremental Gradient (IG), Nesterov's Acceleration, Nesterov's Acceleration with a special line search, Nesterov's Acceleration with restarts (RestNest), Optimized Gradient Method (OGM).
Adaptive: AdaGrad, Adaptive GD (AdGD), Accelerated AdGD (AdgdAccel), Polyak.
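
For reference, here is a minimal, self-contained sketch of plain gradient descent and Nesterov's acceleration on a quadratic objective. It is illustrative only: the function names and signatures below are made up for the example and do not reflect the package's own classes.

```python
import numpy as np

def gradient_descent(grad, x0, lr, n_iters):
    # Plain gradient descent: x_{k+1} = x_k - lr * grad(x_k).
    x = x0.copy()
    for _ in range(n_iters):
        x = x - lr * grad(x)
    return x

def nesterov(grad, x0, lr, n_iters):
    # Nesterov's acceleration with the standard momentum sequence
    # t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2.
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(n_iters):
        x_next = y - lr * grad(y)
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t ** 2))
        y = x_next + (t - 1.0) / t_next * (x_next - x)
        x, t = x_next, t_next
    return x

# Quadratic test problem f(x) = 0.5 * x^T A x - b^T x with A positive definite.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 20))
A = A.T @ A + np.eye(20)
b = rng.standard_normal(20)
grad = lambda x: A @ x - b
lr = 1.0 / np.linalg.eigvalsh(A)[-1]   # step size 1/L, L = largest eigenvalue
x_gd = gradient_descent(grad, np.zeros(20), lr, 300)
x_acc = nesterov(grad, np.zeros(20), lr, 300)
```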

Quasi-Newton

BFGS, DFP, L-BFGS, Shor's r-algorithm, SR1.
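
As an illustration of the idea behind these methods, below is a minimal sketch of the BFGS inverse-Hessian update and a simplified BFGS loop with a fixed step size. The names and signatures are assumptions made for the example, not the package's API; practical implementations combine the update with a line search.

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    # BFGS update of the inverse Hessian approximation H, using the step
    # s = x_{k+1} - x_k and gradient difference y = g_{k+1} - g_k:
    #   H <- (I - rho s y^T) H (I - rho y s^T) + rho s s^T,  rho = 1 / (y^T s).
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

def bfgs(grad, x0, lr, n_iters):
    # Simplified BFGS loop: move along -H @ g and update H after every step.
    x = x0.copy()
    H = np.eye(len(x0))
    g = grad(x)
    for _ in range(n_iters):
        x_next = x - lr * H @ g
        g_next = grad(x_next)
        s, y = x_next - x, g_next - g
        if y @ s > 1e-12:          # skip the update if curvature is not positive
            H = bfgs_inverse_update(H, s, y)
        x, g = x_next, g_next
    return x
```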

Second-order

Algorithms that use second-order information (the Hessian) or an approximation of it:
Newton, Cubic Newton, and Regularized (Global) Newton.
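
To make the distinction from first-order methods concrete, here is a minimal sketch of a classic Newton step and a regularized Newton step. The identity-scaled regularization below is just one common choice made for the example; the rule used by the package's Regularized (Global) Newton may differ.

```python
import numpy as np

def newton_step(x, grad, hess):
    # Classic Newton step: solve H d = g instead of forming H^{-1} explicitly.
    return x - np.linalg.solve(hess(x), grad(x))

def regularized_newton_step(x, grad, hess, reg):
    # Regularized Newton step x - (H + reg * I)^{-1} g; a positive `reg`
    # keeps the linear system well conditioned far from the solution.
    g, H = grad(x), hess(x)
    return x - np.linalg.solve(H + reg * np.eye(len(x)), g)
```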

Stochastic first-order

SGD, Root-SGD, Stochastic Variance Reduced Gradient (SVRG), Random Reshuffling (RR).
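
As a small illustration, here is a minimal sketch of SGD with Random Reshuffling (RR): each epoch passes over all samples once in a freshly shuffled order. The helper `grad_i(x, i)`, assumed to return the gradient of the i-th loss term, is hypothetical and not part of the package's API.

```python
import numpy as np

def sgd_random_reshuffling(grad_i, n_samples, x0, lr, n_epochs, seed=0):
    # grad_i(x, i) is assumed to return the gradient of the i-th loss term at x.
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(n_epochs):
        for i in rng.permutation(n_samples):   # new permutation every epoch
            x = x - lr * grad_i(x, i)
    return x
```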

Notebooks

  1. Deterministic first-order methods: GD, acceleration, adaptive algorithms.
  2. Second-order methods and quasi-Newton algorithms: Newton, Levenberg-Marquardt, BFGS, SR1, DFP.
  3. Example of running the methods on the log-sum-exp problem: a hard problem where quasi-Newton methods may either outperform all first-order methods or fail due to the high degree of nonsmoothness of the problem. One can change the problem difficulty by adjusting the smoothing parameters of the objective (a sketch of such an objective is given after this list).
    To be added: benchmarking wall-clock time of some NumPy and SciPy operations to show how losses should be implemented.
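
Below is a minimal sketch of one possible smoothed log-sum-exp objective and its gradient. The exact objective used in the notebook may be parameterized differently, so treat this as an assumption made for illustration.

```python
import numpy as np
from scipy.special import logsumexp

def make_log_sum_exp(A, b, mu):
    # Smoothed objective f(x) = mu * log( sum_i exp((a_i^T x - b_i) / mu) ).
    # Smaller mu brings f closer to the nonsmooth max_i (a_i^T x - b_i)
    # and makes the problem harder to solve.
    def value(x):
        return mu * logsumexp((A @ x - b) / mu)

    def gradient(x):
        z = (A @ x - b) / mu
        p = np.exp(z - logsumexp(z))   # softmax weights, numerically stable
        return A.T @ p

    return value, gradient
```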