OPTAMI: OPTimization for Applied Mathematics and Informatics
This package is dedicated to high-order optimization methods. All the methods can be used similarly to standard PyTorch optimizers.
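For example, a typical training loop with an OPTAMI optimizer may be sketched as follows. The optimizer class name here is illustrative, and the `closure` convention (a loss-evaluating function with an optional `backward` flag) follows the contribution guidelines described below:

```python
import torch
import OPTAMI  # assuming the package is importable under this name

# Toy problem: logistic regression on random data.
torch.manual_seed(0)
X = torch.randn(200, 20, dtype=torch.double)
y = torch.randint(0, 2, (200,)).double()
w = torch.zeros(20, dtype=torch.double, requires_grad=True)

# Illustrative choice of optimizer; it is constructed from the parameters alone.
optimizer = OPTAMI.CubicRegularizedNewton([w])

def closure(backward=False):
    optimizer.zero_grad()
    loss = torch.nn.functional.binary_cross_entropy_with_logits(X @ w, y)
    if backward:
        loss.backward()
    return loss

for _ in range(20):
    optimizer.step(closure)
```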
Although the development of this library was motivated primarily by the need for implementations of high-order optimization methods, we invite contributors to submit methods of any order, and the library already provides some first-order methods. Below we list all the currently supported algorithms, grouped by their order, with links to their source papers and/or wiki pages.
- Similar Triangles method
  Gasnikov, A., Nesterov, Y. Universal Method for Stochastic Composite Optimization Problems. Comput. Math. and Math. Phys. 58, 48–64 (2018). https://doi.org/10.1134/S0965542518010050
- Damped Newton method
- Affine-Invariant Cubic Newton method
- Globally regularized Newton method
  Mishchenko, K. Regularized Newton method with global O(1/k^2) convergence. arXiv preprint arXiv:2112.02089 (2021). https://arxiv.org/abs/2112.02089
- Cubic regularized Newton method
  Nesterov, Y., Polyak, B. Cubic regularization of Newton method and its global performance. Math. Program. 108, 177–205 (2006). https://doi.org/10.1007/s10107-006-0706-8
- Proximal Point Segment Search (Superfast) method
  Nesterov, Y. Superfast second-order methods for unconstrained convex optimization. Journal of Optimization Theory and Applications 191, 1–30 (2021). https://doi.org/10.1007/s10957-021-01930-y
- Basic tensor method (Bregman distance gradient method for p = 3)
  Nesterov, Y. Superfast second-order methods for unconstrained convex optimization. Journal of Optimization Theory and Applications 191, 1–30 (2021). https://doi.org/10.1007/s10957-021-01930-y
- Superfast method
  Nesterov, Y. Superfast second-order methods for unconstrained convex optimization. Journal of Optimization Theory and Applications 191, 1–30 (2021). https://doi.org/10.1007/s10957-021-01930-y
- Hyperfast method
  Kamzolov, D. Near-optimal hyperfast second-order method for convex optimization. International Conference on Mathematical Optimization Theory and Operations Research, 167–178 (2020). https://doi.org/10.1007/978-3-030-58657-7_15
- Optimal tensor method
  Kovalev, D., Gasnikov, A. The First Optimal Acceleration of High-Order Methods in Smooth Convex Optimization. arXiv preprint arXiv:2205.09647 (2022). https://arxiv.org/abs/2205.09647
TBA
Contributed algorithms are expected to follow the conventions below (a minimal skeleton illustrating them is sketched after this list):
- The class describing the algorithm (we denote it by `Algorithm`) is derived from `torch.optim.optimizer.Optimizer`
- The paper introducing the algorithm and the list of contributors are given in the docstring of `Algorithm`
- The only required argument of the constructor `Algorithm::__init__` is `params`. `Algorithm` does not take the `model` itself in any way, only its `model.parameters()` as the `params` argument of the constructor. Likewise, `Algorithm` does not take any information about the loss, the problem, or other outside entities. In other words, algorithms may use only the zeroth-, first-, second- etc. order oracle information provided by the `closure` function described below, or the content of the `grad` field of a parameter `p`
- All the necessary constants (from the Lipschitz, Hölder, Polyak–Łojasiewicz etc. conditions) are arguments of `Algorithm::__init__`, come with reasonable default values (working for the Basic tests), and have corresponding checks raising `ValueError` if a value is incorrect
- The constructor `Algorithm::__init__` takes a non-required boolean argument `verbose` controlling all printing to stdout that may be produced by `Algorithm`
- The overridden method `Algorithm::step` takes one required parameter `closure`, which is the function evaluating the loss (with a proper PyTorch forward pass) and which takes a non-required boolean argument `backward` (if it is True, `closure` automatically performs backpropagation)
- For every `group` in `self.param_groups`, commonly used variables (such as constant approximations, counters, etc.) are stored in `self.state[group['params'][0]]`
- All the param-specific variables (typically the x_k, y_k, z_k sequences) are stored piecewise in `self.state[p]` for the corresponding elements `p` of `group['params']` (note that it is very undesirable to store anything in `self.state` as a List[Tensor] or Dict[Tensor] if the prescribed requirement can be satisfied)
- If `Algorithm` requires additional functions for auxiliary calculations (excluding auxiliary optimization problems that need an iterative gradient-based subsolver), they are provided as self-sufficient procedures before and outside the `Algorithm` implementation (note that it is undesirable to use `@staticmethod` for this purpose)
- Do not contribute several algorithms differing only in the usage of L-adaptivity, a restart procedure, etc. If the package provides a special envelope implementing one of these extensions, make your `Algorithm` compatible with it. If it does not, add a corresponding non-required boolean argument to `Algorithm::__init__` controlling its usage. For backwards compatibility, if the algorithm supports compound usage with some envelope, add the corresponding non-required boolean argument anyway with default value `None` and a check that raises `AttributeError` if the value is not `None`
- Make sure that `Algorithm` passes the Basic tests
- `Algorithm` must have a static boolean attribute `MONOTONE` indicating whether the method guarantees monotonic decrease of the function value
- Make sure all the methods have clear comments; `Algorithm` and its overridden methods should be provided with docstrings in Google Style
- Try to use the `@torch.no_grad()` annotation instead of `with torch.no_grad():` whenever possible
- The class `Algorithm` should be named after the original name of the optimization algorithm (from its source paper), if it is unique and recognizable enough (like SARAH or Varag), or by the commonly accepted name of the approach (like SimilarTriangles). The words "Method" and "Descent" should be omitted. Avoid ambiguous abbreviations (e.g., use something like InterpolationLearningSGD instead of AMBSGD aka Accelerated Minibatch SGD)
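The sketch below is a minimal, illustrative skeleton of these conventions, not an actual algorithm from the package; the class name, the constant `L`, and the plain gradient-step update are placeholders:

```python
import torch
from torch.optim.optimizer import Optimizer


class ExampleGradient(Optimizer):
    """Illustrative skeleton following the contribution guidelines.

    The source paper and the list of contributors would be given here.
    """

    MONOTONE = False  # whether the method guarantees monotone decrease of the function value

    def __init__(self, params, L: float = 1.0, verbose: bool = False):
        if L <= 0:
            raise ValueError(f"Invalid Lipschitz constant: L = {L}")
        super().__init__(params, dict(L=L))
        self.verbose = verbose

    @torch.no_grad()
    def step(self, closure):
        """Performs a single optimization step.

        Args:
            closure: Callable evaluating the loss; accepts a boolean
                ``backward`` argument and, if it is True, performs
                backpropagation before returning the loss.
        """
        # Oracle call: the only allowed source of problem information.
        with torch.enable_grad():
            loss = closure(backward=True)

        for group in self.param_groups:
            L = group['L']
            # Group-level variables live in the state of the first parameter.
            common = self.state[group['params'][0]]
            common['k'] = common.get('k', 0) + 1

            for p in group['params']:
                if p.grad is None:
                    continue
                # Param-specific sequences are stored per parameter in self.state[p].
                state = self.state[p]
                state.setdefault('x_prev', p.detach().clone())
                # Placeholder update: a plain gradient step with step size 1/L.
                p.add_(p.grad, alpha=-1.0 / L)

            if self.verbose:
                print(f"iteration {common['k']}: loss = {float(loss)}")

        return loss
```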
The basic tests are intended to check the correctness of contributed algorithms and to benchmark them. These tests are launched automatically after every update of the main branch of the repository, so we guarantee that the implemented algorithms are correct and that their performance does not decrease as implementations are updated. Basic tests consist of three groups:
- Unit tests
- Universal tests
- Performance tests
Unit tests are implemented using the Python `unittest` library and are provided together with the source code of every algorithm in a distinct file. E.g., if an algorithm is implemented in `algorithm.py`, its unit tests are implemented in `test_algorithm.py` in the same directory. We ask contributors to provide their own unit tests for the contributed algorithms. All the unit tests present in the library can be launched manually with the command `./run_unit_tests.sh`.
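For illustration, such a test file might look roughly as follows; the optimizer class name is a placeholder, and the problem is a simple quadratic:

```python
import unittest

import torch
import OPTAMI  # assuming the package is importable under this name


class TestAlgorithm(unittest.TestCase):
    def test_decreases_quadratic_loss(self):
        # Simple strongly convex problem: f(x) = ||x - 1||^2 / 2
        x = torch.zeros(5, dtype=torch.double, requires_grad=True)
        optimizer = OPTAMI.CubicRegularizedNewton([x])  # illustrative class name

        def closure(backward=False):
            optimizer.zero_grad()
            loss = 0.5 * (x - 1.0).square().sum()
            if backward:
                loss.backward()
            return loss

        initial = float(closure())
        for _ in range(10):
            optimizer.step(closure)
        self.assertLess(float(closure()), initial)


if __name__ == "__main__":
    unittest.main()
```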
Universal tests check the expected behaviour and minimal performance requirements of the algorithms on some toy problems. The main goal of these tests is to check the guarantees provided by the methods and to rule out divergence of the algorithms. The universal tests are not open for editing by outside contributors, but they may be strengthened by the authors in order to provide stronger guarantees (for example, by checking the convergence rate on problems with a known solution). In such cases, algorithms that do not pass the enhanced tests may be removed from the main branch until they are corrected (so we recommend using only release versions of our library as a dependency in your projects). All the universal tests present in the library can be launched manually with the command `./run_universal_tests.py`.
Currently, the toy problems used are as follows:
- a9a dataset (n = 123 features, m = 32561 samples)
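For reference, a problem of this kind can be set up along the following lines; the choice of logistic regression as the objective and the local file name are assumptions for illustration, not necessarily the exact setup used by the tests:

```python
import torch
from sklearn.datasets import load_svmlight_file

# Assumes the a9a file in LIBSVM format is available locally
# (it can be obtained from the LIBSVM datasets collection).
A_sparse, labels = load_svmlight_file("a9a")   # m = 32561 samples, n = 123 features
A = torch.tensor(A_sparse.toarray(), dtype=torch.double)
b = torch.tensor(labels > 0, dtype=torch.double)  # map {-1, +1} labels to {0, 1}

w = torch.zeros(A.shape[1], dtype=torch.double, requires_grad=True)

def closure(backward=False):
    # Logistic-regression loss; this closure can be plugged into any
    # optimizer from the package as in the usage example above.
    if w.grad is not None:
        w.grad.zero_()
    loss = torch.nn.functional.binary_cross_entropy_with_logits(A @ w, b)
    if backward:
        loss.backward()
    return loss
```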
Copyright © 2020–2022 Dmitry Kamzolov