pydrsom is a PyTorch implementation of DRSOM (Dimension-Reduced Second-Order Method).
- See quickstart for an example.
- The demo included in this repo, a test on CIFAR10, is adapted from adabound.
- pydrsom is developed in Python 3.8 (torch=1.11.0). It is easy to set up; see requirements for dependencies.
- The DRSOM optimizer class provides a few parameters; see the docstring in drsom.py for details.
- Generally, you only have to choose which type of trust region to use via the argument option_tr:
# option of trust-region, I or G?
# - if 'a'; G = eye(2)
# - if 'p'; G = [-g d]'[-g d]
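To make the two choices concrete, here is a minimal pure-Python sketch of the 2x2 matrix G that each option describes, following the comment above. The helper name `trust_region_metric` and the sample vectors are hypothetical and are not part of pydrsom; reading g as the gradient and d as the previous step direction is an assumption based on the DRSOM paper.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def trust_region_metric(option_tr, g, d):
    """Return the 2x2 matrix G for the given trust-region option.

    Hypothetical helper for illustration only; not pydrsom's API.
    """
    if option_tr == "a":
        # 'a': G = eye(2)
        return [[1.0, 0.0], [0.0, 1.0]]
    if option_tr == "p":
        # 'p': G = [-g d]'[-g d], i.e. the Gram matrix of the columns -g and d
        gg, gd, dd = dot(g, g), dot(g, d), dot(d, d)
        return [[gg, -gd], [-gd, dd]]
    raise ValueError(f"unknown option_tr: {option_tr!r}")


g = [1.0, 2.0]   # sample gradient (made up)
d = [3.0, -1.0]  # sample direction (made up)
print(trust_region_metric("a", g, d))  # [[1.0, 0.0], [0.0, 1.0]]
print(trust_region_metric("p", g, d))  # [[5.0, -1.0], [-1.0, 10.0]]
```

With 'a' the quadratic form a'Ga is the squared norm of the coefficient vector a, whereas with 'p' it equals the squared norm of the combined step -a[0]*g + a[1]*d.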
For usage, start at the root of this repo:
python quickstart.py -h
For example,
- use a simple network to train Fashion-MNIST
python quickstart.py --optim drsom
- or choose a CNN model
python quickstart.py --optim drsom --model cnn
If you want to see very detailed logs for DRSOM (which by default is turned off), try:
export DRSOM_VERBOSE=1; python quickstart.py --optim drsom --model cnn
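If you prefer to launch the run from Python (for example from a notebook or a test harness), the same setting can be passed through the child process environment. This is only a sketch: the helper names are hypothetical, and only the DRSOM_VERBOSE variable and the quickstart.py flags come from the command above.

```python
import os
import subprocess
import sys


def verbose_env(base=None):
    """Return a copy of the environment with DRSOM_VERBOSE=1 set."""
    env = dict(os.environ if base is None else base)
    env["DRSOM_VERBOSE"] = "1"
    return env


def run_quickstart_verbose():
    """Run the CNN quickstart with detailed DRSOM logs (call from the repo root)."""
    cmd = [sys.executable, "quickstart.py", "--optim", "drsom", "--model", "cnn"]
    return subprocess.run(cmd, env=verbose_env(), check=True)
```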
Then you can see the inner iteration information for each mini-batch, e.g.,
+----+-----+-------------------+----------+-------+--------+-------+-------+-------+-------+---------+-------+-------+------+--------+------+
| | 𝜆 | Q/c/G | a | ghg | ghg- | dQ | df | rho | acc | acc-𝜆 | 𝛄 | 𝛄- | f | k | k0 |
+====+=====+===================+==========+=======+========+=======+=======+=======+=======+=========+=======+=======+======+========+======+
| 0 | 0 | [[ 3.046 0. ] | [[0.58] | 3.05 | 3.05 | 0.512 | 0.498 | 0.973 | 1 | 1 | 1e-12 | 1e-06 | 2.31 | +0 | 1 |
| | | [ 0. 0. ] | [0. ]] | | | | | | | | | | | | |
| | | [-1.766 0. ] | | | | | | | | | | | | | |
| | | [ 1.766 0. ] | | | | | | | | | | | | | |
| | | [ 0. 0. ]] | | | | | | | | | | | | | |
+----+-----+-------------------+----------+-------+--------+-------+-------+-------+-------+---------+-------+-------+------+--------+------+
Some descriptions:
- $\lambda, Q, c, G, a (\alpha), f, rho (\rho)$ correspond to the definitions in the paper.
- $k, k0$ are the total iteration count and the inner (trust-region) iteration count, respectively.
- $dQ, df$ are the model reduction and the actual reduction, respectively; their ratio df / dQ gives the value of rho $(\rho)$.
- $\gamma, \gamma-$ are the current and previous values of $\gamma_k$, respectively.
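As a sanity check on these definitions, the sample log row above can be reproduced by hand. The sketch below assumes the two-dimensional model has the form m(a) = c'a + 0.5 * a'Qa, so that the model reduction is -m(a); this form and its sign convention are inferred from the logged numbers and the usual trust-region setup, not taken from pydrsom's code.

```python
# Values copied from the sample log row above (k = 0).
Q = [[3.046, 0.0], [0.0, 0.0]]
c = [-1.766, 0.0]
a = [0.58, 0.0]      # logged step coefficients (alpha)
df = 0.498           # logged actual reduction


def model_reduction(Q, c, a):
    """Reduction predicted by the model m(a) = c'a + 0.5 * a'Qa (assumed form)."""
    linear = sum(ci * ai for ci, ai in zip(c, a))
    quad = sum(a[i] * Q[i][j] * a[j] for i in range(2) for j in range(2))
    return -(linear + 0.5 * quad)


dQ = model_reduction(Q, c, a)
rho = df / dQ        # ratio of actual to predicted reduction
print(round(dQ, 3), round(rho, 3))  # 0.512 0.973
```

Both numbers agree with the dQ and rho columns of the logged row; a rho close to 1 means the quadratic model predicted the actual decrease well on that mini-batch.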
We also provide a preliminary script for CIFAR10; please refer to the code in demos/cifar10/main.py. This script is based on the training script of adabound.
For usage, start at the root of this repo:
python -u -m demos.cifar10.main -h
An example run:
python -u -m demos.cifar10.main \
--model resnet18 --optim drsom --epoch 50 --option_tr p --gamma_power 1e3
DRSOM.jl is still under active development. Please open issues on GitHub.
pydrsom is licensed under the MIT License. Check LICENSE for more details.
- Special thanks go to the COPT team and Tianyi Lin (Darren) for helpful suggestions.
You are welcome to cite our paper :), see
@misc{zhang2022drsom,
title={DRSOM: A Dimension Reduced Second-Order Method and Preliminary Analyses},
author={Chuwen Zhang and Dongdong Ge and Bo Jiang and Yinyu Ye},
year={2022},
eprint={2208.00208},
archivePrefix={arXiv},
primaryClass={math.OC}
}