PYDRSOM: A PyTorch version for DRSOM

pydrsom is a PyTorch implementation of DRSOM (Dimension-Reduced Second-Order Method).

See quickstart for an example.
The CIFAR10 demo included in this repo is based on the training script of adabound.

Getting started

PYDRSOM is developed in Python 3.8 (torch=1.11.0). It is easy to set up; see requirements for dependencies.
The DRSOM optimizer class provides a few parameters; see the docstring in drsom.py for details.
- Generally, you only have to choose which type of trust region to use via the option_tr argument:
```
# option of trust-region, I or G?
#  - if 'a'; G = eye(2)
#  - if 'p'; G = [-g d]'[-g d]
```
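
To make the two choices concrete, here is a minimal sketch of the 2x2 matrix G for each option. It is an illustration only, not the code in drsom.py: plain Python lists stand in for tensors, and the helper names `dot` and `trust_region_metric` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def trust_region_metric(option_tr, g, d):
    """Return the 2x2 matrix G for the given trust-region option.

    'a': G = eye(2)
    'p': G = [-g d]'[-g d], the Gram matrix of the columns -g and d
    """
    if option_tr == "a":
        return [[1.0, 0.0], [0.0, 1.0]]
    if option_tr == "p":
        gg, gd, dd = dot(g, g), dot(g, d), dot(d, d)
        # (-g)'(-g) = g'g, (-g)'d = -g'd, d'd
        return [[gg, -gd], [-gd, dd]]
    raise ValueError(f"unknown option_tr: {option_tr!r}")


g = [3.0, 4.0]  # illustrative gradient
d = [1.0, 0.0]  # illustrative second direction
G_a = trust_region_metric("a", g, d)
G_p = trust_region_metric("p", g, d)
```

With 'a' the matrix does not depend on g or d, while with 'p' it is built from their inner products.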

Fashion-MNIST

For usage, start at the root of this repo:

python quickstart.py -h

For example,

use a simple network to train Fashion-MNIST

python quickstart.py --optim drsom

or choose a CNN model

python quickstart.py --optim drsom --model cnn

Adjust verbosity

If you want to see very detailed logs for DRSOM (which by default is turned off), try:

export DRSOM_VERBOSE=1; python quickstart.py --optim drsom --model cnn

Then you can see the inner iteration information for each mini-batch, e.g.,

+----+-----+-------------------+----------+-------+--------+-------+-------+-------+-------+---------+-------+-------+------+--------+------+
|    |   𝜆 | Q/c/G             | a        |   ghg |   ghg- |    dQ |    df |   rho |   acc |   acc-𝜆 |     𝛄 |    𝛄- |    f |      k |   k0 |
+====+=====+===================+==========+=======+========+=======+=======+=======+=======+=========+=======+=======+======+========+======+
|  0 |   0 | [[ 3.046  0.   ]  | [[0.58]  |  3.05 |   3.05 | 0.512 | 0.498 | 0.973 |     1 |       1 | 1e-12 | 1e-06 | 2.31 |     +0 |    1 |
|    |     |  [ 0.     0.   ]  |  [0.  ]] |       |        |       |       |       |       |         |       |       |      |        |      |
|    |     |  [-1.766  0.   ]  |          |       |        |       |       |       |       |         |       |       |      |        |      |
|    |     |  [ 1.766  0.   ]  |          |       |        |       |       |       |       |         |       |       |      |        |      |
|    |     |  [ 0.     0.   ]] |          |       |        |       |       |       |       |         |       |       |      |        |      |
+----+-----+-------------------+----------+-------+--------+-------+-------+-------+-------+---------+-------+-------+------+--------+------+

Column descriptions:

$\lambda, Q, c, G, a (\alpha), f, rho (\rho)$ correspond to the definitions in the paper.
$k, k0$ are the total iteration count and the inner (trust-region) iteration count, respectively.
$dQ, df$ are the model reduction and the actual reduction, respectively; from these you can find the value of rho $(\rho)$.
$\gamma, \gamma-$ are the current and previous values of $\gamma_k$, respectively.
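
As a quick check on the log columns, the sketch below computes rho from df and dQ. It assumes rho is the usual trust-region ratio of actual to model reduction, which is consistent with the sample row above; the function name `reduction_ratio` is hypothetical, and the logged values are rounded, so the match is only approximate.

```python
def reduction_ratio(df, dQ):
    """Ratio of actual reduction df to model reduction dQ."""
    if dQ == 0:
        raise ZeroDivisionError("model reduction dQ is zero")
    return df / dQ


# Values taken from the sample log row: dQ = 0.512, df = 0.498
rho = reduction_ratio(df=0.498, dQ=0.512)
print(f"{rho:.3f}")  # prints 0.973
```

A ratio close to 1 means the quadratic model predicted the actual decrease well for that mini-batch.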

CIFAR10

We also provide a preliminary script for CIFAR10. Please refer to the code: demos/cifar10/main.py. This script is based on the training script of adabound.

For usage, start at the root of this repo:

python -u -m demos.cifar10.main -h

An example run:

python -u -m demos.cifar10.main \
  --model resnet18 --optim drsom --epoch 50 --option_tr p --gamma_power 1e3

Known issues

pydrsom is still under active development. Please report issues on GitHub.

License

pydrsom is licensed under the MIT License. See LICENSE for more details.

Acknowledgment

Special thanks go to the COPT team and Tianyi Lin (Darren) for helpful suggestions.

Reference

You are welcome to cite our paper :), see

@misc{zhang2022drsom,
      title={DRSOM: A Dimension Reduced Second-Order Method and Preliminary Analyses}, 
      author={Chuwen Zhang and Dongdong Ge and Bo Jiang and Yinyu Ye},
      year={2022},
      eprint={2208.00208},
      archivePrefix={arXiv},
      primaryClass={math.OC}
}

bzhangcw/pydrsom