
Positive-Negative-Momentum

The official PyTorch Implementations of Positive-Negative Momentum Optimizers.

The algorithms are proposed in our paper: Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization, accepted by ICML 2021. In the updated arXiv version, we fixed several notation typos that appeared in the ICML version due to notation conflicts.

Why Positive-Negative Momentum?

It is well known that stochastic gradient noise matters a great deal to generalization. Positive-Negative Momentum (PNM), a powerful alternative to conventional momentum in classic optimizers, can manipulate stochastic gradient noise by adjusting an extra hyperparameter.
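To make the mechanism concrete, here is a minimal sketch of one Stochastic PNM step, based on our reading of the update rule; the function name `pnm_step` and the buffer-swapping convention are illustrative, not this repo's API. The optimizer keeps two momentum buffers updated on alternating iterations and combines them with weights (1 + beta2) and -beta2, so a larger beta2 injects more gradient noise:

```python
import math
import torch

def pnm_step(param, grad, pos_m, neg_m, lr, beta1=0.9, beta2=1.0, weight_decay=0.0):
    # Illustrative sketch of one PNM update, not the official implementation.
    if weight_decay != 0:
        grad = grad.add(param, alpha=weight_decay)
    # Update the buffer for the current parity; beta1**2 keeps the effective
    # decay comparable to conventional momentum across two steps.
    pos_m.mul_(beta1 ** 2).add_(grad, alpha=1 - beta1 ** 2)
    # Combine the two buffers with positive weight (1 + beta2) and negative
    # weight -beta2, normalized so the update scale stays comparable to
    # plain momentum across different choices of beta2.
    noise_norm = math.sqrt((1 + beta2) ** 2 + beta2 ** 2)
    delta = pos_m.mul(1 + beta2).add(neg_m, alpha=-beta2).div_(noise_norm)
    with torch.no_grad():
        param.add_(delta, alpha=-lr)
    # The caller swaps pos_m and neg_m before the next step, so the buffer
    # updated two iterations ago always serves as the negative term.
```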

The environment is as below:

Python 3.7.3

PyTorch >= 1.4.0

Usage

```python
# You may use PNM or AdaPNM as a standard PyTorch optimizer.
from pnm_optim import PNM, AdaPNM

# For PNM, betas = (beta1, beta2): beta1 is the conventional momentum
# coefficient and beta2 is the extra hyperparameter that controls the
# positive-negative weighting (and hence the injected gradient noise).
PNM_optimizer = PNM(net.parameters(), lr=lr, betas=(0.9, 1.), weight_decay=weight_decay)

# AdaPNM additionally takes a second-moment coefficient (0.999, as in Adam)
# as the middle entry of betas.
AdaPNM_optimizer = AdaPNM(net.parameters(), lr=lr, betas=(0.9, 0.999, 1.), eps=1e-08, weight_decay=weight_decay)
```
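For completeness, a minimal end-to-end sketch with a toy model and synthetic data; the `lr` and `weight_decay` values here are illustrative, not tuned:

```python
import torch
import torch.nn as nn
from pnm_optim import PNM

net = nn.Linear(10, 2)                      # toy model standing in for e.g. ResNet18
criterion = nn.CrossEntropyLoss()
optimizer = PNM(net.parameters(), lr=0.1, betas=(0.9, 1.), weight_decay=5e-4)

for step in range(100):
    inputs = torch.randn(32, 10)            # synthetic batch
    targets = torch.randint(0, 2, (32,))
    optimizer.zero_grad()
    loss = criterion(net(inputs), targets)
    loss.backward()                          # compute gradients
    optimizer.step()                         # apply the PNM update
```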

Test performance

PNM versus conventional momentum. We report the mean and standard deviation (in parentheses) of the optimal test errors over three runs of each experiment. The proposed PNM-based methods generalize significantly better than conventional momentum-based methods. In particular, as the theoretical analysis indicates, Stochastic PNM consistently outperforms the conventional baseline, SGD with momentum (SGD M).

| Dataset   | Model       | PNM          | AdaPNM       | SGD M        | Adam         | AMSGrad      | AdamW        | AdaBound     | Padam        | Yogi         | RAdam        |
|-----------|-------------|--------------|--------------|--------------|--------------|--------------|--------------|--------------|--------------|--------------|--------------|
| CIFAR-10  | ResNet18    | 4.48 (0.09)  | 4.94 (0.05)  | 5.01 (0.03)  | 6.53 (0.03)  | 6.16 (0.18)  | 5.08 (0.07)  | 5.65 (0.08)  | 5.12 (0.04)  | 5.87 (0.12)  | 6.01 (0.10)  |
| CIFAR-10  | VGG16       | 6.26 (0.05)  | 5.99 (0.11)  | 6.42 (0.02)  | 7.31 (0.25)  | 7.14 (0.14)  | 6.48 (0.13)  | 6.76 (0.12)  | 6.15 (0.06)  | 6.90 (0.22)  | 6.56 (0.04)  |
| CIFAR-100 | ResNet34    | 20.59 (0.29) | 20.41 (0.18) | 21.52 (0.37) | 27.16 (0.55) | 25.53 (0.19) | 22.99 (0.40) | 22.87 (0.13) | 22.72 (0.10) | 23.57 (0.12) | 24.41 (0.40) |
| CIFAR-100 | DenseNet121 | 19.76 (0.28) | 20.68 (0.11) | 19.81 (0.33) | 25.11 (0.15) | 24.43 (0.09) | 21.55 (0.14) | 22.69 (0.15) | 21.10 (0.23) | 22.15 (0.36) | 22.27 (0.22) |
| CIFAR-100 | GoogLeNet   | 20.38 (0.31) | 20.26 (0.21) | 21.21 (0.29) | 26.12 (0.33) | 25.53 (0.17) | 21.29 (0.17) | 23.18 (0.31) | 21.82 (0.17) | 24.24 (0.16) | 22.23 (0.15) |

Citing

If you use Positive-Negative Momentum in your work, please cite

@InProceedings{xie2021positive,
  title     = {Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization},
  author    = {Xie, Zeke and Yuan, Li and Zhu, Zhanxing and Sugiyama, Masashi},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {11448--11458},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
}