
Positive-Negative-Momentum

The official PyTorch Implementations of Positive-Negative Momentum Optimizers.

The algorithms are proposed in our paper: Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization, accepted by ICML 2021. In the updated arXiv version, we fixed several notation typos that appeared in the ICML version due to notation conflicts.

Why Positive-Negative Momentum?

It is well known that stochastic gradient noise matters a great deal to generalization. Positive-Negative Momentum (PNM), a powerful alternative to conventional momentum in classic optimizers, can manipulate stochastic gradient noise by adjusting an extra hyperparameter.
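To make the mechanism concrete, here is a minimal sketch of one Stochastic PNM step, based on our reading of the update rule; the function name `pnm_step` and the buffer-swapping convention are illustrative, not this repo's API. The optimizer keeps two momentum buffers updated on alternating iterations and combines them with weights (1 + beta2) and -beta2, so a larger beta2 injects more gradient noise:

```python
import math
import torch

def pnm_step(param, grad, pos_m, neg_m, lr, beta1=0.9, beta2=1.0, weight_decay=0.0):
    # Illustrative sketch of one PNM update, not the official implementation.
    if weight_decay != 0:
        grad = grad.add(param, alpha=weight_decay)
    # Update the buffer for the current parity; beta1**2 keeps the effective
    # decay comparable to conventional momentum across two steps.
    pos_m.mul_(beta1 ** 2).add_(grad, alpha=1 - beta1 ** 2)
    # Combine the two buffers with positive weight (1 + beta2) and negative
    # weight -beta2, normalized so the update scale stays comparable to
    # plain momentum across different choices of beta2.
    noise_norm = math.sqrt((1 + beta2) ** 2 + beta2 ** 2)
    delta = pos_m.mul(1 + beta2).add(neg_m, alpha=-beta2).div_(noise_norm)
    with torch.no_grad():
        param.add_(delta, alpha=-lr)
    # The caller swaps pos_m and neg_m before the next step, so the buffer
    # updated two iterations ago always serves as the negative term.
```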

The environment is as below:

Python 3.7.3

PyTorch >= 1.4.0

Usage

```python
# You may use PNM or AdaPNM as a standard PyTorch optimizer.
from pnm_optim import PNM, AdaPNM

# For PNM, betas = (beta1, beta2): beta1 is the conventional momentum
# coefficient and beta2 is the extra hyperparameter that controls the
# positive-negative weighting (and hence the injected gradient noise).
PNM_optimizer = PNM(net.parameters(), lr=lr, betas=(0.9, 1.), weight_decay=weight_decay)

# AdaPNM additionally takes a second-moment coefficient (0.999, as in Adam)
# as the middle entry of betas.
AdaPNM_optimizer = AdaPNM(net.parameters(), lr=lr, betas=(0.9, 0.999, 1.), eps=1e-08, weight_decay=weight_decay)
```
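For completeness, a minimal end-to-end sketch with a toy model and synthetic data; the `lr` and `weight_decay` values here are illustrative, not tuned:

```python
import torch
import torch.nn as nn
from pnm_optim import PNM

net = nn.Linear(10, 2)                      # toy model standing in for e.g. ResNet18
criterion = nn.CrossEntropyLoss()
optimizer = PNM(net.parameters(), lr=0.1, betas=(0.9, 1.), weight_decay=5e-4)

for step in range(100):
    inputs = torch.randn(32, 10)            # synthetic batch
    targets = torch.randint(0, 2, (32,))
    optimizer.zero_grad()
    loss = criterion(net(inputs), targets)
    loss.backward()                          # compute gradients
    optimizer.step()                         # apply the PNM update
```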

Test performance

PNM versus conventional momentum. We report the mean and standard deviation (in parentheses) of the optimal test errors over three runs of each experiment. The proposed PNM-based methods generalize significantly better than conventional momentum-based methods. In particular, as the theoretical analysis indicates, Stochastic PNM consistently outperforms the conventional baseline, SGD with momentum (SGD M).

| Dataset   | Model       | PNM          | AdaPNM       | SGD M        | Adam         | AMSGrad      | AdamW        | AdaBound     | Padam        | Yogi         | RAdam        |
|-----------|-------------|--------------|--------------|--------------|--------------|--------------|--------------|--------------|--------------|--------------|--------------|
| CIFAR-10  | ResNet18    | 4.48 (0.09)  | 4.94 (0.05)  | 5.01 (0.03)  | 6.53 (0.03)  | 6.16 (0.18)  | 5.08 (0.07)  | 5.65 (0.08)  | 5.12 (0.04)  | 5.87 (0.12)  | 6.01 (0.10)  |
| CIFAR-10  | VGG16       | 6.26 (0.05)  | 5.99 (0.11)  | 6.42 (0.02)  | 7.31 (0.25)  | 7.14 (0.14)  | 6.48 (0.13)  | 6.76 (0.12)  | 6.15 (0.06)  | 6.90 (0.22)  | 6.56 (0.04)  |
| CIFAR-100 | ResNet34    | 20.59 (0.29) | 20.41 (0.18) | 21.52 (0.37) | 27.16 (0.55) | 25.53 (0.19) | 22.99 (0.40) | 22.87 (0.13) | 22.72 (0.10) | 23.57 (0.12) | 24.41 (0.40) |
| CIFAR-100 | DenseNet121 | 19.76 (0.28) | 20.68 (0.11) | 19.81 (0.33) | 25.11 (0.15) | 24.43 (0.09) | 21.55 (0.14) | 22.69 (0.15) | 21.10 (0.23) | 22.15 (0.36) | 22.27 (0.22) |
| CIFAR-100 | GoogLeNet   | 20.38 (0.31) | 20.26 (0.21) | 21.21 (0.29) | 26.12 (0.33) | 25.53 (0.17) | 21.29 (0.17) | 23.18 (0.31) | 21.82 (0.17) | 24.24 (0.16) | 22.23 (0.15) |

Citing

If you use Positive-Negative Momentum in your work, please cite

@InProceedings{xie2021positive,
  title     = {Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization},
  author    = {Xie, Zeke and Yuan, Li and Zhu, Zhanxing and Sugiyama, Masashi},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {11448--11458},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
}