
A deep learning optimizer with reliable convergence. Supports PyTorch and TensorFlow 1 & 2.

Primary language: Python. License: MIT.

Expectigrad: Fast Stochastic Optimization with Robust Convergence Properties

Expectigrad is a first-order stochastic optimization method that fixes the known divergence issue of Adam, RMSProp, and related adaptive methods while offering better performance on well-known deep learning benchmarks.

Expectigrad introduces two innovations to adaptive gradient methods:

  • Arithmetic RMS: Computes the true RMS instead of an exponential moving average (EMA). This makes Expectigrad more robust to divergence and, in theory, less susceptible to gradient noise.
  • Outer momentum: Applies momentum after adapting the step sizes, not before. This reduces bias in the updates by preserving the superposition property.
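To illustrate the first point, the sketch below compares an arithmetic mean of squared gradients with a bias-corrected EMA on a toy gradient sequence. It is an illustration only: the function names and the decay value of 0.9 are assumptions chosen for the example and are not part of the Expectigrad package.

```python
import math

def arithmetic_rms(grads):
    """Square root of the plain (arithmetic) mean of the squared gradients."""
    return math.sqrt(sum(g * g for g in grads) / len(grads))

def ema_rms(grads, decay=0.9):
    """Square root of a bias-corrected EMA of the squared gradients."""
    v = 0.0
    t = 0
    for g in grads:
        t += 1
        v = decay * v + (1.0 - decay) * g * g
    return math.sqrt(v / (1.0 - decay ** t))

# One large gradient followed by many small ones.
grads = [10.0] + [0.1] * 99

print(arithmetic_rms(grads))  # the early spike still carries weight 1/100
print(ema_rms(grads))         # the early spike has been almost entirely forgotten
```

Because the EMA forgets the spike, a step size derived from it grows again soon after a large gradient, while the arithmetic mean keeps the full history in the denominator.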

See the paper for more details.

PyTorch, TensorFlow 1.x, and TensorFlow 2.x are all supported. See the Installation and Usage sections below to get started.

Pseudocode

[Algorithm listing (equation images not preserved); see the paper for the full pseudocode.]
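The two ideas described in this README (arithmetic RMS and bias-corrected outer momentum) can be sketched as a single update step in plain Python. This is a reconstruction from the descriptions here, not the reference implementation: the names `expectigrad_step` and `init_state` are hypothetical, and the exact form of each line is an assumption that should be checked against the paper and the package source.

```python
import math

def init_state(num_params):
    """Per-parameter accumulators: sum of squares s, counter n, momentum m."""
    return {"t": 0, "s": [0.0] * num_params, "n": [0] * num_params,
            "m": [0.0] * num_params}

def expectigrad_step(x, g, state, lr=0.001, beta=0.9, eps=1e-8,
                     sparse_counter=True):
    """Update the parameter list x in place given the gradient list g."""
    state["t"] += 1
    t = state["t"]
    for i, gi in enumerate(g):
        # Running sum of squared gradients, with no exponential decay.
        state["s"][i] += gi * gi
        # With sparse_counter=True, only steps with a nonzero gradient count.
        if not sparse_counter or gi != 0.0:
            state["n"][i] += 1
        n = state["n"][i]
        # Arithmetic RMS; treated as 0 until a gradient has been counted.
        rms = math.sqrt(state["s"][i] / n) if n > 0 else 0.0
        # Outer momentum: average the step after it has been normalized.
        state["m"][i] = beta * state["m"][i] + (1.0 - beta) * gi / (eps + rms)
        # Bias-corrected momentum scaled by the learning rate.
        x[i] -= lr / (1.0 - beta ** t) * state["m"][i]
    return x

# Toy run: a few steps on f(x) = x^2, whose gradient is 2x.
x = [1.0]
state = init_state(len(x))
for _ in range(5):
    grad = [2.0 * xi for xi in x]
    expectigrad_step(x, grad, state, lr=0.01)
print(x)
```

Under these assumptions the first step has magnitude close to `lr`, because the normalized gradient is about 1 and the bias correction cancels the `(1 - beta)` factor.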

Citing

If you use this code for published work, please cite the original paper:

@article{daley2020expectigrad,
  title={Expectigrad: Fast Stochastic Optimization with Robust Convergence Properties},
  author={Daley, Brett and Amato, Christopher},
  journal={arXiv preprint arXiv:2010.01356},
  year={2020}
}

Installation

Use pip to quickly install Expectigrad:

pip install expectigrad

Or you can clone this repository and install manually:

git clone https://github.com/brett-daley/expectigrad.git
cd expectigrad
pip install -e .

Usage

PyTorch and both versions of TensorFlow are supported. Refer to the code snippets below to instantiate the optimizer for your deep learning framework.

PyTorch

import expectigrad

expectigrad.pytorch.Expectigrad(
    params, lr=0.001, beta=0.9, eps=1e-8, sparse_counter=True
)
Args:

  • params (iterable): Iterable of parameters to optimize or dicts defining parameter groups.
  • lr (float): The learning rate, a scale factor applied to each optimizer step. Default: 0.001
  • beta (float): The decay rate for Expectigrad's bias-corrected, "outer" momentum. Must be in the interval [0, 1). Default: 0.9
  • eps (float): A small constant added to the denominator for numerical stability. Must be greater than 0. Default: 1e-8
  • sparse_counter (bool): If True, Expectigrad's counter increments only where the gradient is nonzero. If False, the counter increments unconditionally. Default: True
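The effect of sparse_counter is easiest to see on a parameter that rarely receives a gradient, such as an embedding row. The sketch below is a standalone illustration in plain Python; `rms_with_counter` is a hypothetical helper, not a function of the package, and it assumes the counter serves as the denominator of the mean of squared gradients.

```python
import math

def rms_with_counter(grads, sparse_counter=True):
    """RMS of a gradient history, dividing by the chosen counter."""
    s = 0.0
    n = 0
    for g in grads:
        s += g * g
        # The sparse counter skips steps where the gradient is exactly zero.
        if not sparse_counter or g != 0.0:
            n += 1
    return math.sqrt(s / n) if n > 0 else 0.0

# A parameter that receives a gradient on only 2 of 8 steps.
grads = [0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 2.0]

print(rms_with_counter(grads, sparse_counter=True))   # 2.0
print(rms_with_counter(grads, sparse_counter=False))  # 1.0
```

With the unconditional counter, the zero-gradient steps dilute the RMS, so the normalized step for this parameter would be twice as large in this example.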

TensorFlow 1.x

import expectigrad

expectigrad.tensorflow1.ExpectigradOptimizer(
    learning_rate=0.001, beta=0.9, epsilon=1e-8, sparse_counter=True,
    use_locking=False, name='Expectigrad'
)
Args:

  • learning_rate: The learning rate, a scale factor applied to each optimizer step. Can be a float, tf.keras.optimizers.schedules.LearningRateSchedule, Tensor, or callable that takes no arguments and returns the value to use. Default: 0.001
  • beta (float): The decay rate for Expectigrad's bias-corrected, "outer" momentum. Must be in the interval [0, 1). Default: 0.9
  • epsilon (float): A small constant added to the denominator for numerical stability. Must be greater than 0. Default: 1e-8
  • sparse_counter (bool): If True, Expectigrad's counter increments only where the gradient is nonzero. If False, the counter increments unconditionally. Default: True
  • use_locking (bool): If True, use locks to prevent concurrent updates to variables. Default: False
  • name (str): Optional name for the operations created when applying gradients. Default: 'Expectigrad'

TensorFlow 2.x

import expectigrad

expectigrad.tensorflow2.Expectigrad(
    learning_rate=0.001, beta=0.9, epsilon=1e-8, sparse_counter=True,
    name='Expectigrad', **kwargs
)
Args:

  • learning_rate: The learning rate, a scale factor applied to each optimizer step. Can be a float, tf.keras.optimizers.schedules.LearningRateSchedule, Tensor, or callable that takes no arguments and returns the value to use. Default: 0.001
  • beta (float): The decay rate for Expectigrad's bias-corrected, "outer" momentum. Must be in the interval [0, 1). Default: 0.9
  • epsilon (float): A small constant added to the denominator for numerical stability. Must be greater than 0. Default: 1e-8
  • sparse_counter (bool): If True, Expectigrad's counter increments only where the gradient is nonzero. If False, the counter increments unconditionally. Default: True
  • name (str): Optional name for the operations created when applying gradients. Default: 'Expectigrad'
  • **kwargs: Keyword arguments. Allowed keys are clipnorm, clipvalue, lr, and decay. clipnorm clips gradients by norm; clipvalue clips gradients by value; decay is included for backward compatibility and enables inverse time decay of the learning rate; lr is included for backward compatibility, and learning_rate is recommended instead.
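For readers unfamiliar with the optional keyword arguments, the sketch below shows in plain Python what clipping by value, clipping by norm, and inverse time decay conventionally mean. The helper names are hypothetical, and the formulas follow the usual Keras conventions as an assumption (clipping applied per gradient tensor, decay applied as lr / (1 + decay * iterations)); the authoritative behavior is that of the installed TensorFlow version.

```python
import math

def clip_by_value(grads, clipvalue):
    """Clamp every gradient element to the range [-clipvalue, clipvalue]."""
    return [max(-clipvalue, min(clipvalue, g)) for g in grads]

def clip_by_norm(grads, clipnorm):
    """Rescale the gradient so its L2 norm does not exceed clipnorm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= clipnorm:
        return list(grads)
    return [g * clipnorm / norm for g in grads]

def inverse_time_decay(lr, decay, iterations):
    """Learning rate after the given number of iterations."""
    return lr / (1.0 + decay * iterations)

grads = [3.0, -4.0]  # L2 norm is 5
print(clip_by_value(grads, 1.0))            # each element clamped to [-1, 1]
print(clip_by_norm(grads, 1.0))             # direction kept, norm scaled to 1
print(inverse_time_decay(0.001, 0.01, 100)) # rate halves after 100 iterations
```

Clipping by norm preserves the direction of the gradient, whereas clipping by value can change it.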