
AMSGrad-Tensorflow

Simple TensorFlow implementation of the AMSGrad optimizer

Hyperparameters

  • The default hyperparameters are set to the values that performed best in our experiments
  • learning_rate = 0.01
  • beta1 = 0.9
  • beta2 = 0.99
  • Depending on which network you are using, beta2 = 0.999 (the usual Adam default) may perform better; see the update-rule sketch after this list
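
For reference, here is a minimal NumPy sketch of the update rule that these settings control (bias correction omitted for brevity; the function and variable names are illustrative, not the repository's API):

  import numpy as np

  def amsgrad_step(param, grad, m, v, v_hat,
                   lr=0.01, beta1=0.9, beta2=0.99, eps=1e-8):
      # Exponential moving averages of the gradient (beta1)
      # and of the squared gradient (beta2)
      m = beta1 * m + (1 - beta1) * grad
      v = beta2 * v + (1 - beta2) * grad ** 2
      # AMSGrad's change vs. Adam: keep the running maximum of v,
      # so the effective step size never increases
      v_hat = np.maximum(v_hat, v)
      param = param - lr * m / (np.sqrt(v_hat) + eps)
      return param, m, v, v_hat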

Usage

  from AMSGrad import AMSGrad
  
  train_op = AMSGrad(learning_rate=0.01, beta1=0.9, beta2=0.99, epsilon=1e-8).minimize(loss)
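
For context, a self-contained training sketch in TensorFlow 1.x; the placeholder tensors, the dense layer, and the loss below are illustrative assumptions, not part of this repository:

  import tensorflow as tf
  from AMSGrad import AMSGrad

  # Illustrative MNIST-shaped inputs and labels (assumptions, not repo code)
  images = tf.placeholder(tf.float32, [None, 784])
  labels = tf.placeholder(tf.int64, [None])

  # Simple linear classifier, just to produce a loss to minimize
  logits = tf.layers.dense(inputs=images, units=10)
  loss = tf.reduce_mean(
      tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))

  train_op = AMSGrad(learning_rate=0.01, beta1=0.9, beta2=0.99, epsilon=1e-8).minimize(loss)

  with tf.Session() as sess:
      sess.run(tf.global_variables_initializer())
      # sess.run(train_op, feed_dict={images: batch_x, labels: batch_y})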

Network Architecture

  # Two-layer MLP used in the experiments; written here with the
  # TensorFlow 1.x layers API (tf.layers.dense / tf.nn.relu)
  x = tf.layers.dense(inputs=images, units=100)
  x = tf.nn.relu(x)
  logits = tf.layers.dense(inputs=x, units=10)

MNIST Results (3M iterations)

lr=0.1, beta1=0.9, beta2=various (result plot)

lr=0.01, beta1=0.9, beta2=various (result plot)

Reference

  • On the Convergence of Adam and Beyond (Reddi et al., ICLR 2018)

Author

Junho Kim