
      ___          ___          ___          ___                     ___          ___     
     /__/\        /  /\        /  /\        /  /\       ___         /  /\        /__/\    
     \  \:\      /  /:/_      /  /:/_      /  /:/_     /  /\       /  /::\       \  \:\   
      \__\:\    /  /:/ /\    /  /:/ /\    /  /:/ /\   /  /:/      /  /:/\:\       \  \:\  
  ___ /  /::\  /  /:/ /:/_  /  /:/ /::\  /  /:/ /::\ /__/::\     /  /:/~/::\  _____\__\:\ 
 /__/\  /:/\:\/__/:/ /:/ /\/__/:/ /:/\:\/__/:/ /:/\:\\__\/\:\__ /__/:/ /:/\:\/__/::::::::\
 \  \:\/:/__\/\  \:\/:/ /:/\  \:\/:/~/:/\  \:\/:/~/:/   \  \:\/\\  \:\/:/__\/\  \:\~~\~~\/
  \  \::/      \  \::/ /:/  \  \::/ /:/  \  \::/ /:/     \__\::/ \  \::/      \  \:\  ~~~ 
   \  \:\       \  \:\/:/    \__\/ /:/    \__\/ /:/      /__/:/   \  \:\       \  \:\     
    \  \:\       \  \::/       /__/:/       /__/:/       \__\/     \  \:\       \  \:\    
     \__\/        \__\/        \__\/        \__\/                   \__\/        \__\/    


		                   ___          ___          ___          ___     
		                  /  /\        /  /\        /  /\        /__/\    
		                 /  /:/_      /  /::\      /  /::\       \  \:\   
		  ___     ___   /  /:/ /\    /  /:/\:\    /  /:/\:\       \  \:\  
		 /__/\   /  /\ /  /:/ /:/_  /  /:/~/::\  /  /:/~/:/   _____\__\:\ 
		 \  \:\ /  /://__/:/ /:/ /\/__/:/ /:/\:\/__/:/ /:/___/__/::::::::\
		  \  \:\  /:/ \  \:\/:/ /:/\  \:\/:/__\/\  \:\/:::::/\  \:\~~\~~\/
		   \  \:\/:/   \  \::/ /:/  \  \::/      \  \::/~~~~  \  \:\  ~~~ 
		    \  \::/     \  \:\/:/    \  \:\       \  \:\       \  \:\     
		     \__\/       \  \::/      \  \:\       \  \:\       \  \:\    
		                  \__\/        \__\/        \__\/        \__\/    


Hessian-based stochastic optimization in TensorFlow and keras

This code implements Hessian-based stochastic optimization in TensorFlow and keras by exposing the matrix-free action of the Hessian to users. It is meant to enable rapid prototyping of Hessian-based algorithms: through matrix-free Hessian products, users can inspect second-order information for stochastic nonconvex optimization problems such as neural network training.

The Hessian action is exposed via matrix-vector products

	H(w) v = ∇²L(w) v

and matrix-matrix products

	H(w) V = ∇²L(w) V,

where L is the training loss and w are the network weights.
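Such matrix-free products are typically computed with double backpropagation: differentiating the scalar gᵀv a second time yields Hv without ever forming H. Below is a minimal self-contained sketch of the technique on a toy quadratic loss, using standard TensorFlow calls (illustrative only, not hessianlearn's internal API):

import numpy as np
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

# Toy quadratic loss L(w) = 0.5 w^T A w, whose Hessian is exactly A
A = tf.constant([[2.0, 0.5], [0.5, 1.0]])
w = tf.Variable([1.0, -1.0])
loss = 0.5 * tf.reduce_sum(w * tf.linalg.matvec(A, w))

# Double backpropagation: differentiate g^T v once more to obtain H v
v = tf.compat.v1.placeholder(tf.float32, shape=(2,))
g = tf.gradients(loss, w)[0]
Hv = tf.gradients(tf.reduce_sum(g * v), w)[0]

with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    print(sess.run(Hv, feed_dict={v: np.array([1.0, 0.0], np.float32)}))
    # -> [2.0, 0.5], the first column of A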

Compatibility

The code is compatible with TensorFlow v1 and v2, but certain v2 features (such as eager execution) are disabled. This is because the Hessian matrix products in hessianlearn are implemented using placeholders, which have been deprecated in v2. For this reason hessianlearn cannot work with data generators or other constructs that require eager execution. If any compatibility issues are found, please open an issue.
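Concretely, placeholder-based graph code can still run under v2 through the compatibility layer; a minimal sketch of that standard pattern (not hessianlearn-specific code):

import tensorflow as tf

# Placeholders only exist in graph mode, so eager execution must be
# disabled before any graph construction when running under TensorFlow v2
tf.compat.v1.disable_eager_execution()

x = tf.compat.v1.placeholder(tf.float32, shape=(None, 2))
y = tf.reduce_sum(x, axis=1)

with tf.compat.v1.Session() as sess:
    print(sess.run(y, feed_dict={x: [[1.0, 2.0], [3.0, 4.0]]}))  # [3. 7.]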

Usage

Set the HESSIANLEARN_PATH environment variable
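This can be done in the shell (export HESSIANLEARN_PATH=...) or from Python before the imports below; the path here is a hypothetical clone location:

import os
# Hypothetical location; point this at your hessianlearn checkout
os.environ['HESSIANLEARN_PATH'] = '/path/to/hessianlearn'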

Train a keras model

import os, sys
import tensorflow as tf
sys.path.append(os.environ.get('HESSIANLEARN_PATH'))
from hessianlearn import *

# Define keras neural network model
neural_network = tf.keras.models.Model(...)
# Define loss function and compile model
neural_network.compile(loss = ...)
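For concreteness, a hypothetical stand-in for the two elided lines above (the architecture and loss are illustrative choices, not requirements of hessianlearn):

# Illustrative example: a small dense classifier
neural_network = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28 * 28,)),
    tf.keras.layers.Dense(64, activation='softplus'),
    tf.keras.layers.Dense(10)
])
neural_network.compile(
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True))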

hessianlearn implements various training problem constructs (regression, classification, autoencoders, variational autoencoders, generative adversarial networks). Instantiate a problem, a data object (which takes a dictionary whose keys are the corresponding placeholders in problem), and a regularization:

# Instantiate the problem (this handles the loss function,
# construction of hessian and gradient etc.)
# KerasModelProblem extracts loss function and metrics from
# a compiled keras model
problem = KerasModelProblem(neural_network)
# Instantiate the data object, this handles the train / validation split
# as well as iterating during training
data = Data({problem.x: x_data, problem.y_true: y_data}, train_batch_size,
            validation_data_size=validation_data_size)
# Instantiate the regularization: L2Regularization is Tikhonov
# regularization; gamma = 0 means no regularization
regularization = L2Regularization(problem, gamma=0)

Pass these objects to the HessianlearnModel, which handles the training:

HLModel = HessianlearnModel(problem, regularization, data)
HLModel.fit()

Alternative Usage (More like Keras Interface)

The example above shows the original optimizer interface in hessianlearn. To better mimic the keras interface, and to allow more end-user rapid prototyping of the optimizer used to fit data, the following interface was added in December 2021:

import os, sys
import tensorflow as tf
sys.path.append(os.environ.get('HESSIANLEARN_PATH'))
from hessianlearn import *

# Define keras neural network model
neural_network = tf.keras.models.Model(...)
# Define loss function and compile model
neural_network.compile(loss = ...)
# Instantiate the keras model wrapper, which handles the
# construction of the `problem` (the Hessian computational
# graph and variables)
HLModel = KerasModelWrapper(neural_network)
# Then the end user can pass in an optimizer 
# (e.g. custom end-user optimizer)
optimizer = LowRankSaddleFreeNewton # The class constructor, not an instance
opt_parameters = LowRankSaddleFreeNewtonParameters()
opt_parameters['hessian_low_rank'] = 40
HLModel.set_optimizer(optimizer, optimizer_parameters=opt_parameters)
# The data object still needs to key on to the specific computational
# graph variables that data will be passed in for.
# Note that data can naturally handle multiple input and output data,
# in which case problem.x, problem.y_true are lists corresponding to
# neural_network.inputs, neural_network.outputs
problem = HLModel.problem
data = Data({problem.x: x_data, problem.y_true: y_data}, train_batch_size,
            validation_data_size=validation_data_size)
# And finally one can call fit!
HLModel.fit(data)

Examples

Tutorial 0: MNIST Autoencoder

Applications

Transfer Learning

  • Examples of CIFAR10 and CIFAR100 classification from a pre-trained ImageNet ResNet50 model are in applications/transfer_learning/

  • The pre-trained model serves as a well-conditioned initial guess for transfer learning. In this setting Newton methods perform well due to their excellent local convergence properties. Low Rank Saddle Free Newton (LRSFN) is able to zero in on highly generalizable local minimizers while bypassing indefinite regions. Below are validation accuracies for the best choice of fixed step length for Adam, SGD, and LRSFN with fixed rank 40.

References

These manuscripts motivate and use the hessianlearn library for stochastic nonconvex optimization:

  • [1] O'Leary-Roseberry, T., Alger, N., Ghattas, O., Inexact Newton Methods for Stochastic Nonconvex Optimization with Applications to Neural Network Training. arXiv:1905.06738. (Download)

    BibTeX
    @article{OLearyRoseberryAlgerGhattas2019,
    title={Inexact Newton methods for stochastic nonconvex optimization with applications to neural network training},
    author={O'Leary-Roseberry, Thomas and Alger, Nick and Ghattas, Omar},
    journal={arXiv preprint arXiv:1905.06738},
    year={2019}
    }

  • [2] O'Leary-Roseberry, T., Alger, N., Ghattas, O., Low Rank Saddle Free Newton: A Scalable Method for Stochastic Nonconvex Optimization. arXiv:2002.02881. (Download)

    BibTeX
    @article{OLearyRoseberryAlgerGhattas2020,
    title={Low Rank Saddle Free Newton: Algorithm and Analysis},
    author={O'Leary-Roseberry, Thomas and Alger, Nick and Ghattas, Omar},
    journal={arXiv preprint arXiv:2002.02881},
    year={2020}
    }

  • [3] O'Leary-Roseberry, T., Villa, U., Chen, P., Ghattas, O., Derivative-Informed Projected Neural Networks for High-Dimensional Parametric Maps Governed by PDEs. Computer Methods in Applied Mechanics and Engineering. Volume 388, 1 January 2022, 114199. (Download)

    BibTeX
    @article{OLearyRoseberryVillaChenEtAl2022,
    title={Derivative-informed projected neural networks for high-dimensional parametric maps governed by {PDE}s},
    author={O’Leary-Roseberry, Thomas and Villa, Umberto and Chen, Peng and Ghattas, Omar},
    journal={Computer Methods in Applied Mechanics and Engineering},
    volume={388},
    pages={114199},
    year={2022},
    publisher={Elsevier}
    }

  • [4] O'Leary-Roseberry, T., Du, X., Chaudhuri, A., Martins, J., Willcox, K., Ghattas, O., Adaptive Projected Residual Networks for Learning Parametric Maps from Sparse Data. arXiv:2112.07096. (Download)

    BibTeX
    @article{OLearyRoseberryDuChaudhuriEtAl2021,
    title={Adaptive Projected Residual Networks for Learning Parametric Maps from Sparse Data},
    author={O'Leary-Roseberry, Thomas and Du, Xiaosong and Chaudhuri, Anirban and Martins, Joaquim R. R. A. and Willcox, Karen and Ghattas, Omar},
    journal={arXiv preprint arXiv:2112.07096},
    year={2021}
    }