ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

Introduction

AdaHessian is a second-order optimizer for neural network training, implemented in PyTorch. The library supports training convolutional neural networks (image_classification) and transformer-based models (transformer).

Please see the AdaHessian paper (arXiv:2006.00719, listed under Citation) for more details on the algorithm.
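
For orientation, an adaptive second-order method of this kind needs a cheap estimate of the Hessian diagonal. The sketch below illustrates one common way to get it in PyTorch, Hutchinson's estimator, which averages z * (Hz) over random sign vectors z. It is an illustration under stated assumptions, not the library's own code: the function name hutchinson_diag_hessian and its n_samples parameter are hypothetical, and the paper remains the reference for the actual algorithm.

```python
import torch


def hutchinson_diag_hessian(loss, params, n_samples=1):
    """Estimate diag(H) as the average of z * (H z) over Rademacher vectors z.

    Hypothetical helper for illustration; not part of the AdaHessian API.
    """
    params = list(params)
    # First-order gradients, kept in the graph so they can be differentiated again.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    diag = [torch.zeros_like(p) for p in params]
    for _ in range(n_samples):
        # Rademacher probe: each entry is +1 or -1 with equal probability.
        zs = [torch.randint_like(p, high=2) * 2 - 1 for p in params]
        # Hessian-vector product H z, obtained by differentiating g^T z.
        hzs = torch.autograd.grad(grads, params, grad_outputs=zs, retain_graph=True)
        for d, z, hz in zip(diag, zs, hzs):
            d += z * hz / n_samples
    return diag
```

For a loss whose Hessian is already diagonal, such as a weighted sum of squares, the estimate is exact for any probe because every entry of z squares to 1. For general losses it is a noisy estimate that improves as n_samples grows.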

Usage

First, clone the AdaHessian repository to your local system:

git clone https://github.com/amirgholami/adahessian.git

After cloning, enter either the image_classification or the transformer folder for further instructions.

Citation

AdaHessian was developed as part of the following paper. If you find the library useful for your work, please cite it:

@article{yao2020adahessian,
  title={ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning},
  author={Yao, Zhewei and Gholami, Amir and Shen, Sheng and Keutzer, Kurt and Mahoney, Michael W},
  journal={arXiv preprint arXiv:2006.00719},
  year={2020}
}