FairSeq repo with Apollo optimizer

Primary language: Python. License: MIT.

Fairseq-Apollo

This repository contains the code we used in the following papers. It is based on the fairseq package, v0.9.0.

Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization

Xuezhe Ma

Luna: Linear Unified Nested Attention

Xuezhe Ma, Xiang Kong, Sinong Wang, Chunting Zhou, Jonathan May, Hao Ma, Luke Zettlemoyer

NeurIPS 2021

Examples and pre-trained models

Citation

@article{ma2020apollo,
  title={Apollo: An adaptive parameter-wise diagonal quasi-newton method for nonconvex stochastic optimization},
  author={Ma, Xuezhe},
  journal={arXiv preprint arXiv:2009.13586},
  year={2020}
}

@inproceedings{ma2021luna,
  title={Luna: Linear Unified Nested Attention},
  author={Ma, Xuezhe and Kong, Xiang and Wang, Sinong and Zhou, Chunting and May, Jonathan and Ma, Hao and Zettlemoyer, Luke},
  booktitle={Advances in Neural Information Processing Systems},
  publisher={Curran Associates, Inc.},
  year={2021}
}