The Prodigy optimizer and its variants for training neural networks.

Prodigy: An Expeditiously Adaptive Parameter-Free Learner

This is the official repository used to run the experiments in the paper that proposed the Prodigy optimizer. Currently, the code is only in PyTorch.

Prodigy: An Expeditiously Adaptive Parameter-Free Learner
K. Mishchenko, A. Defazio
Paper: https://arxiv.org/pdf/2306.06101.pdf

Installation and use

To install the package, run

pip install prodigyopt

Let net be the neural network you want to train. Then you can use the method as follows:

from prodigyopt import Prodigy
# you can choose weight decay value based on your problem, 0 by default
opt = Prodigy(net.parameters(), lr=1., weight_decay=weight_decay)

Note that by default, Prodigy uses weight decay as in AdamW. If you want it to use standard $\ell_2$ regularization instead, use option decouple=False.
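
To make the difference between the two weight decay modes concrete, here is a simplified sketch on a single scalar weight. It uses a plain Adam-style update with a fixed step size, not Prodigy's actual update rule, and the function name and default values are hypothetical choices for illustration.

```python
import math

def adam_style_step(w, grad, m, v, t, lr=0.1, wd=0.1, decouple=True,
                    beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style step on a scalar weight (illustration only, not Prodigy)."""
    if not decouple:
        # L2 regularization: the penalty gradient wd * w is added to the loss
        # gradient, so it is also rescaled by the adaptive denominator.
        grad = grad + wd * w
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
    if decouple:
        # AdamW-style decay: shrink the weight directly, outside the adaptive update.
        w = w * (1 - lr * wd)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# With a zero loss gradient, only the weight decay term moves the weight.
w_decoupled, _, _ = adam_style_step(1.0, 0.0, 0.0, 0.0, t=1, decouple=True)
w_l2, _, _ = adam_style_step(1.0, 0.0, 0.0, 0.0, t=1, decouple=False)
print(w_decoupled, w_l2)
```

In this toy setting the decoupled step shrinks the weight by the factor 1 - lr * wd, while the L2 step is normalized by the adaptive denominator and so moves the weight by roughly lr regardless of the value of wd.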

We also recommend using cosine annealing with the method:

import torch

# n_epoch is the total number of epochs to train the network
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=n_epoch)
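
For intuition about what the scheduler does, the sketch below evaluates the closed-form cosine annealing formula from the PyTorch documentation in plain Python. It assumes the scheduler is stepped once per epoch and that eta_min is left at 0; the helper name cosine_annealing_value is hypothetical. With lr=1., the scheduled value acts as a multiplier on the step size that Prodigy estimates.

```python
import math

def cosine_annealing_value(epoch, n_epoch, base_lr=1.0, eta_min=0.0):
    """Closed-form cosine annealing value after `epoch` scheduler steps."""
    cosine = math.cos(math.pi * epoch / n_epoch)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + cosine)

n_epoch = 20  # illustrative value
schedule = [cosine_annealing_value(e, n_epoch) for e in range(n_epoch + 1)]

# The multiplier starts at 1, passes through 0.5 halfway, and ends at 0.
print(schedule[0], schedule[n_epoch // 2], schedule[n_epoch])
```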

Take extra care if you use linear warm-up at the beginning of training. While the base learning rate is still small, the method makes slow progress, which can cause it to overestimate d. To avoid issues with warm-up, use the option safeguard_warmup=True.
Based on feedback from some users, we recommend setting safeguard_warmup=True, use_bias_correction=True, and weight_decay=0.01 when training diffusion models.

See this Google Colab for a toy example of using Prodigy to train ResNet-18 on CIFAR-10 (80% test accuracy after 20 epochs).

How to cite

If you find our work useful, please consider citing our paper.

@article{mishchenko2023prodigy,
    title={Prodigy: An Expeditiously Adaptive Parameter-Free Learner},
    author={Mishchenko, Konstantin and Defazio, Aaron},
    journal={arXiv preprint arXiv:2306.06101},
    year={2023},
    url={https://arxiv.org/pdf/2306.06101.pdf}
}