Crowded-Valley---Results

This repository contains the results for the paper: "Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers"

Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers

Robin M. Schmidt, Frank Schneider, and Philipp Hennig

Paper: ICML 2021

Abstract: Choosing the optimizer is considered to be among the most crucial design decisions in deep learning, and it is not an easy one. The growing literature now lists hundreds of optimization methods. In the absence of clear theoretical guidance and conclusive empirical evidence, the decision is often made based on anecdotes. In this work, we aim to replace these anecdotes, if not with a conclusive ranking, then at least with evidence-backed heuristics. To do so, we perform an extensive, standardized benchmark of fifteen particularly popular deep learning optimizers while giving a concise overview of the wide range of possible choices. Analyzing more than 50,000 individual runs, we contribute the following three points: (i) Optimizer performance varies greatly across tasks. (ii) We observe that evaluating multiple optimizers with default parameters works approximately as well as tuning the hyperparameters of a single, fixed optimizer. (iii) While we cannot discern an optimization method clearly dominating across all tested tasks, we identify a significantly reduced subset of specific optimizers and parameter choices that generally lead to competitive results in our experiments: Adam remains a strong contender, with newer methods failing to significantly and consistently outperform it. Our open-sourced results are available as challenging and well-tuned baselines for more meaningful evaluations of novel optimization methods without requiring any further computational efforts.

Results

This repository provides the full log files of all our benchmarking results. They are organized into:

  • Main results: Comparison of fifteen popular deep learning optimizers on eight problems, using four different tuning budgets and four different learning rate schedules. The main results amount to more than 50,000 individual runs.
  • Tuning validation: To test the stability of our benchmark, we re-tuned two optimizers (RMSProp and AdaDelta) a second time on all problems. The results of this evaluation are shown in Appendix D of our paper.
  • Seed robustness: An optimizer's performance can be sensitive to the random seed. In this analysis, we performed an extensive grid search over the learning rate for SGD, using ten different seeds throughout. This identifies a “danger zone” of learning rates in which performance can be sensitive to the random seed. The full analysis can be found in Appendix C of our paper.
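As a rough illustration of the seed robustness idea, the sketch below flags learning rates whose final metric varies strongly across seeds. It is not the analysis code from the paper: the function name `find_danger_zone`, the `max_spread` threshold, and all numbers are assumptions made up for this example, and it does not read the log files in this repository.

```python
def find_danger_zone(results, max_spread):
    """Return the learning rates whose across-seed spread exceeds max_spread.

    results: dict mapping a learning rate to a list of final metric values,
             one value per random seed (hypothetical input format).
    """
    flagged = []
    for lr in sorted(results):
        values = results[lr]
        # Spread = gap between the best and the worst seed at this learning rate.
        spread = max(values) - min(values)
        if spread > max_spread:
            flagged.append(lr)
    return flagged


# Made-up accuracies for three seeds per learning rate (illustration only).
toy_results = {
    0.001: [0.80, 0.81, 0.80],
    0.01: [0.88, 0.87, 0.89],
    0.1: [0.90, 0.35, 0.91],  # one seed diverges
    1.0: [0.10, 0.10, 0.10],  # consistently bad, but not seed-sensitive
}

danger_zone = find_danger_zone(toy_results, max_spread=0.2)
print(danger_zone)  # [0.1]
```

Note that a learning rate can perform poorly for every seed without being seed-sensitive. The sketch only flags learning rates where the outcome depends on the seed.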

Everyone is invited to use these results, for example, to evaluate the performance of newly developed optimizers or as training data for meta-learned optimizers.
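One simple way to use such baselines is to rank a new optimizer's score against the baseline scores on the same problem. The following is a minimal sketch under stated assumptions: the helper `rank_against_baselines` and the scores are hypothetical, and extracting real scores from the log files is left out.

```python
def rank_against_baselines(baselines, candidate_score, higher_is_better=True):
    """Return the 1-based rank of candidate_score among the baseline scores.

    baselines: dict mapping an optimizer name to its score on one problem
               (hypothetical input format).
    """
    if higher_is_better:
        # Count baselines that strictly beat the candidate (e.g. accuracy).
        better = sum(1 for s in baselines.values() if s > candidate_score)
    else:
        # For metrics such as loss, a lower value is better.
        better = sum(1 for s in baselines.values() if s < candidate_score)
    return better + 1


# Made-up test accuracies on a single problem (illustration only).
toy_baselines = {"Adam": 0.92, "SGD": 0.90, "RMSProp": 0.91}

rank = rank_against_baselines(toy_baselines, candidate_score=0.915)
print(rank)  # 2
```

For a fair comparison, the candidate's score should come from the same problem, tuning budget, and learning rate schedule as the baselines it is ranked against.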

We are happy to extend our results with additional optimizers, provided they are generated using the same process described in the paper, to ensure a fair comparison.

Below, we highlight two figures summarizing our evaluation of these results. For the full evaluation with all our plots and explanations, please check out our paper.

Parallel Coordinates Plot

Heatmap

List of Optimizers

Our paper lists over a hundred optimizers that have been proposed for deep learning applications. Here, we keep a digital version that gets updated regularly. If you want to add a method or find an out-of-date reference, please feel free to open a pull request.

Citation

If you use our results, please consider citing:

Robin M. Schmidt, Frank Schneider, Philipp Hennig
Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers
ICML 2021

@InProceedings{pmlr-v139-schmidt21a,
  title = 	 {Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers},
  author =       {Schmidt, Robin M and Schneider, Frank and Hennig, Philipp},
  booktitle = 	 {Proceedings of the 38th International Conference on Machine Learning},
  pages = 	 {9367--9376},
  year = 	 {2021},
  editor = 	 {Meila, Marina and Zhang, Tong},
  volume = 	 {139},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {18--24 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v139/schmidt21a/schmidt21a.pdf},
  url = 	 {http://proceedings.mlr.press/v139/schmidt21a.html}
}

Contact

Should you have any questions, please feel free to contact the authors.