Code for the paper "No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models" by Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, and Matt J. Kusner.
See the READMEs for the BERT experiments and the T5 experiments for instructions on setting up and running each set of experiments.
We use two excellent open-source codebases to implement our experiments:

- [Cramming](https://github.com/JonasGeiping/cramming) for the BERT experiments
- [nanoT5](https://github.com/PiotrNawrot/nanoT5) for the T5 experiments
If you find this repository useful, please consider citing both our work and these original codebases.
To cite our work, we suggest the following BibTeX:
```bibtex
@misc{kaddourNoTrainNo2023,
  title = {No {Train} {No} {Gain}: {Revisiting} {Efficient} {Training} {Algorithms} {For} {Transformer}-based {Language} {Models}},
  author = {Kaddour, Jean and Key, Oscar and Nawrot, Piotr and Minervini, Pasquale and Kusner, Matt J.},
  year = {2023},
  month = jul,
  publisher = {arXiv},
  doi = {10.48550/arXiv.2307.06440},
  url = {http://arxiv.org/abs/2307.06440},
  urldate = {2023-07-17},
  note = {arXiv:2307.06440 [cs]},
}
```
We provide separate licenses for the BERT experiments and the T5 experiments.
If you have any questions, feel free to open an issue or email us.