No Train No Gain

Code for the NeurIPS 2023 paper "No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models" by Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, and Matt J. Kusner.

Running the code

See the separate READMEs for the BERT experiments and the T5 experiments.

Citation and license

We use two excellent open-source codebases to implement our experiments:

  • The BERT experiments are forked from Cramming
  • The T5 experiments are forked from NanoT5

If you find this repository useful, please consider citing both our work and these original codebases.

To cite our work, we suggest the following BibTeX:

@misc{kaddourNoTrainNo2023,
	title = {No {Train} {No} {Gain}: {Revisiting} {Efficient} {Training} {Algorithms} {For} {Transformer}-based {Language} {Models}},
	url = {http://arxiv.org/abs/2307.06440},
	doi = {10.48550/arXiv.2307.06440},
	urldate = {2023-07-17},
	publisher = {arXiv},
	author = {Kaddour, Jean and Key, Oscar and Nawrot, Piotr and Minervini, Pasquale and Kusner, Matt J.},
	month = jul,
	year = {2023},
	note = {arXiv:2307.06440 [cs]},
}

We provide separate licenses for the BERT experiments and the T5 experiments.

Contact

Feel free to open an issue or email us with any questions.