/NoTrainNoGain

Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)

Primary Language: Python
License: MIT
