/T-Fixup

Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization"

Primary language: Python. License: MIT.