/DT-Fixup

Optimizing Deeper Transformers on Small Datasets https://arxiv.org/abs/2012.15355

Primary LanguagePython

Watchers