/transformers_without_tears

Transformers without Tears: Improving the Normalization of Self-Attention

Primary Language: Python
License: MIT