Transformers without Tears: Improving the Normalization of Self-Attention
Primary language: Python. License: MIT.