Deep-Transformer-PaperList

As the Transformer has become a dominant architecture in NLP, increasing its capacity while keeping it trainable has drawn growing research interest. This repo collects papers on training deep Transformers.

Paper List

  • Training Deeper Neural Machine Translation Models with Transparent Attention [EMNLP18]
  • Learning Deep Transformer Models for Machine Translation [ACL19]
  • Fixup Initialization: Residual Learning Without Normalization [ICLR19]
  • Depth Growing for Neural Machine Translation [ACL19]
  • Improving Transformer Optimization Through Better Initialization [ICML20]
  • Multiscale Collaborative Deep Models for Neural Machine Translation [ACL20]
  • Lipschitz Constrained Parameter Initialization for Deep Transformers [ACL20]
  • Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention [EMNLP19]
  • ReZero is All You Need: Fast Convergence at Large Depth [arXiv] (illustrated in the sketch below)
  • Understanding the Difficulty of Training Transformers [arXiv]
  • Very Deep Transformers for Neural Machine Translation [arXiv]
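Many of the listed papers tackle the same failure mode: residual branches whose output scale grows with depth, which makes deep Transformer stacks hard to optimize. As a rough illustration of one listed approach (ReZero), below is a minimal PyTorch sketch of a residual sub-layer gated by a learnable scalar initialized to zero; it is a simplified sketch of the core idea only, and the module name and hyperparameters (ReZeroBlock, d_model, d_ff) are illustrative assumptions, not taken from any paper's released code.

```python
import torch
import torch.nn as nn


class ReZeroBlock(nn.Module):
    """Residual feed-forward sub-layer gated by a learnable scalar.

    The ReZero idea: compute x + alpha * F(x) with alpha initialized to 0,
    so every block starts as the identity mapping and the residual branch
    is learned gradually, which helps very deep stacks converge.
    """

    def __init__(self, d_model: int, d_ff: int = 2048):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        # Learnable gate, initialized to zero.
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.alpha * self.ff(x)


# Example usage: shape (batch, seq_len, d_model) is preserved.
block = ReZeroBlock(d_model=512)
out = block(torch.randn(8, 10, 512))
```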