Universal Transformer
A from-scratch PyTorch implementation of the vanilla Transformer and the Universal Transformer, as proposed by Vaswani et al. (2017) in Attention is All You Need and Dehghani et al. (2018) in Universal Transformers, respectively.
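The defining idea of the Universal Transformer is that a single block of weights is applied recurrently over the sequence, rather than stacking distinct layers. The snippet below is a minimal sketch of that recurrence (fixed number of steps, no ACT halting, with a learned per-step embedding); the class and argument names are illustrative and not this repository's actual API.

```python
import torch
import torch.nn as nn


class UniversalEncoderSketch(nn.Module):
    """Illustrative only: one shared transformer block applied `steps` times."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, steps: int = 6) -> None:
        super().__init__()
        self.steps = steps
        # A single block whose weights are reused at every recurrent step.
        self.shared_block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        # Learned timestep embeddings, one per recurrent step.
        self.step_embedding = nn.Embedding(steps, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model), already position-encoded.
        for t in range(self.steps):
            x = x + self.step_embedding.weight[t]  # inject the timestep signal
            x = self.shared_block(x)               # same weights at every step
        return x
```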
A few utility functions are taken from The Annotated Transformer tutorial. Important additions, aside from the implementation of the Universal Transformer, include PEP 484-style type annotations, more efficient vectorization, and a batch-vectorized implementation of beam search decoding.
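Batch-vectorized beam search here means folding the beams into the batch dimension so that all hypotheses for all sentences are scored in a single forward pass per decoding step. The following is a hedged sketch of that pattern, not the repository's code; `decode_step` is a hypothetical callable standing in for the wrapped transformer decoder.

```python
import torch


def beam_search(decode_step, start_ids: torch.Tensor, num_steps: int, beam_width: int):
    """Sketch of beam search with beams folded into the batch dimension.

    `decode_step(prefix)` (hypothetical) maps a (batch, t) tensor of token
    ids to (batch, vocab) log-probabilities for the next position.
    """
    batch = start_ids.size(0)
    sequences = start_ids.unsqueeze(1)                    # (batch, 1, 1): one beam each
    scores = torch.zeros(batch, 1)                        # cumulative log-probabilities

    for _ in range(num_steps):
        k = sequences.size(1)
        flat = sequences.view(batch * k, -1)              # beams folded into the batch
        log_probs = decode_step(flat).view(batch, k, -1)  # (batch, k, vocab)
        vocab = log_probs.size(-1)

        # Add running scores, then pick the best `beam_width` continuations
        # jointly over (beam, token) pairs for each sentence.
        total = scores.unsqueeze(-1) + log_probs          # (batch, k, vocab)
        scores, idx = total.view(batch, -1).topk(beam_width, dim=-1)
        beam_idx, token_idx = idx // vocab, idx % vocab

        picked = sequences.gather(
            1, beam_idx.unsqueeze(-1).expand(-1, -1, sequences.size(-1))
        )
        sequences = torch.cat([picked, token_idx.unsqueeze(-1)], dim=-1)

    return sequences, scores                              # best hypothesis first
```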
The code has been tested on various sequence tagging tasks, where it performs consistently and yields high scores with minimal training times. An example use case on type-logical sentence supertagging, including training and evaluation scripts, can be found here. A key limitation of the current implementation is the assumption that input and output sequences have equal length.
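Under the equal-length assumption, sequence tagging amounts to predicting exactly one tag per input position. A minimal, purely illustrative setup (reusing the encoder sketch above; vocabulary size, tag count, and dimensions are arbitrary) looks like this:

```python
import torch
import torch.nn as nn

# Hypothetical sequence-tagging setup; names and sizes are illustrative,
# not the repository's actual classes or defaults.
vocab_size, num_tags, d_model = 1000, 30, 512
embed = nn.Embedding(vocab_size, d_model)
encoder = UniversalEncoderSketch(d_model=d_model, steps=6)
classifier = nn.Linear(d_model, num_tags)

token_ids = torch.randint(0, vocab_size, (2, 15))          # (batch, seq_len)
tag_logits = classifier(encoder(embed(token_ids)))          # (batch, seq_len, num_tags)
# Equal-length assumption: exactly one tag is predicted per input token.
assert tag_logits.shape == (2, 15, num_tags)
```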