An investigation into Transformer self-attention building blocks and the effects of pretraining.
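
Below is a minimal sketch of the scaled dot-product self-attention block that such an investigation centers on, assuming PyTorch; the `SelfAttention` class, its parameter names, and the toy shapes are illustrative assumptions, not this repository's actual code.

```python
# Minimal single-head self-attention sketch (illustrative, not the repo's API).
import math

import torch
import torch.nn as nn


class SelfAttention(nn.Module):
    """Single-head self-attention: softmax(Q K^T / sqrt(d)) V."""

    def __init__(self, d_model: int):
        super().__init__()
        self.d_model = d_model
        # Learned projections for queries, keys, and values.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Scores scaled by sqrt(d_model) to keep softmax logits well-conditioned.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_model)
        weights = scores.softmax(dim=-1)
        # Output is the attention-weighted sum of the values.
        return weights @ v


if __name__ == "__main__":
    x = torch.randn(2, 8, 64)          # (batch, seq_len, d_model)
    attn = SelfAttention(d_model=64)
    print(attn(x).shape)               # torch.Size([2, 8, 64])
```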