An implementation of the transformer architecture, loosely based on [1] and aptly called Megatron. Megatron uses self-attention, layer normalization, residual connections, and multi-layer perceptrons. Built by following this spectacular video and trained on [2].
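As a rough illustration of how those pieces fit together, here is a minimal sketch of one pre-norm transformer block in PyTorch. This is not the code in this repository; the class name and hyperparameters (TransformerBlock, d_model, n_heads, d_ff) are assumptions made for the example.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """One pre-norm transformer block: self-attention and an MLP,
    each wrapped in layer normalization and a residual connection."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int, dropout: float = 0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
            nn.Dropout(dropout),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Self-attention sub-layer with residual connection
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        # MLP sub-layer with residual connection
        x = x + self.mlp(self.ln2(x))
        return x
```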
A demo is available here.
The best-performing model is included in this repository as Megatron-9M.pt.
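If you want to inspect the checkpoint yourself, the snippet below is a minimal sketch of loading it with torch.load; the actual model class and how to instantiate it are defined by this repository's code and are only assumed here.

```python
import torch

# Load the saved checkpoint on the CPU. Whether this yields a full nn.Module
# or just a state_dict depends on how the repository saved it.
checkpoint = torch.load("Megatron-9M.pt", map_location="cpu")

if isinstance(checkpoint, torch.nn.Module):
    # The whole model object was pickled; it can be used directly.
    model = checkpoint
    model.eval()
else:
    # Otherwise it is (most likely) a state_dict; the model class to
    # instantiate before load_state_dict comes from this repo's own code.
    print(f"Loaded a state_dict with {len(checkpoint)} tensors")
```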
[1] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin: Attention Is All You Need. In: Advances in Neural Information Processing Systems, vol. 30, pp. 6000-6010 (2017).
[2] Chia-Lun Yeh, Babak Loni, Mariëlle Hendriks, Henrike Reinhardt, and Anne Schuth: DpgMedia2019: A Dutch News Dataset for Partisanship Detection. arXiv:1908.02322 [cs] (2019).