An implementation of the transformer architecture, loosely based on [1] and aptly called Megatron. Megatron uses self-attention, layer normalization, residual connections, and multi-layer perceptrons. Built by following this spectacular video and trained on [2].
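As a rough illustration of how those pieces fit together, here is a minimal sketch of one pre-norm transformer block in PyTorch. This is not the code in this repository; the class name and hyperparameters (TransformerBlock, d_model, n_heads, d_ff) are assumptions made for the example.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """One pre-norm transformer block: self-attention and an MLP,
    each wrapped in layer normalization and a residual connection."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int, dropout: float = 0.1):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.GELU(),
            nn.Linear(d_ff, d_model),
            nn.Dropout(dropout),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Self-attention sub-layer with residual connection
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        # MLP sub-layer with residual connection
        x = x + self.mlp(self.ln2(x))
        return x
```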
A demo is available here.
The best-performing model is included in this repository as Megatron-9M.pt.
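If you want to inspect the checkpoint yourself, the snippet below is a minimal sketch of loading it with torch.load; the actual model class and how to instantiate it are defined by this repository's code and are only assumed here.

```python
import torch

# Load the saved checkpoint on the CPU. Whether this yields a full nn.Module
# or just a state_dict depends on how the repository saved it.
checkpoint = torch.load("Megatron-9M.pt", map_location="cpu")

if isinstance(checkpoint, torch.nn.Module):
    # The whole model object was pickled; it can be used directly.
    model = checkpoint
    model.eval()
else:
    # Otherwise it is (most likely) a state_dict; the model class to
    # instantiate before load_state_dict comes from this repo's own code.
    print(f"Loaded a state_dict with {len(checkpoint)} tensors")
```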
[1] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin: Attention Is All You Need. In: Advances in Neural Information Processing Systems, vol. 30, pp. 6000-6010 (2017).
[2] Chia-Lun Yeh, Babak Loni, Mariëlle Hendriks, Henrike Reinhardt, and Anne Schuth: DpgMedia2019: A Dutch News Dataset for Partisanship Detection. arXiv:1908.02322 [cs] (2019).