ELMO + QRNN; Deep contextualized word representations based on QRNN layers. compatible with Tensorflow 2 and google colab.
for a simple classification task which is comment rating a simple global_max_pool layer can achive 78% accuracy with a small size of training data (800 comments) and a one layer of LSTM layer can achive 86% accuracy for a slightly bigger dataset (42000 comments).