- Joulin, Armand, et al. "Bag of Tricks for Efficient Text Classification." arXiv preprint arXiv:1607.01759 (2016).
- Hill, Felix, Kyunghyun Cho, and Anna Korhonen. "Learning distributed representations of sentences from unlabelled data." arXiv preprint arXiv:1602.03483 (2016).
- Ghosh, Shalini, et al. "Contextual LSTM (CLSTM) models for Large scale NLP tasks." arXiv preprint arXiv:1602.06291 (2016). [Contextual LSTM]
- Miyamoto, Yasumasa, and Kyunghyun Cho. "Gated Word-Character Recurrent Language Model." arXiv preprint arXiv:1606.01700 (2016). [Combine word and character embeddings]
- Jozefowicz, Rafal, et al. "Exploring the limits of language modeling." arXiv preprint arXiv:1602.02410 (2016). [Character, CNN, LSTM]
- Conneau, Alexis, et al. "Very Deep Convolutional Networks for Natural Language Processing." arXiv preprint arXiv:1606.01781 (2016). [Deep CNN]
- Cao, Kris, and Marek Rei. "A Joint Model for Word Embedding and Word Morphology." arXiv preprint arXiv:1606.02601 (2016). [Character, LSTM, Morphology]
- Lai, Siwei, et al. "Recurrent Convolutional Neural Networks for Text Classification." AAAI. 2015. [RNN, CNN]
- Tang, Duyu, Bing Qin, and Ting Liu. "Document modeling with gated recurrent neural network for sentiment classification." Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. 2015. [RNN]
- Li, Jiwei, Minh-Thang Luong, and Dan Jurafsky. "A hierarchical neural autoencoder for paragraphs and documents." arXiv preprint arXiv:1506.01057 (2015). [LSTM, Attention]
- Kiros, Ryan, et al. "Skip-thought vectors." Advances in Neural Information Processing Systems. 2015. [RNN, Predict adjacent sentences]
- Dai, Andrew M., and Quoc V. Le. "Semi-supervised sequence learning." Advances in Neural Information Processing Systems. 2015. [Initialized LSTM with a sequence autoencoder]
- Kim, Yoon, et al. "Character-aware neural language models." arXiv preprint arXiv:1508.06615 (2015). [Character, RNN, CNN, Highway Network]
- Bojanowski, Piotr, Armand Joulin, and Tomas Mikolov. "Alternative structures for character-level RNNs." arXiv preprint arXiv:1511.06303 (2015). [Character, RNN]
- Zhang, Xiang, Junbo Zhao, and Yann LeCun. "Character-level convolutional networks for text classification." Advances in Neural Information Processing Systems. 2015. [Character, CNN]
- Kalchbrenner, Nal, Edward Grefenstette, and Phil Blunsom. "A convolutional neural network for modelling sentences." arXiv preprint arXiv:1404.2188 (2014). [CNN]
- Denil, Misha, et al. "Modelling, visualising and summarising documents with a single convolutional neural network." arXiv preprint arXiv:1406.3830 (2014). [CNN]
- Kim, Yoon. "Convolutional neural networks for sentence classification." arXiv preprint arXiv:1408.5882 (2014). [CNN]
- Pennington, Jeffrey, Richard Socher, and Christopher D. Manning. "GloVe: Global Vectors for Word Representation." EMNLP. Vol. 14. 2014.
- Hu, Baotian, et al. "Convolutional neural network architectures for matching natural language sentences." Advances in Neural Information Processing Systems. 2014. [CNN]
- Santos, Cicero D., and Bianca Zadrozny. "Learning character-level representations for part-of-speech tagging." Proceedings of the 31st International Conference on Machine Learning (ICML-14). 2014. [Character, CNN]
- Le, Quoc V., and Tomas Mikolov. "Distributed representations of sentences and documents." arXiv preprint arXiv:1405.4053 (2014). [Paragraph Vector]
- Bengio, Yoshua, et al. "A neural probabilistic language model." Journal of Machine Learning Research 3 (2003): 1137–1155. [NNLM]
- Mikolov, Tomas, and Geoffrey Zweig. "Context dependent recurrent neural network language model." SLT. 2012. [Contextual RNNLM]
- Mikolov, Tomas. Statistical language models based on neural networks. PhD thesis, Brno University of Technology. 2012. [RNNLM]
- Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013).