
A curated collection and reading list of awesome Natural Language Processing papers, sorted by date (newest first).

  • Unsupervised Machine Translation Using Monolingual Corpora Only, Lample et al. Paper

  • On the Dimensionality of Word Embeddings, Yin et al. Paper

  • An efficient framework for learning sentence representations, Logeswaran et al. Paper

  • Refining Pretrained Word Embeddings Using Layer-wise Relevance Propagation, Akira Utsumi Paper

  • Domain Adapted Word Embeddings for Improved Sentiment Classification, Sarma et al. Paper

  • In-domain Context-aware Token Embeddings Improve Biomedical Named Entity Recognition, Sheikhshab et al. Paper

  • Generalizing Word Embeddings using Bag of Subwords, Zhao et al. Paper

  • What's in Your Embedding, And How It Predicts Task Performance, Rogers et al. Paper

  • On Learning Better Word Embeddings from Chinese Clinical Records: Study on Combining In-Domain and Out-Domain Data, Wang et al. Paper

  • Predicting and interpreting embeddings for out of vocabulary words in downstream tasks, Garneau et al. Paper

  • Addressing Low-Resource Scenarios with Character-aware Embeddings, Papay et al. Paper

  • Domain Adaptation for Disease Phrase Matching with Adversarial Networks, Liu et al. Paper

  • Investigating Effective Parameters for Fine-tuning of Word Embeddings Using Only a Small Corpus, Komiya et al. Paper

  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al. Paper

  • Adapting Word Embeddings from Multiple Domains to Symptom Recognition from Psychiatric Notes, Zhang et al. Paper

  • Evaluation of sentence embeddings in downstream and linguistic probing tasks, Perone et al. Paper

  • Universal Sentence Encoder, Cer et al. Paper

  • Deep Contextualized Word Representations, Peters et al. Paper

  • Learned in Translation: Contextualized Word Vectors, McCann et al. Paper

  • Concatenated p-mean Word Embeddings as Universal Cross-Lingual Sentence Representations, Rücklé et al. Paper

  • A Compressed Sensing View of Unsupervised Text Embeddings, Bag-Of-n-Grams, and LSTMs, Arora et al. Paper


  • Attention Is All You Need, Vaswani et al. Paper

  • Skip-Gram – Zipf + Uniform = Vector Additivity, Gittens et al. Paper

  • A Simple but Tough-to-beat Baseline for Sentence Embeddings, Arora et al. Paper

  • Fast and Accurate Entity Recognition with Iterated Dilated Convolutions, Strubell et al. Paper

  • Advances in Pre-Training Distributed Word Representations, Mikolov et al. Paper

  • Replicability Analysis for Natural Language Processing: Testing Significance with Multiple Datasets, Dror et al. Paper


  • Towards Universal Paraphrastic Sentence Embeddings, Wieting et al. Paper

  • Bag of Tricks for Efficient Text Classification, Joulin et al. Paper

  • Enriching Word Vectors with Subword Information, Bojanowski et al. Paper

  • Assessing the Corpus Size vs. Similarity Trade-off for Word Embeddings in Clinical NLP, Kirk Roberts Paper

  • How to Train Good Word Embeddings for Biomedical NLP, Chiu et al. Paper

  • Log-Linear Models, MEMMs, and CRFs, Michael Collins Paper

  • Counter-fitting Word Vectors to Linguistic Constraints, Mrkšić et al. Paper

  • Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, Wu et al. Paper


  • Semi-supervised Sequence Learning, Dai et al. Paper

  • Evaluating distributed word representations for capturing semantics of biomedical concepts, TH et al. Paper


  • GloVe: Global Vectors for Word Representation, Pennington et al. Paper

  • Linguistic Regularities in Sparse and Explicit Word Representations, Levy and Goldberg Paper

  • Neural Word Embedding as Implicit Matrix Factorization, Levy and Goldberg Paper

  • word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method, Goldberg and Levy Paper

  • What’s in a p-value in NLP?, Søgaard et al. Paper

  • How transferable are features in deep neural networks?, Yosinski et al. Paper

  • Improving lexical embeddings with semantic knowledge, Yu et al. Paper

  • Retrofitting word vectors to semantic lexicons, Faruqui et al. Paper


  • Efficient Estimation of Word Representations in Vector Space, Mikolov et al. Paper

  • Linguistic Regularities in Continuous Space Word Representations, Mikolov et al. Paper

  • Distributed Representations of Words and Phrases and their Compositionality, Mikolov et al. Paper


  • An Empirical Investigation of Statistical Significance in NLP, Berg-Kirkpatrick et al. Paper


  • Word representations: A simple and general method for semi-supervised learning, Turian et al. Paper


  • A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning, Collobert and Weston Paper


  • Domain adaptation with structural correspondence learning, Blitzer et al. Paper


  • A Neural Probabilistic Language Model, Bengio et al. Paper


  • Distributed Representations, Hinton et al. Paper