Papers and books that I think are important to read if you are doing NLP and Deep Learning
Jurafsky/Martin (
- Third edition is online (
Manning/Schütze (
Accurate Methods for the Statistics of Surprise and Coincidence (Dunning)
A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition (Rabiner)
TnT - A Statistical Part of Speech Tagger (Brants)
A Maximum Entropy Approach to Natural Language Processing (Berger, Della Pietra, Della Pietra)
A Maximum Entropy Model for Part-Of-Speech Tagging (Ratnaparkhi)
Baselines and Bigrams: Simple, Good Sentiment and Topic Classification (Wang, Manning) (NBSVM)
Unsupervised Word Sense Disambiguation Rivaling Supervised Methods (Yarowsky)
Training Deterministic Parsers with Non-Deterministic Oracles (Goldberg, Nivre)
A Dynamic Oracle for Arc-Eager Dependency Parsing (Goldberg, Nivre)
TextRank: Bringing Order into Texts (Mihalcea, Tarau)
Improving Machine Learning Approaches to Coreference Resolution (Ng, Cardie)
Local and Global Algorithms for Disambiguation to Wikipedia (Ratinov, Roth, Downey, Anderson)
Latent Dirichlet Allocation (Blei, Ng, Jordan)
- Distributed Representations of Words and Phrases and their Compositionality (Mikolov, Sutskever, Chen, Corrado, Dean)
- Exploiting Similarities among Languages for Machine Translation (Mikolov, Le, Sutskever)
- Efficient Estimation of Word Representations in Vector Space (Mikolov, Chen, Corrado, Dean)
- Deep contextualized word representations (Peters et al)
- Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation (Ling et al)
- Natural Language Processing (Almost) from Scratch (Collobert et al)
- Enriching Word Vectors with Subword Information (Bojanowski, Grave, Joulin, Mikolov)
- Recurrent Neural Network Regularization (Zaremba, Sutskever, Vinyals)
- Character-Aware Neural Language Models (Kim, Jernite, Sontag, Rush)
- Exploring the Limits of Language Modeling (Jozefowicz, Vinyals, Schuster, Shazeer, Wu)
- Learning Character-level Representations for Part-of-Speech Tagging (dos Santos, Zadrozny)
- Boosting Named Entity Recognition with Neural Character Embeddings (dos Santos, Guimarães)
- Neural Architectures for Named Entity Recognition (Lample et al)
- End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF (Ma, Hovy)
- Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging (Reimers, Gurevych)
- Sequence to Sequence Learning with Neural Networks (Sutskever, Vinyals, Le)
- Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (Cho et al)
- Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau, Cho, Bengio)
- Attention Is All You Need (Vaswani et al)
- Show and Tell: A Neural Image Caption Generator (Vinyals, Toshev, Bengio, Erhan)
- Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models (Serban, Sordoni, Bengio, Courville, Pineau)
- End-to-end LSTM-based dialog control optimized with supervised and reinforcement learning (Williams, Zweig)
- Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning (Williams, Asadi, Zweig)
- Learning End-to-End Goal-oriented Dialog (Bordes, Boureau, Weston)
- A Neural Conversation Model (Vinyals, Le)
- Convolutional Neural Networks for Sentence Classification (Kim)
- Rethinking the Inception Architecture for Computer Vision (Szegedy et al)
- Going Deeper with Convolutions (Szegedy et al)
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (Ioffe, Szegedy)
- Hierarchical Attention Networks for Document Classification (Yang et al)
- Deep Residual Learning for Image Recognition (He, Zhang, Ren, Sun)
Hugo Larochelle's Neural Networks course
- Hinton course
- Ng course
Andrew Gibiansky's Blog
Understanding LSTM Networks (Olah)
The Unreasonable Effectiveness of Recurrent Neural Networks (Karpathy)