Papers and books that I think are important to read if you are doing NLP and Deep Learning
- Jurafsky/Martin (https://web.stanford.edu/~jurafsky/slp3/)
  - Third edition is online (https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf)
- Manning/Schütze (https://nlp.stanford.edu/fsnlp/)
- Accurate Methods for the Statistics of Surprise and Coincidence (Dunning)
- A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition (Rabiner)
- TnT - A Statistical Part-of-Speech Tagger (Brants)
- A Maximum Entropy Approach to Natural Language Processing (Berger, Della Pietra, Della Pietra)
- A Maximum Entropy Model for Part-Of-Speech Tagging (Ratnaparkhi)
- Baselines and Bigrams: Simple, Good Sentiment and Topic Classification (Wang, Manning) (NBSVM)
- Unsupervised Word Sense Disambiguation Rivaling Supervised Methods (Yarowsky)
- Training Deterministic Parsers with Non-Deterministic Oracles (Goldberg, Nivre)
- A Dynamic Oracle for Arc-Eager Dependency Parsing (Goldberg, Nivre)
- TextRank: Bringing Order into Texts (Mihalcea, Tarau)
- Improving Machine Learning Approaches to Coreference Resolution (Ng, Cardie)
- Local and Global Algorithms for Disambiguation to Wikipedia (Ratinov, Roth, Downey, Anderson)
- Latent Dirichlet Allocation (Blei, Ng, Jordan)
- Distributed Representations of Words and Phrases and their Compositionality (Mikolov, Sutskever, Chen, Corrado, Dean)
- Exploiting Similarities among Languages for Machine Translation (Mikolov, Le, Sutskever)
- Efficient Estimation of Word Representations in Vector Space (Mikolov, Chen, Corrado, Dean)
- Deep contextualized word representations (Peters et al)
- Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation (Ling et al)
- Natural Language Processing (Almost) from Scratch (Collobert et al)
- Enriching Word Vectors with Subword Information (Bojanowski, Grave, Joulin, Mikolov)
- Recurrent Neural Network Regularization (Zaremba, Sutskever, Vinyals)
- Character-Aware Neural Language Models (Kim, Jernite, Sontag, Rush)
- Exploring the Limits of Language Modeling (Jozefowicz, Vinyals, Schuster, Shazeer, Wu)
- Learning Character-level Representations for Part-of-Speech Tagging (dos Santos, Zadrozny)
- Boosting Named Entity Recognition with Neural Character Embeddings (dos Santos, Guimarães)
- Neural Architectures for Named Entity Recognition (Lample et al)
- End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF (Ma, Hovy)
- Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging (Reimers, Gurevych)
- Sequence to Sequence Learning with Neural Networks (Sutskever, Vinyals, Le)
- Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (Cho et al)
- Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau, Cho, Bengio)
- Attention Is All You Need (Vaswani et al)
- Show and Tell: A Neural Image Caption Generator (Vinyals, Toshev, Bengio, Erhan)
- Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models (Serban, Sordoni, Bengio, Courville, Pineau)
- End-to-end LSTM-based dialog control optimized with supervised and reinforcement learning (Williams, Zweig)
- Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning (Williams, Asadi, Zweig)
- Learning End-to-End Goal-oriented Dialog (Bordes, Boureau, Weston)
- A Neural Conversational Model (Vinyals, Le)
- Convolutional Neural Networks for Sentence Classification (Kim)
- Rethinking the Inception Architecture for Computer Vision (Szegedy et al)
- Going Deeper with Convolutions (Szegedy et al)
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (Ioffe, Szegedy)
- Hierarchical Attention Networks for Document Classification (Yang et al)
- Deep Residual Learning for Image Recognition (He, Zhang, Ren, Sun)
- Hugo Larochelle's Neural Networks course
- Coursera
  - Hinton course
  - Ng course
- Andrew Gibiansky's Blog
- Understanding LSTM Networks (Olah)
- The Unreasonable Effectiveness of Recurrent Neural Networks (Karpathy)