Papers and books that I think are important to read if you are doing NLP and Deep Learning
- Jurafsky/Martin (https://web.stanford.edu/~jurafsky/slp3/)
  - Third edition is online (https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf)
- Manning/Schütze (https://nlp.stanford.edu/fsnlp/)
- Accurate Methods for the Statistics of Surprise and Coincidence (Dunning)
- A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition (Rabiner)
- TnT - A Statistical Part-of-Speech Tagger (Brants)
- A Maximum Entropy Approach to Natural Language Processing (Berger, Della Pietra, Della Pietra)
- A Maximum Entropy Model for Part-Of-Speech Tagging (Ratnaparkhi)
- Baselines and Bigrams: Simple, Good Sentiment and Topic Classification (Wang, Manning) (NBSVM)
- Unsupervised Word Sense Disambiguation Rivaling Supervised Methods (Yarowsky)
- Training Deterministic Parsers with Non-Deterministic Oracles (Goldberg, Nivre)
- A Dynamic Oracle for Arc-Eager Dependency Parsing (Goldberg, Nivre)
- TextRank: Bringing Order into Texts (Mihalcea, Tarau)
- Improving Machine Learning Approaches to Coreference Resolution (Ng, Cardie)
- Local and Global Algorithms for Disambiguation to Wikipedia (Ratinov, Roth, Downey, Anderson)
- Latent Dirichlet Allocation (Blei, Ng, Jordan)
- Distributed Representations of Words and Phrases and their Compositionality (Mikolov, Sutskever, Chen, Corrado, Dean)
- Exploiting Similarities among Languages for Machine Translation (Mikolov, Le, Sutskever)
- Efficient Estimation of Word Representations in Vector Space (Mikolov, Chen, Corrado, Dean)
- Deep contextualized word representations (Peters et al)
- Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation (Ling et al)
- Natural Language Processing (Almost) from Scratch (Collobert et al)
- Enriching Word Vectors with Subword Information (Bojanowski, Grave, Joulin, Mikolov)
- Recurrent Neural Network Regularization (Zaremba, Sutskever, Vinyals)
- Character-Aware Neural Language Models (Kim, Jernite, Sontag, Rush)
- Exploring the Limits of Language Modeling (Jozefowicz, Vinyals, Schuster, Shazeer, Wu)
- Learning Character-level Representations for Part-of-Speech Tagging (dos Santos, Zadrozny)
- Boosting Named Entity Recognition with Neural Character Embeddings (dos Santos, Guimarães)
- Neural Architectures for Named Entity Recognition (Lample et al)
- End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF (Ma, Hovy)
- Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging (Reimers, Gurevych)
- Sequence to Sequence Learning with Neural Networks (Sutskever, Vinyals, Le)
- Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (Cho et al)
- Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau, Cho, Bengio)
- Attention Is All You Need (Vaswani et al)
- Show and Tell: A Neural Image Caption Generator (Vinyals, Toshev, Bengio, Erhan)
- Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models (Serban, Sordoni, Bengio, Courville, Pineau)
- End-to-end LSTM-based dialog control optimized with supervised and reinforcement learning (Williams, Zweig)
- Hybrid Code Networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning (Williams, Asadi, Zweig)
- Learning End-to-End Goal-oriented Dialog (Bordes, Boureau, Weston)
- A Neural Conversational Model (Vinyals, Le)
- Convolutional Neural Networks for Sentence Classification (Kim)
- Rethinking the Inception Architecture for Computer Vision (Szegedy et al)
- Going Deeper with Convolutions (Szegedy et al)
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (Ioffe, Szegedy)
- Hierarchical Attention Networks for Document Classification (Yang et al)
- Deep Residual Learning for Image Recognition (He, Zhang, Ren, Sun)
- Hugo Larochelle's Neural Networks course
- Coursera
  - Hinton course
  - Ng course
- Andrew Gibiansky's Blog
- Understanding LSTM Networks (Olah)
- The Unreasonable Effectiveness of Recurrent Neural Networks (Karpathy)