- Deep Learning (I.Goodfellow, 2016)
- Machine Learning (기계 학습) (오일석, 2017.12)
Review Papers
- A Primer on Neural Network Models for Natural Language Processing (Y.Goldberg, 2015.10)
- A Critical Review of Recurrent Neural Networks for Sequence Learning (ZC.Lipton, 2015.05)
Various kinds of Deep Learning Models
- Neural Machine Translation by Jointly Learning to Align and Translate (D.Bahdanau, ICLR 2015)
- Sequence to Sequence Learning with Neural Networks (I.Sutskever, NIPS 2014)
- Going Deeper with Convolutions (C.Szegedy, 2014)
- Playing Atari with Deep Reinforcement Learning (V.Mnih, 2013)
- Generating Text with Recurrent Neural Networks (I.Sutskever, 2011)
Understanding Deep Learning Models
- Visualizing and Understanding Recurrent Networks (A.Karpathy, 2015.11)
- Visualizing and Understanding Convolutional Networks (MD.Zeiler, 2014)
Probabilistic Graphical Models
- Application of Deep Belief Networks for Natural Language Understanding (R.Sarikaya, 2014)
- Latent Dirichlet Allocation (DM.Blei, JMLR, 2003)
Deep Learning with Memory (Attention)
- Attention Is All You Need (A.Vaswani, NIPS 2017)
- Ask Me Anything: Dynamic Memory Networks for Natural Language Processing (A.Kumar, ICML 2016)
- Feed-Forward Networks with Attention Can Solve Some Long-Term Memory Problems (C.Raffel, ICLR Workshop 2016)
- Neural Machine Translation by Jointly Learning to Align and Translate (D.Bahdanau, ICLR 2015)
- End-To-End Memory Networks (S.Sukhbaatar, NIPS 2015)
Text Classification with Deep Learning Models
- Hierarchical Attention Networks for Document Classification (Z.Yang, NAACL 2016)
- Recurrent Convolutional Neural Networks for Text Classification (S.Lai, AAAI 2015)
- Convolutional Neural Networks for Sentence Classification (Y.Kim, EMNLP 2014)
Distributed Representations for Language
- Enriching Word Vectors with Subword Information (P.Bojanowski, 2016)
- GloVe: Global Vectors for Word Representation (J.Pennington, 2014.10)
- Distributed Representations of Sentences and Documents (QV.Le, 2014.05)
- Distributed Representations of Words and Phrases and their Compositionality (T.Mikolov, 2013.10)
- Efficient Estimation of Word Representations in Vector Space (T.Mikolov, 2013.09)
- Linguistic Regularities in Continuous Space Word Representations (T.Mikolov, 2013.06)
- Recurrent Neural Network based Language Model (T.Mikolov, 2010)
- A Neural Probabilistic Language Model (Y.Bengio, 2003)
- Indexing by Latent Semantic Analysis (S.Deerwester, 1990)
Named Entity Recognition (CoNLL2003)
- Named Entity Recognition with stack residual LSTM and trainable bias decoding (Q.Tran, 2017.07) (91.69)
- Semi-supervised Sequence Tagging with Bidirectional Language Models (ME.Peters, 2017.04) (91.93)
- Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks (Z.Yang, 2017.03) (91.26)
- End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF (X.Ma and E.Hovy, 2016.06) (91.21)
- Neural Architectures for Named Entity Recognition (G.Lample, 2016.04) (90.94)
- Named Entity Recognition with bidirectional LSTM-CNNs (JPC.Chiu, 2015.11) (91.62)
- Natural Language Processing (almost) from Scratch (R.Collobert, 2011)
- A Survey of Named Entity Recognition and Classification (D.Nadeau, 2007.01)
ETC.
- Tweet Segmentation and Its Application to Named Entity Recognition (C.Li, 2015)
- An Empirical Study of Semantic Similarity in WordNet and Word2Vec (2014)
- Building Bridges for Web Query Classification (D.Shen, 2006)
- The Probable Error of a Mean (Student, 1908)
- Jumping NLP Curves (E.Cambria, 2014.04)
- Calculus on Computational Graphs: Backpropagation (Aug 31, 2015, Colah)
- Understanding LSTM Networks (Aug 27, 2015, Colah)
- Deep Learning and NLP and Representations (July 7, 2014, Colah)
- Attention and Memory in Deep Learning and NLP (Jan 3, 2016, WILDML)
- Understanding Convolutional Neural Networks for NLP (Nov 7, 2015, WILDML)
- AlphaGo for Programmers (프로그래머를 위한 알파고) (Mar 13, 2016, Slideshare)