- Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism [arXiv]
- Recurrent Memory Network for Language Modeling [arXiv]
- Language to Logical Form with Neural Attention [arXiv]
- Learning to Compose Neural Networks for Question Answering [arXiv]
- The Inevitability of Probability: Probabilistic Inference in Generic Neural Networks Trained with Non-Probabilistic Feedback [arXiv]
NLP
- Strategies for Training Large Vocabulary Neural Language Models [arXiv]
- Multilingual Language Processing From Bytes [arXiv]
- Learning Document Embeddings by Predicting N-grams for Sentiment Classification of Long Movie Reviews [arXiv]
- Target-Dependent Sentiment Classification with Long Short Term Memory [arXiv]
Vision
- Deep Residual Learning for Image Recognition [arXiv]
- Rethinking the Inception Architecture for Computer Vision [arXiv]
- Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks [arXiv]
- Deep Speech 2: End-to-End Speech Recognition in English and Mandarin [arXiv]
NLP
- Teaching Machines to Read and Comprehend [arXiv]
- Semi-supervised Sequence Learning [arXiv]
- Multi-task Sequence to Sequence Learning [arXiv]
- Alternative structures for character-level RNNs [arXiv]
- Larger-Context Language Modeling [arXiv]
- A Unified Tagging Solution: Bidirectional LSTM Recurrent Neural Network with Word Embedding [arXiv]
- Towards Universal Paraphrastic Sentence Embeddings [arXiv]
- BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies [arXiv]
- Sequence Level Training with Recurrent Neural Networks [arXiv]
- Natural Language Understanding with Distributed Representation [arXiv]
- sense2vec - A Fast and Accurate Method for Word Sense Disambiguation In Neural Word Embeddings [arXiv]
- LSTM-based Deep Learning Models for non-factoid answer selection [arXiv]
Programs
- Neural Random-Access Machines [arXiv]
- Neural Programmer: Inducing Latent Programs with Gradient Descent [arXiv]
- Neural Programmer-Interpreters [arXiv]
- Learning Simple Algorithms from Examples [arXiv]
- Neural GPUs Learn Algorithms [arXiv]
- On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models [arXiv]
Vision
- ReSeg: A Recurrent Neural Network for Object Segmentation [arXiv]
- Deconstructing the Ladder Network Architecture [arXiv]
- Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks [arXiv]
General
- Towards Principled Unsupervised Learning [arXiv]
- Dynamic Capacity Networks [arXiv]
- Generating Sentences from a Continuous Space [arXiv]
- Net2Net: Accelerating Learning via Knowledge Transfer [arXiv]
- A Roadmap towards Machine Intelligence [arXiv]
- Session-based Recommendations with Recurrent Neural Networks [arXiv]
- Regularizing RNNs by Stabilizing Activations [arXiv]
- A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification [arXiv]
- Attention with Intention for a Neural Network Conversation Model [arXiv]
- Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Recurrent Neural Network [arXiv]
- A Survey: Time Travel in Deep Learning Space: An Introduction to Deep Learning Models and How Deep Learning Models Evolved from the Initial Ideas [arXiv]
- A Primer on Neural Network Models for Natural Language Processing [arXiv]
- Character-level Convolutional Networks for Text Classification [arXiv]
- A Neural Attention Model for Abstractive Sentence Summarization [arXiv]
- Listen, Attend and Spell [arXiv]
- Character-Aware Neural Language Models [arXiv]
- Improved Transition-Based Parsing by Modeling Characters instead of Words with LSTMs [arXiv]
- Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation [arXiv]
- A Neural Network Approach to Context-Sensitive Generation of Conversational Responses [arXiv]
- Document Embedding with Paragraph Vectors [arXiv]
- A Neural Conversational Model [arXiv]
- Skip-Thought Vectors [arXiv]
- Pointer Networks [arXiv]
- Spatial Transformer Networks [arXiv]
- Tree-structured composition in neural networks without tree-structured architectures [arXiv]
- Visualizing and Understanding Neural Models in NLP [arXiv]
- Learning to Transduce with Unbounded Memory [arXiv]
- Ask Me Anything: Dynamic Memory Networks for Natural Language Processing [arXiv]
- Deep Knowledge Tracing [arXiv]
- ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks [arXiv]
- Reinforcement Learning Neural Turing Machines [arXiv]
- Correlational Neural Networks [arXiv]
- Distilling the Knowledge in a Neural Network [arXiv]
- End-To-End Memory Networks [arXiv]
- Neural Responding Machine for Short-Text Conversation [arXiv]
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift [arXiv]
- Text Understanding from Scratch [arXiv]
- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention [arXiv]
- Neural Turing Machines [arXiv]
- Grammar as a Foreign Language [arXiv]
- On Using Very Large Target Vocabulary for Neural Machine Translation [arXiv]
- Effective Use of Word Order for Text Categorization with Convolutional Neural Networks [arXiv]
- Multiple Object Recognition with Visual Attention [arXiv]
- Sequence to Sequence Learning with Neural Networks [arXiv]
- Neural Machine Translation by Jointly Learning to Align and Translate [arXiv]
- On the Properties of Neural Machine Translation: Encoder-Decoder Approaches [arXiv]
- Recurrent Neural Network Regularization [arXiv]
- Very Deep Convolutional Networks for Large-Scale Image Recognition [arXiv]
- Going Deeper with Convolutions [arXiv]
- Convolutional Neural Networks for Sentence Classification [arXiv]
- Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation [arXiv]
- Recurrent Models of Visual Attention [arXiv]
- Generative Adversarial Networks [arXiv]
- A Convolutional Neural Network for Modelling Sentences [arXiv]
- Visualizing and Understanding Convolutional Networks [arXiv]
- DeViSE: A Deep Visual-Semantic Embedding Model [pub]
- Maxout Networks [arXiv]
- Exploiting Similarities among Languages for Machine Translation [arXiv]
- Efficient Estimation of Word Representations in Vector Space [arXiv]
- Natural Language Processing (almost) from Scratch [arXiv]