Deep Reinforcement Learning Papers

A list of papers and resources dedicated to deep reinforcement learning.

Please note that this list is currently a work in progress and far from complete.

TODOs

  • Add more papers
  • Improve how papers are classified (tags may be useful)
  • Define a policy for this list: curated or comprehensive, how to define "deep reinforcement learning", etc.

Contributing

If you want to inform the maintainer of a new paper, feel free to contact @mooopan. Issues and PRs are also welcome.

Papers

Deep Value Function

  • S. Lange and M. Riedmiller, Deep Learning of Visual Control Policies, ESANN, 2010. pdf
    • Deep Fitted Q-Iteration (DFQ)
  • V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, Playing Atari with Deep Reinforcement Learning, NIPS 2013 Deep Learning Workshop, 2013. pdf
    • Deep Q-Network (DQN) with experience replay
  • V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, Human-level control through deep reinforcement learning, Nature, 2015. pdf code
    • Deep Q-Network (DQN) with experience replay and target network
  • T. Schaul, D. Horgan, K. Gregor, and D. Silver, Universal Value Function Approximators, ICML, 2015. pdf
  • A. Nair, P. Srinivasan, S. Blackwell, C. Alcicek, R. Fearon, A. De Maria, V. Panneershelvam, M. Suleyman, C. Beattie, S. Petersen, S. Legg, V. Mnih, K. Kavukcuoglu, and D. Silver, Massively Parallel Methods for Deep Reinforcement Learning, ICML Deep Learning Workshop, 2015. pdf
    • Gorila (General Reinforcement Learning Architecture)
  • K. Narasimhan, T. Kulkarni, and R. Barzilay, Language Understanding for Text-based Games Using Deep Reinforcement Learning, EMNLP, 2015. pdf supplementary code
    • LSTM-DQN
  • M. Hausknecht and P. Stone, Deep Recurrent Q-Learning for Partially Observable MDPs, arXiv, 2015. arXiv code
  • M. Lai, Giraffe: Using Deep Reinforcement Learning to Play Chess, arXiv, 2015. arXiv code
  • H. van Hasselt, A. Guez, and D. Silver, Deep Reinforcement Learning with Double Q-learning, arXiv, 2015. arXiv
    • Double DQN
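
The annotations above mention experience replay, a target network, and Double DQN. As a rough illustration of how these ideas differ, here is a minimal sketch in plain Python. It is not taken from any of the listed papers' code; the names `ReplayBuffer`, `dqn_target`, and `double_dqn_target` are hypothetical, and Q-values are assumed to be plain lists with one entry per action.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity):
        self.transitions = deque(maxlen=capacity)

    def add(self, transition):
        # A transition is assumed to be (state, action, reward, next_state, done).
        self.transitions.append(transition)

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement breaks up temporal correlation.
        return rng.sample(list(self.transitions), batch_size)


def dqn_target(reward, done, next_q_target, gamma=0.99):
    """DQN-style target: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Double DQN-style target: the online network selects the next action,
    and the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]
```

When the online and target networks disagree about the best next action, the two targets differ, which is the mechanism Double DQN relies on to reduce overestimation. See the papers above for the actual algorithms and hyperparameters.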

Deep Policy

  • S. Levine, C. Finn, T. Darrell, and P. Abbeel, End-to-End Training of Deep Visuomotor Policies, arXiv, 2015. arXiv
    • Partially observed guided policy search
  • J. Schulman, S. Levine, P. Moritz, M. Jordan, and P. Abbeel, Trust Region Policy Optimization, ICML, 2015. pdf

Deep Actor-Critic

  • J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel, High-Dimensional Continuous Control Using Generalized Advantage Estimation, arXiv, 2015. arXiv
  • T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, Continuous control with deep reinforcement learning, arXiv, 2015. arXiv
  • D. Balduzzi and M. Ghifary, Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies, arXiv, 2015. arXiv
  • N. Heess, G. Wayne, D. Silver, T. Lillicrap, Y. Tassa, and T. Erez, Learning Continuous Control Policies by Stochastic Value Gradients, NIPS, 2015. arXiv video

Deep Model

  • B. C. Stadie, S. Levine, and P. Abbeel, Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models, arXiv, 2015. arXiv
  • J. Oh, X. Guo, H. Lee, R. Lewis, and S. Singh, Action-Conditional Video Prediction using Deep Networks in Atari Games, NIPS, 2015. arXiv
  • J.-A. M. Assael, N. Wahlström, T. B. Schön, and M. P. Deisenroth, Data-Efficient Learning of Feedback Policies from Image Pixels using Deep Dynamical Models, arXiv, 2015. arXiv

Application to Non-RL Tasks

  • J. C. Caicedo and S. Lazebnik, Active Object Localization with Deep Reinforcement Learning, ICCV, 2015. pdf
  • H. Guo, Generating Text with Deep Reinforcement Learning, arXiv, 2015. arXiv

Unclassified

  • X. Guo, S. Singh, H. Lee, R. Lewis, and X. Wang, Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning, NIPS, 2014. pdf video
  • S. Mohamed and D. J. Rezende, Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning, arXiv, 2015. arXiv

Talks/Slides

  • S. Levine, Deep Learning for Decision Making and Control, 2015. video
  • D. Silver, Deep Reinforcement Learning, ICLR, 2015. video1 video2 slides
  • D. Silver, Deep Reinforcement Learning, UAI, 2015. video

Miscellaneous