A list of papers and resources dedicated to deep reinforcement learning.
Please note that this list is currently a work in progress and far from complete.
- Add more papers
- Improve the way of classifying papers (tags may be useful)
- Define a policy for this list: curated or comprehensive, how to define "deep reinforcement learning", etc.
If you want to inform the maintainer of a new paper, feel free to contact @mooopan. Issues and PRs are also welcome.
- Deep Value Function
- Deep Policy
- Deep Actor-Critic
- Deep Model
- Application to Non-RL Tasks
- Unclassified
- S. Lange and M. Riedmiller, Deep Learning of Visual Control Policies, ESANN, 2010. pdf
- Deep Fitted Q-Iteration (DFQ)
- V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, Playing Atari with Deep Reinforcement Learning, NIPS 2013 Deep Learning Workshop, 2013. pdf
- Deep Q-Network (DQN) with experience replay
- V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, Human-level control through deep reinforcement learning, Nature, 2015. pdf code
- Deep Q-Network (DQN) with experience replay and target network
- T. Schaul, D. Horgan, K. Gregor, and D. Silver, Universal Value Function Approximators, ICML, 2015. pdf
- A. Nair, P. Srinivasan, S. Blackwell, C. Alcicek, R. Fearon, A. De Maria, M. Suleyman, C. Beattie, S. Petersen, S. Legg, V. Mnih, and D. Silver, Massively Parallel Methods for Deep Reinforcement Learning, ICML Deep Learning Workshop, 2015. pdf
- Gorila (General Reinforcement Learning Architecture)
- K. Narasimhan, T. Kulkarni, and R. Barzilay, Language Understanding for Text-based Games Using Deep Reinforcement Learning, EMNLP, 2015. pdf supplementary code
- LSTM-DQN
- M. Hausknecht and P. Stone, Deep Recurrent Q-Learning for Partially Observable MDPs, arXiv, 2015. arXiv code
- M. Lai, Giraffe: Using Deep Reinforcement Learning to Play Chess, arXiv, 2015. arXiv code
- H. van Hasselt, A. Guez, and D. Silver, Deep Reinforcement Learning with Double Q-learning, arXiv, 2015. arXiv
- Double DQN
- S. Levine, C. Finn, T. Darrell, and P. Abbeel, End-to-End Training of Deep Visuomotor Policies, arXiv, 2015. arXiv
- Partially observed guided policy search
- J. Schulman, S. Levine, P. Moritz, M. Jordan, and P. Abbeel, Trust Region Policy Optimization, ICML, 2015. pdf
- J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel, High-Dimensional Continuous Control Using Generalized Advantage Estimation, arXiv, 2015. arXiv
- T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, Continuous control with deep reinforcement learning, arXiv, 2015. arXiv
- D. Balduzzi and M. Ghifary, Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies, arXiv, 2015. arXiv
- N. Heess, G. Wayne, D. Silver, T. Lillicrap, Y. Tassa, and T. Erez, Learning Continuous Control Policies by Stochastic Value Gradients, NIPS, 2015. arXiv video
- B. C. Stadie, S. Levine, and P. Abbeel, Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models, arXiv, 2015. arXiv
- J. Oh, X. Guo, H. Lee, R. Lewis, and S. Singh, Action-Conditional Video Prediction using Deep Networks in Atari Games, NIPS, 2015. arXiv
- J. M. Assael, N. Wahlström, T. B. Schön, and M. P. Deisenroth, Data-Efficient Learning of Feedback Policies from Image Pixels using Deep Dynamical Models, arXiv, 2015. arXiv
- J. C. Caicedo and S. Lazebnik, Active Object Localization with Deep Reinforcement Learning, ICCV, 2015. pdf
- H. Guo, Generating Text with Deep Reinforcement Learning, arXiv, 2015. arXiv
- X. Guo, S. Singh, H. Lee, R. Lewis, and X. Wang, Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning, NIPS, 2014. pdf video
- S. Mohamed and D. J. Rezende, Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning, arXiv, 2015. arXiv