🐋 Deep RL in TensorFlow2
This repository implements a variety of popular Reinforcement Learning algorithms in TensorFlow2. The environments come from OpenAI gym, and our goal is to keep updating the repository until it implements all of the algorithms specified in OpenAI Spinning Up.
ENV | Reward Plot |
---|---|
CartPole-v1 |
Algorithms
DQN
Name | Deep Q-Learning |
---|---|
Paper | Playing Atari with Deep Reinforcement Learning |
Author | Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller |
Method | Temporal Difference / Off-Policy |
Action | Discrete |
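
As a rough illustration of what "Temporal Difference / Off-Policy" means for DQN, the sketch below computes one-step Q-learning targets in plain Python. It is not the code in `DQN/dqn_discrete_action.py`; the function name `td_targets`, the argument layout, and the sample numbers are assumptions made for this example. The target bootstraps from the greedy action in the next state, whichever action the behavior policy actually took, which is what makes the update off-policy.

```python
def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """One-step Q-learning targets (hypothetical helper, illustration only).

    target = r                              if the episode ended
    target = r + gamma * max_a' Q(s', a')   otherwise
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        # Bootstrap from the greedy action value unless the state is terminal.
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets


# Two made-up transitions with two discrete actions each.
rewards = [1.0, 1.0]
next_q_values = [[0.5, 2.0], [3.0, 1.0]]
dones = [False, True]

targets = td_targets(rewards, next_q_values, dones, gamma=0.9)
print(targets)
```

In a full agent the `next_q_values` would typically come from a target network and the targets would be regressed against the online network's prediction for the action taken.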
DRQN
Name | Deep Recurrent Q-Learning |
---|---|
Paper | Deep Recurrent Q-Learning for Partially Observable MDPs |
Author | Matthew Hausknecht, Peter Stone |
Method | Temporal Difference / Off-Policy |
Action | Discrete |
A2C
Name | Advantage Actor-Critic |
---|---|
Paper | Actor-Critic Algorithms |
Author | Vijay R. Konda, John N. Tsitsiklis |
Method | Temporal Difference / On-Policy |
Action | Discrete / Continuous |
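
To make the "Advantage" in Advantage Actor-Critic concrete, here is a minimal sketch of a one-step temporal-difference advantage estimate in plain Python. The helper name `one_step_advantages` and the sample values are assumptions for illustration, and the scripts in this repository may estimate advantages differently (for example with n-step returns).

```python
def one_step_advantages(rewards, values, next_values, dones, gamma=0.99):
    """One-step TD advantage: A = r + gamma * V(s') - V(s) (hypothetical helper)."""
    advantages = []
    for r, v, v_next, done in zip(rewards, values, next_values, dones):
        # The critic's estimate of the next state is ignored at episode end.
        td_target = r + (0.0 if done else gamma * v_next)
        advantages.append(td_target - v)
    return advantages


# Two made-up transitions; the second one terminates the episode.
advantages = one_step_advantages(
    rewards=[1.0, 0.0],
    values=[0.5, 1.0],
    next_values=[1.0, 2.0],
    dones=[False, True],
    gamma=0.5,
)
print(advantages)
```

A positive advantage pushes the actor toward the action that was taken, a negative one pushes it away, and the critic is trained toward the same TD target.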
A3C
Name | Asynchronous Advantage Actor-Critic |
---|---|
Paper | Asynchronous Methods for Deep Reinforcement Learning |
Author | Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu |
Method | Temporal Difference / On-Policy |
Action | Discrete / Continuous |
PPO
Name | Proximal Policy Optimization |
---|---|
Paper | Proximal Policy Optimization Algorithms |
Author | John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov |
Method | Temporal Difference / On-Policy |
Action | Discrete / Continuous |
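
The central idea of the PPO paper is a clipped surrogate objective that limits how far a single update can move the policy. The sketch below evaluates that objective in plain Python for a small batch. The function name `clipped_surrogate`, the default `clip_eps=0.2`, and the sample numbers are assumptions for illustration, not values taken from `PPO/ppo_continuous_action.py`.

```python
def clipped_surrogate(ratios, advantages, clip_eps=0.2):
    """Mean of min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A) (hypothetical helper).

    ratios are pi_new(a|s) / pi_old(a|s) for each sampled action.
    """
    terms = []
    for ratio, adv in zip(ratios, advantages):
        # Keep the probability ratio inside [1 - eps, 1 + eps].
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic of the unclipped and clipped terms.
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return sum(terms) / len(terms)


# Made-up batch: one ratio above the clip range, one below it.
objective = clipped_surrogate(ratios=[1.5, 0.5], advantages=[1.0, -1.0])
print(objective)
```

The objective is maximized, so a training loop would usually minimize its negative, often combined with a value loss and an entropy bonus.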
Coming Soon...
Usage
Discrete Action Space Asynchronous Advantage Actor-Critic
$ python A3C/a3c_discrete_action.py
Deep Q-Learning
$ python DQN/dqn_discrete_action.py
Continuous Action Space Proximal Policy Optimization
$ python PPO/ppo_continuous_action.py
Papers
- Asynchronous Methods for Deep Reinforcement Learning
- Proximal Policy Optimization Algorithms
- Trust Region Policy Optimization
- Playing Atari with Deep Reinforcement Learning
- Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Reference
- https://github.com/carpedm20/deep-rl-tensorflow
- https://github.com/Yeachan-Heo/Reinforcement-Learning-Book
- https://github.com/pasus/Reinforcement-Learning-Book
- https://github.com/vcadillog/PPO-Mario-Bros-Tensorflow-2
- https://spinningup.openai.com/en/latest/spinningup/keypapers.html
- https://github.com/seungeunrho/minimalRL
- https://github.com/openai/baselines
- https://github.com/anita-hu/TF2-RL