visalvo/RL-practice-keras

Deep Reinforcement Learning

Here I implement various RL algorithms in Python 2.7, using Keras for the neural nets and the OpenAI Gym to test the algorithms. The methods are listed below and roughly divide into two categories: value-based and policy-based.

I took and adjusted code from various online sources, which I list non-exhaustively below (and in the code itself).

Value-based methods

Policy-based methods

Policy gradient: REINFORCE, with and without a baseline
Actor-critic (A2C)
Deep Deterministic Policy Gradient (DDPG)
Proximal policy optimization (PPO)
Soft Actor-Critic (SAC)
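
As a rough illustration of the "with baseline" variant of REINFORCE listed above, here is a minimal sketch of how per-step returns and baseline-subtracted advantages can be computed for one episode. This is not code from the notebooks: the function names `discounted_returns` and `advantages_with_baseline` are hypothetical, and the baseline is assumed to be the mean return over the episode (a learned value function is another common choice).

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each return reuses the next one.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages_with_baseline(returns):
    """Subtract a constant baseline (assumed here: the mean return)."""
    baseline = sum(returns) / len(returns)
    return [g - baseline for g in returns]


# Toy episode with three steps of reward 1.0 (illustrative values only).
rewards = [1.0, 1.0, 1.0]
returns = discounted_returns(rewards, gamma=0.5)
advantages = advantages_with_baseline(returns)
```

In a policy-gradient update, these advantages would weight the log-probabilities of the actions taken. Subtracting the baseline leaves the expected gradient unchanged but typically reduces its variance.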

Multi-agent

Multi-agent deep deterministic policy gradient (MADDPG)
Actor-Attention-Critic (AAC)
Value Decomposition Networks (VDN)
QMIX

Others

Explore-and-go
Curiosity-driven learning (CDL)
Rainbow (RB)

Resources

Papers

Blogs

Textbooks

Sutton

Acknowledgements