Here I implement various reinforcement learning (RL) algorithms in Python 2.7. I use Keras for the neural nets and OpenAI Gym to test the algorithms. The methods are listed below; they divide roughly into two categories.
I took or adapted code from various online sources, which I list non-exhaustively below (and in the code itself).
- Q-learning (tabular)
- Deep Q-Network (DQN)
- Double DQN (DDQN)
- DQN with prioritised replay
- Dueling DQN
- Distributional Bellman
- Policy gradient: REINFORCE, with and without a baseline
- Actor critic (A2C)
- Deep Deterministic Policy Gradient (DDPG)
- Proximal policy optimization (PPO)
- Soft Actor-Critic (SAC)
- Multi-agent deep deterministic policy gradient (MADDPG)
- Actor-Attention-Critic (AAC)
- Value Decomposition Networks (VDN)
- QMIX
- Explore-and-go
- Curiosity driven learning (CDL)
- Rainbow (RB)
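
To give a feel for the simplest method in the list, here is a minimal sketch of tabular Q-learning. It is not the code from this repository: the chain environment, the function name `q_learning_chain`, and all hyperparameter values are assumptions chosen for illustration, and it uses only the standard library rather than Gym so that it runs on its own.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain (illustrative only).

    States are 0..n_states-1. Action 0 moves left, action 1 moves right.
    Reaching the rightmost state gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    # Q-table: one [left, right] pair of action values per state.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap the episode length
            # Epsilon-greedy action selection, breaking ties randomly.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1

            # Environment step (assumed dynamics of the toy chain).
            s_next = max(s - 1, 0) if a == 0 else s + 1
            done = (s_next == n_states - 1)
            r = 1.0 if done else 0.0

            # Q-learning update: bootstrap from the best next action.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])

            s = s_next
            if done:
                break
    return q


if __name__ == "__main__":
    for state, values in enumerate(q_learning_chain()):
        print(state, values)
```

The deep variants in the list replace the table `q` with a neural network, but the update target has the same shape.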