TF2-RL

Reinforcement learning algorithms implemented for TensorFlow 2.0+ [DQN, DDPG, AE-DDPG, SAC, PPO, Primal-Dual DDPG]

Reinforcement Learning Agents

Implemented for TensorFlow 2.0+

New Updates!

  • DDPG with prioritized replay
  • Primal-Dual DDPG for CMDP

Future Plans

  • SAC Discrete

Usage

  • Install the dependencies imported by the scripts (use my tf2 conda env as a reference)
  • Each file contains example code that runs training on the CartPole env
  • Training: python3 TF2_DDPG_LSTM.py
  • Tensorboard: tensorboard --logdir=DDPG/logs

Hyperparameter tuning

Agents

Agents tested using CartPole env.

| Name | On/off policy | Model | Action space support |
|---|---|---|---|
| DQN | off-policy | Dense, LSTM | discrete |
| DDPG | off-policy | Dense, LSTM | discrete, continuous |
| AE-DDPG | off-policy | Dense | discrete, continuous |
| SAC (known bug) | off-policy | Dense | continuous |
| PPO | on-policy | Dense | discrete, continuous |
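
The off-policy agents learn from stored transitions, and the New Updates list mentions DDPG with prioritized replay. As a rough illustration of that idea, here is a minimal sketch of proportional prioritized replay using only the standard library. The class name `PrioritizedReplayBuffer`, its methods, and the `alpha`, `beta`, and `eps` defaults are assumptions made for this example; the repo's own buffer may be structured differently.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (hypothetical sketch, not the repo's class)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # 0 = uniform sampling, 1 = fully prioritized
        self.eps = eps      # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0        # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions get the current max priority so they are likely to be sampled.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # Sampling probability is proportional to priority ** alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights offset the bias of non-uniform sampling;
        # here they are normalized by the largest weight in the batch.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs, td_errors):
        # After a training step, the absolute TD error becomes the new priority.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop, the returned weights would typically scale the per-sample critic loss, and `update_priorities` would be called with the TD errors from that same batch.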

Constrained MDP

| Name | On/off policy | Model | Action space support |
|---|---|---|---|
| Primal-Dual DDPG | off-policy | Dense | discrete, continuous |
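
For readers new to constrained MDPs, the sketch below shows the generic primal-dual idea: the policy maximizes a reward penalized by a Lagrange multiplier, while the multiplier rises whenever the average cost exceeds its limit. The function names, the learning rate, and the cost limit are assumptions for illustration and are not taken from the repo's Primal-Dual DDPG code.

```python
def dual_update(lmbda, avg_cost, cost_limit, lr=0.05):
    """Projected gradient step on the Lagrange multiplier (kept non-negative)."""
    return max(0.0, lmbda + lr * (avg_cost - cost_limit))


def lagrangian_reward(reward, cost, lmbda):
    """Scalar signal for the primal (policy) step: r - lambda * c."""
    return reward - lmbda * cost


if __name__ == "__main__":
    lmbda = 0.0
    # Hypothetical per-episode average costs, with an assumed cost limit of 1.0.
    for avg_cost in [2.0, 1.5, 1.2, 0.8]:
        lmbda = dual_update(lmbda, avg_cost, cost_limit=1.0)
        print(f"avg_cost={avg_cost:.1f} lambda={lmbda:.3f}")
```

The multiplier grows while the constraint is violated and shrinks toward zero once the average cost falls below the limit, so the penalty on the critic's target adapts over training.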

Models

The models used to generate the demos are included in the repo, along with Q-value, reward and/or loss graphs.

Demos

  • DQN Basic, time step = 4, 500 reward
  • DQN LSTM, time step = 4, 500 reward
  • DDPG Basic, 500 reward
  • DDPG LSTM, time step = 5, 500 reward
  • AE-DDPG Basic, 500 reward
  • PPO Basic, 500 reward
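
Several demos list a "time step" setting. Assuming it means the number of consecutive observations fed to the model at once, a sliding window like the one below could build that input. `ObservationWindow` and its zero-padding at the start of an episode are hypothetical choices for this sketch, not the repo's code.

```python
from collections import deque


class ObservationWindow:
    """Keeps the last `time_steps` observations (hypothetical helper)."""

    def __init__(self, time_steps, obs_dim):
        # Start with zero-filled observations so the window is always full (an assumption).
        self.buffer = deque(
            [[0.0] * obs_dim for _ in range(time_steps)], maxlen=time_steps
        )

    def push(self, obs):
        # Appending to a full deque drops the oldest observation automatically.
        self.buffer.append(list(obs))
        # Returns a nested list of shape (time_steps, obs_dim), oldest first.
        return [list(o) for o in self.buffer]


if __name__ == "__main__":
    # CartPole observations have 4 values; 4 steps mirrors the DQN demos.
    window = ObservationWindow(time_steps=4, obs_dim=4)
    seq = window.push([0.01, 0.02, 0.03, 0.04])
    print(len(seq), len(seq[0]))
```

The returned sequence can be converted to an array with a leading batch dimension before it is passed to a recurrent model.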