/DDPG_Agent

Deep Deterministic Policy Gradient (DDPG) is a model-free off-policy algorithm for learning continuous actions. It is a reinforcement learning technique that combines ideas from DPG (Deterministic Policy Gradient) and DQN (Deep Q-Network). From DQN, it uses Experience Replay and Slow-learning target networks. From DPG, it incorporates Operating over continuous action spaces.

Primary LanguageJupyter Notebook

Watchers