Deep Deterministic Policy Gradient (DDPG) with Prioritized Experience Replay

Authors: Luca Iezzi and Giulia Ciabatti.

This repository contains a complete reimplementation of DDPG with Prioritized Experience Replay, adapted to the Pendulum-v1 and MountainCarContinuous-v0 environments from OpenAI Gym.
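For reference, the two target environments are created through the standard Gym API; this is generic usage, not necessarily the repo's exact setup:

```python
import gym

# The two environments this implementation targets (standard Gym API).
env = gym.make("Pendulum-v1")                 # continuous swing-up task
# env = gym.make("MountainCarContinuous-v0")  # continuous mountain car

print(env.action_space)  # a Box space: DDPG operates on continuous actions
```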

(GIFs: agent playing; controller runs.)

Usage

This implementation is based on Python 3.8 and PyTorch Lightning. To install all the requirements:

$ pip install -r requirements.txt

Hyperparameters

PER_BUFFER

| hyperparameter | value |
| --- | --- |
| WARM_POPULATE | 10000 |
| ALPHA | 0.6 |
| BETA | 0.1 |
| PRIORITIZED_REPLAY_ALPHA | 0.6 |
| PRIORITIZED_REPLAY_BETA0 | 0.4 |
| PRIORITIZED_REPLAY_BETA_ITERS | None |
| PRIORITIZED_REPLAY_EPS | 1e-6 |
| BATCH_SIZE | 64 |
| LATENT_SIZE | 64 |
| EPISODES | 150 |

ACTOR

| hyperparameter | value |
| --- | --- |
| OU_NOISE_STD | 0.8 |
| OPTIMIZER | Adam |
| LEARNING_RATE | 1e-4 |
| GAMMA | 0.99 |
| TAU | 5e-3 |

CRITIC

| hyperparameter | value |
| --- | --- |
| OPTIMIZER | Adam |
| LEARNING_RATE | 5e-4 |
| GAMMA | 0.99 |
| TAU | 5e-3 |
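For context, here is how the PER exponents (alpha, beta) and the soft-update coefficient TAU typically enter the algorithm. This is a minimal illustrative sketch following Schaul et al.'s proportional prioritization, not the repo's actual code; all names are illustrative:

```python
import numpy as np

# --- Prioritized replay (proportional variant, Schaul et al.) ---
def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """P(i) = p_i^alpha / sum_k p_k^alpha, with p_i = |delta_i| + eps."""
    priorities = (np.abs(td_errors) + eps) ** alpha
    return priorities / priorities.sum()

def importance_weights(probs, indices, beta=0.4):
    """w_i = (N * P(i))^-beta, normalized by the largest weight."""
    weights = (len(probs) * probs[indices]) ** (-beta)
    return weights / weights.max()

# Sample a batch of BATCH_SIZE = 64 transitions by priority.
td_errors = np.random.randn(10_000)               # stand-in TD errors
probs = sampling_probabilities(td_errors, alpha=0.6)
idx = np.random.choice(len(probs), size=64, p=probs)
w = importance_weights(probs, idx, beta=0.4)      # scales the critic loss

# --- Soft (Polyak) target update with TAU = 5e-3 ---
def soft_update(target, online, tau=5e-3):
    """theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return tau * online + (1 - tau) * target
```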

Results

(Result plots.)

Running

The complete pipeline to train and test the agent:

1. Train the agent

In simple_config.py, set ENV to the Gym environment you want to train on, set TRAIN=True, and run:

$ python main.py
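For example, the relevant lines in simple_config.py might look like this (values illustrative):

```python
# simple_config.py -- training configuration (illustrative values)
ENV = "Pendulum-v1"   # or "MountainCarContinuous-v0"
TRAIN = True
RENDER = False
```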

2. Test the agent

In main.py, assign the path of one of the checkpoints in ckpt/ to the variable model; then set TRAIN=False and RENDER=True in simple_config.py and run:

$ python main.py
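For example (the checkpoint filename below is hypothetical):

```python
# main.py
model = "ckpt/some_checkpoint.ckpt"  # hypothetical filename, copied from ckpt/

# simple_config.py -- evaluation configuration
TRAIN = False
RENDER = True
```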

Credits

Some of the implementations (e.g. SumTree and MinTree) have been taken from existing repos, but it has been impossible to track down the original authors :( . If you recognize your code here, don't hesitate to drop me an email and I will add your repo to the credits!