Deep Deterministic Policy Gradient (Deep RL algorithm)

Deep Deterministic Policy Gradient

TensorFlow/PyTorch implementation of Deep Deterministic Policy Gradient (DDPG, https://arxiv.org/pdf/1509.02971.pdf), a deep reinforcement learning algorithm, on OpenAI gym's inverted pendulum environment (Pendulum-v0, https://gym.openai.com/envs/Pendulum-v0/). The goal is to swing the pendulum up and keep it upright.

Prerequisites:

To run the code, install the following libraries and software on your system (preferably Ubuntu or another Linux distribution):

  • python: Version 3.5 or later is required. Also install pip, e.g. with sudo apt install python3-pip if your package manager is apt.
  • TensorFlow: Recommended to install via pip. https://www.tensorflow.org/install/pip
  • PyTorch: Recommended to install via pip. https://pytorch.org/
  • numpy: pip install numpy
  • jupyter: pip install jupyter
  • matplotlib: pip install matplotlib
  • seaborn: pip install seaborn
  • IPython: sudo apt install python3-ipython
  • tqdm: pip install tqdm
  • OpenAI gym: https://gym.openai.com/docs/

It is recommended to run the code in a virtualenv.
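
As a quick sanity check before opening the notebook, the sketch below reports which of the listed Python packages cannot be found in the active environment. The helper name missing_modules and the list of import names are assumptions made for illustration (note that PyTorch is imported as torch); jupyter is left out because it is launched as a command rather than imported.

```python
import importlib.util

# Import names assumed for the packages listed above (PyTorch -> "torch").
REQUIRED = ["tensorflow", "torch", "numpy", "matplotlib",
            "seaborn", "IPython", "tqdm", "gym"]


def missing_modules(names):
    """Return the names that cannot be located, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

Since the repo offers both TensorFlow and PyTorch, a missing entry for one of the two may be acceptable depending on which version you plan to run.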

Running the code:

Install the required software and clone this repo. To test the code or run experiments, start a new jupyter session with

jupyter notebook

in a terminal, which launches the jupyter notebook app in a browser. In the notebook dashboard, open the notebook named pendulum and run it. To train or test the model, execute

python pendulum.py

Organization:

  • src/gym_utils.py: Utility functions for reading parameters of the gym environment in use, e.g. the dimensions of the state and action spaces.
  • src/model.py: Deep learning network for the agent.
  • src/replay_buffer.py: A replay buffer to store state-action transitions and then randomly sample from it.
  • src/stochastic_process.py: Function simulating the Ornstein-Uhlenbeck (OU) process, whose output is added as noise to the selected action.
  • pendulum.ipynb: DDPG implementation in a jupyter notebook for testing the code and performing experiments.
  • pendulum.py: Implementation of the algorithm for training and testing on the inverted pendulum task (default).
  • param_search.py: Code to randomly search for the best parameters for the OU process.
  • scatter.py: Code to plot the final results while changing the parameters for the OU process.
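
To illustrate the replay buffer idea listed under src/replay_buffer.py, here is a minimal sketch using only the standard library. It is not the code in that file: the class name, the method names and the (state, action, reward, next_state, done) tuple layout are assumptions.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen discards the oldest transition once it is full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement, regrouped field by field.
        batch = self.rng.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

Sampling uniformly at random breaks the correlation between consecutive transitions, which is why training batches are drawn from the buffer rather than from the most recent steps.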

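The OU exploration noise listed under src/stochastic_process.py can be sketched as follows. This is a hedged example, not the contents of that file: the class name OUNoise, the Euler discretization with step dt and the default value of dt are assumptions, while theta = 0.15 and sigma = 0.2 are the values reported in the DDPG paper.

```python
import math
import random


class OUNoise:
    """Discretized Ornstein-Uhlenbeck process:
    x <- x + theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)
    """

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=None):
        self.size = size
        self.mu = mu
        self.theta = theta
        self.sigma = sigma
        self.dt = dt
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Restart the process at its long-run mean.
        self.state = [self.mu] * self.size

    def sample(self):
        # One Euler step per action dimension; the noise is temporally correlated
        # because each step starts from the previous state.
        self.state = [
            x + self.theta * (self.mu - x) * self.dt
            + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)
```

During training, the sampled noise would typically be added to the actor's output and the result clipped to the environment's action range ([-2, 2] for Pendulum-v0), with reset() called at the start of each episode.
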
The repo is still under construction. To report bugs or add changes, open a pull request.