Deep Deterministic Policy Gradient
TensorFlow/PyTorch implementation of Deep Deterministic Policy Gradient (DDPG, https://arxiv.org/pdf/1509.02971.pdf), a deep reinforcement learning algorithm, on OpenAI gym's inverted pendulum environment (https://gym.openai.com/envs/Pendulum-v0/). The goal is to swing the pendulum up and keep it upright.
Prerequisites:
To run the code, install the following libraries and software on your system (preferably Ubuntu or another Linux distribution):
- python: Version >= 3.5 is required. Also install pip using
sudo apt install python3-pip
(if your package manager is apt).
- TensorFlow: Recommended to install via pip. https://www.tensorflow.org/install/pip
- PyTorch: Recommended to install via pip. https://pytorch.org/
- numpy:
pip install numpy
- jupyter:
pip install jupyter
- matplotlib:
pip install matplotlib
- seaborn:
pip install seaborn
- IPython:
sudo apt install python3-ipython
- tqdm:
pip install tqdm
- OpenAI gym: https://gym.openai.com/docs/
It is recommended to run the code in a virtualenv.
Running the code:
Install the required software and clone this repo. To test the code or run experiments, start a new jupyter session using
jupyter notebook
in a terminal, which launches the jupyter notebook app in a browser. In the notebook dashboard, navigate to the notebook named pendulum and run it.
To train/test the model, execute
python pendulum.py
Organization:
- src/gym_utils.py: Utility functions to get parameters of the gym environment used, e.g. the dimensions of the state and action spaces.
- src/model.py: Deep learning network for the agent.
- src/replay_buffer.py: A replay buffer to store state-action transitions and then randomly sample from it.
- src/stochastic_process.py: Function simulating the Ornstein-Uhlenbeck (OU) process, whose output is added as noise to the selected action.
- pendulum.ipynb: DDPG implementation in a jupyter notebook for testing the code and performing experiments.
- pendulum.py: Implementation of the algorithm for training and testing on the inverted pendulum task (default).
- param_search.py: Code to randomly search for the best parameters for the OU process.
- scatter.py: Code to plot the final results while changing the parameters for the OU process.
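To make the roles of the replay buffer and the OU noise more concrete, here is a minimal, standard-library-only sketch of both ideas. It is not the code in src/replay_buffer.py or src/stochastic_process.py: the class names ReplayBuffer and OUNoise, their method names, and the default values of theta, sigma and dt are assumptions chosen for illustration.

```python
import math
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-size FIFO store of transitions with uniform sampling."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once it is full.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement, regrouped into per-field tuples.
        batch = random.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)


class OUNoise:
    """Hypothetical Euler-Maruyama discretization of
    dx = theta * (mu - x) * dt + sigma * dW."""

    def __init__(self, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, x0=0.0, seed=None):
        # Default values are illustrative assumptions, not the repo's settings.
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.x0 = x0
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        # Restart the process, e.g. at the beginning of each episode.
        self.x = self.x0

    def sample(self):
        # Mean-reverting drift plus a Gaussian increment scaled by sqrt(dt).
        drift = self.theta * (self.mu - self.x) * self.dt
        diffusion = self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
        self.x += drift + diffusion
        return self.x


# Usage: the buffer keeps only the 3 most recent of 5 transitions.
buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(state=t, action=0.0, reward=-1.0, next_state=t + 1, done=False)
states, actions, rewards, next_states, dones = buffer.sample(batch_size=2)

# Usage: perturb a (placeholder) deterministic action with temporally correlated
# noise, then clip to Pendulum-v0's torque range of [-2, 2].
noise = OUNoise(seed=0)
policy_action = 0.5  # placeholder for the actor network's output
noisy_action = max(-2.0, min(2.0, policy_action + noise.sample()))
```

Because successive OU samples are correlated, the exploration noise drifts smoothly instead of jumping independently at each step, which is the reason the DDPG paper gives for using it in physical control tasks. The theta and sigma arguments are the kind of parameters a random search like the one in param_search.py would vary.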
The repo is still under construction. To report bugs or add changes, open a pull request.