PyTorch implementation of Soft Actor-Critic (SAC, https://arxiv.org/pdf/1801.01290.pdf), a deep reinforcement learning algorithm, tested on the inverted pendulum swing-up problem, a classic task in control. The pendulum starts in a random position, and the goal is to swing it up so that it stays upright. The task uses the OpenAI gym environment Pendulum-v0 (https://gym.openai.com/envs/Pendulum-v0/).
To run the code, you need the following libraries/software installed on your system (preferably Ubuntu or another Linux distro):
- python: Required version >= 3.5. Installing pip is also useful (if your package manager is apt):
sudo apt install python3-pip
- PyTorch: Recommended to install via pip. https://pytorch.org/
- numpy:
pip install numpy
- jupyter:
pip install jupyter
- matplotlib:
pip install matplotlib
- seaborn:
pip install seaborn
- IPython:
sudo apt install python3-ipython
- tqdm:
pip install tqdm
- OpenAI gym: https://gym.openai.com/docs/
It is recommended to run the code in a virtualenv.
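After installing the dependencies, a quick sanity check (not part of the repo) is to confirm that gym and the Pendulum-v0 environment work. The snippet below assumes an older gym release in which Pendulum-v0 and the four-value step API are still available:

```python
import gym

# Create the inverted pendulum swing-up environment used in this repo.
env = gym.make("Pendulum-v0")

state = env.reset()
for _ in range(10):
    action = env.action_space.sample()             # random torque in [-2, 2]
    state, reward, done, info = env.step(action)   # classic (pre-0.26) gym step API
    print(reward)
env.close()
```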
Install the required software and clone this repo. To test the code or perform experiments, start a new jupyter session by running
jupyter notebook
in a terminal, which launches the jupyter notebook app in a browser. In the notebook dashboard, navigate to the notebook softac and run it.
To train/test the model, execute
python softac.py
- gym_utils.py: Utility functions to get parameters of the gym environment used, e.g. the state and action dimensions (see the sketch after this list).
- model.py: Deep learning network for the agent.
- replay_buffer.py: A replay buffer that stores state-action transitions and supports random sampling from them (see the sketch after this list).
- softac.ipynb: Soft Actor-Critic implementation in a jupyter notebook for testing the code and performing experiments.
- softac.py: Implementation of the algorithm for training and testing on the task of inverted pendulum (default).
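For reference, the kind of environment-parameter lookup provided by gym_utils.py can be sketched roughly as below. The function name get_env_dims and its return values are illustrative assumptions, not the repo's actual API:

```python
import gym

def get_env_dims(env):
    # State and action dimensionality for a continuous-control env such as Pendulum-v0.
    state_dim = env.observation_space.shape[0]
    action_dim = env.action_space.shape[0]
    # Largest action magnitude, useful for scaling the policy's tanh output.
    action_limit = float(env.action_space.high[0])
    return state_dim, action_dim, action_limit

env = gym.make("Pendulum-v0")
print(get_env_dims(env))  # (3, 1, 2.0) for Pendulum-v0
```

Similarly, a minimal replay buffer with uniform random sampling might look like the following sketch; the actual replay_buffer.py may store or batch transitions differently:

```python
import random
import numpy as np

class ReplayBuffer:
    """Fixed-capacity buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []
        self.position = 0

    def push(self, state, action, reward, next_state, done):
        # Append until full, then overwrite the oldest transition (circular buffer).
        if len(self.buffer) < self.capacity:
            self.buffer.append(None)
        self.buffer[self.position] = (state, action, reward, next_state, done)
        self.position = (self.position + 1) % self.capacity

    def sample(self, batch_size):
        # Uniformly sample a batch and stack each field into a numpy array.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.stack, zip(*batch))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```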
The repo is still under construction. To report bugs or contribute changes, open an issue or a pull request.