Reinforcement Learning

This repository contains minimal implementations of several (deep) reinforcement learning algorithms in PyTorch & TensorFlow2. It is under active development, and more algorithms will be added over time.

Algorithms

Implemented

DDPG
TD3
SAC

Planned

MPO
Hybrid-MPO
WD3 / AWD3

Quickstart

Install the package via pip:
```
pip install git+https://github.com/jspieler/reinforcement-learning.git
```

Run an algorithm on an OpenAI Gym environment, e.g. DDPG on the Pendulum-v1 environment for 150 episodes using PyTorch:
```
python rl_algorithms/PyTorch/train_agent.py --agent DDPG --env Pendulum-v1 --seed 1234 --ep 150
```
To use custom parameters for the algorithm instead of the defaults, add the argument --config /path/to/config.yaml. See config.yaml for an example.

Alternatively, here is a quick example of how to train DDPG on the Pendulum-v1 environment using PyTorch:

```
import gym

from rl_algorithms.PyTorch.agents import DDPG
from rl_algorithms.PyTorch.train_agent import set_seeds, train

env = gym.make("Pendulum-v1")
agent = DDPG(env)
set_seeds(env, seed=1234)
train(agent, env, num_episodes=150, filename="ddpg_pendulum_v1_rewards.png")
```

Further information

Deep Deterministic Policy Gradient (DDPG)

Paper: Continuous control with deep reinforcement learning
Method: Off-Policy / Temporal-Difference / Actor-Critic / Model-Free
Action space: Continuous
Implementation: PyTorch / TensorFlow2

Note: The implementation does not follow the original paper exactly, because some of the paper's implementation details are left out or changed: actions are fed into the first layer of the critic network, the weight initialization is different, batch normalization is not used, etc.
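
To illustrate what feeding actions into the first layer of the critic means, here is a minimal sketch in plain Python (no PyTorch, so it runs without dependencies). It is not the repository's code: the class name `Critic`, the helpers, the hidden size, and the initialization are assumptions chosen for illustration. The original paper instead adds the action at the second hidden layer.

```python
import random


def dense(inputs, weights, biases):
    """Fully connected layer: one output per row of `weights`."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def init_layer(rng, n_in, n_out):
    """Uniform random weights and zero biases (illustrative initialization only)."""
    bound = 1.0 / n_in ** 0.5
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class Critic:
    """Hypothetical Q(s, a) network with the action joined to the state up front."""

    def __init__(self, state_dim, action_dim, hidden_dim=16, seed=0):
        rng = random.Random(seed)
        # The first layer sees state and action together.
        self.w1, self.b1 = init_layer(rng, state_dim + action_dim, hidden_dim)
        self.w2, self.b2 = init_layer(rng, hidden_dim, hidden_dim)
        self.w3, self.b3 = init_layer(rng, hidden_dim, 1)

    def q_value(self, state, action):
        x = list(state) + list(action)  # concatenate before the first layer
        x = relu(dense(x, self.w1, self.b1))
        x = relu(dense(x, self.w2, self.b2))
        return dense(x, self.w3, self.b3)[0]


# Pendulum-v1 has a 3-dimensional observation and a 1-dimensional action.
critic = Critic(state_dim=3, action_dim=1)
q = critic.q_value(state=[0.1, -0.2, 0.05], action=[0.5])
```

In a PyTorch or TensorFlow2 model the same idea usually amounts to concatenating the state and action tensors along the feature dimension before the first linear layer.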

Twin-Delayed Deep Deterministic Policy Gradient (TD3)

Paper: Addressing Function Approximation Error in Actor-Critic Methods
Method: Off-Policy / Temporal-Difference / Actor-Critic / Model-Free
Action space: Continuous
Implementation: PyTorch / TensorFlow2
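
As a rough illustration of how TD3 departs from DDPG, the sketch below computes the critic target for a single transition with a scalar action, following the paper's description: clipped noise is added to the target action (target policy smoothing), and the smaller of two target critic estimates is used (clipped double Q-learning). The function name `td3_target` and the callables standing in for the target critics are hypothetical; the default hyperparameters are the ones reported in the paper and may differ from this repository's defaults.

```python
import random


def clip(value, low, high):
    return max(low, min(high, value))


def td3_target(reward, done, next_action, q1_target, q2_target,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5,
               max_action=1.0, rng=random):
    """TD3 critic target for one transition with a scalar action.

    q1_target and q2_target are callables mapping an action to a Q-value at
    the next state (hypothetical stand-ins for the two target critics).
    """
    # Target policy smoothing: perturb the target action with clipped Gaussian noise.
    noise = clip(rng.gauss(0.0, policy_noise), -noise_clip, noise_clip)
    smoothed_action = clip(next_action + noise, -max_action, max_action)
    # Clipped double Q-learning: take the smaller of the two target estimates.
    min_q = min(q1_target(smoothed_action), q2_target(smoothed_action))
    # No bootstrapping from terminal states (done == 1.0).
    return reward + gamma * (1.0 - done) * min_q


# Toy critics for demonstration only.
q1 = lambda a: 10.0 - a * a
q2 = lambda a: 9.0 - a * a
target = td3_target(reward=1.0, done=0.0, next_action=0.3,
                    q1_target=q1, q2_target=q2, rng=random.Random(0))
```

The "delayed" part of the name refers to updating the actor and the target networks less often than the critics, for example once every two critic updates in the paper.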

Soft Actor-Critic (SAC)

Paper: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor / Soft Actor-Critic Algorithms and Applications
Method: Off-Policy / Temporal-Difference / Actor-Critic / Model-Free
Action space: Continuous
Implementation: PyTorch / TensorFlow2
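
The maximum-entropy idea behind SAC shows up most directly in the critic target, which subtracts a scaled log-probability of the next action from the bootstrapped value. The sketch below computes that target for one transition. The function name `sac_target` and the value `alpha=0.2` are illustrative assumptions, not this repository's API or defaults.

```python
def sac_target(reward, done, next_log_prob, q1_next, q2_next,
               alpha=0.2, gamma=0.99):
    """Soft Bellman target for one transition.

    next_log_prob is log pi(a'|s') for an action a' sampled from the current
    policy at the next state; q1_next and q2_next are the two target critics'
    estimates for (s', a'). alpha is the entropy temperature.
    """
    # Soft state value: pessimistic Q estimate plus an entropy bonus.
    soft_value = min(q1_next, q2_next) - alpha * next_log_prob
    # No bootstrapping from terminal states (done == 1.0).
    return reward + gamma * (1.0 - done) * soft_value


# A less likely action (more negative log-probability) raises the target.
target = sac_target(reward=1.0, done=0.0, next_log_prob=-1.5,
                    q1_next=5.0, q2_next=4.0)
```

The second paper listed above additionally tunes the temperature automatically instead of keeping it fixed as in this sketch.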

Maximum a Posteriori Policy Optimisation (MPO)

Paper: Maximum a Posteriori Policy Optimisation
Method: Off-Policy / Temporal-Difference / Actor-Critic / Model-Free
Action space: Continuous & Discrete
Implementation: not yet implemented

Hybrid Maximum a Posteriori Policy Optimisation (Hybrid-MPO)

Paper: Continuous-Discrete Reinforcement Learning for Hybrid Control in Robotics
Method: Off-Policy / Temporal-Difference / Actor-Critic / Model-Free
Action space: Continuous & Discrete
Implementation: not yet implemented