Implementation of (Deep) Reinforcement Learning algorithms using PyTorch & TensorFlow2

Primary language: Python. License: MIT.

Reinforcement Learning

This repository contains minimal implementations of several (deep) reinforcement learning algorithms in PyTorch and TensorFlow 2. It is under active development, and more algorithms will be added.

Algorithms

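Implemented

  - Deep Deterministic Policy Gradient (DDPG)
  - Twin-Delayed Deep Deterministic Policy Gradient (TD3)
  - Soft Actor-Critic (SAC)

Planned

  - Maximum a Posteriori Policy Optimisation (MPO)
  - Hybrid Maximum a Posteriori Policy Optimization (Hybrid-MPO)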

Quickstart

  1. Install the package via pip:

    pip install git+https://github.com/jspieler/reinforcement-learning.git
    
  2. Run an algorithm on an OpenAI Gym environment, e.g. DDPG on the Pendulum-v1 environment for 150 episodes using PyTorch:

    python rl_algorithms/PyTorch/train_agent.py --agent DDPG --env Pendulum-v1 --seed 1234 --ep 150
    

    To use custom parameters for the algorithm instead of the defaults, add the argument --config /path/to/config.yaml. See config.yaml for an example.

  3. Alternatively, here is a quick example of how to train DDPG on the Pendulum-v1 environment using PyTorch:

    import gym 
    
    from rl_algorithms.PyTorch.agents import DDPG
    from rl_algorithms.PyTorch.train_agent import set_seeds, train
    
    env = gym.make("Pendulum-v1")
    agent = DDPG(env)
    set_seeds(env, seed=1234)
    train(agent, env, num_episodes=150, filename="ddpg_pendulum_v1_rewards.png")
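
The --config option from step 2 replaces default parameters with user-supplied ones. The sketch below shows one way such an override could work once the YAML file has been parsed into a dictionary. The parameter names and default values are hypothetical and are not taken from this repository's config.yaml.

```python
# Hypothetical defaults; the real parameter names live in the repository's config.yaml.
DEFAULTS = {"gamma": 0.99, "tau": 0.005, "batch_size": 64}


def merge_config(defaults, overrides):
    """Return a copy of `defaults` with the values from `overrides` applied.

    Unknown keys raise an error so that typos in a config file are not
    silently ignored.
    """
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    return {**defaults, **overrides}


# `overrides` stands in for the dictionary obtained by parsing a YAML file.
params = merge_config(DEFAULTS, {"gamma": 0.95})
```

Rejecting unknown keys is a design choice in this sketch: a misspelled parameter fails loudly instead of leaving the default in place.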
    

Further information

Deep Deterministic Policy Gradient (DDPG)

Paper: Continuous control with deep reinforcement learning
Method: Off-Policy / Temporal-Difference / Actor-Critic / Model-Free
Action space: Continuous
Implementation: PyTorch / TensorFlow2

Note: This implementation differs from the original paper in some details: actions are fed into the first layer of the critic network, the weight initialization is different, and batch normalization is not used, among others.
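
To illustrate the "Temporal-Difference" and "Off-Policy" labels above, here is a minimal pure-Python sketch of two building blocks from the DDPG paper: the one-step TD target for the critic and the soft (Polyak) update of the target networks. The function names and default values are assumptions chosen for illustration and are not this repository's API.

```python
def td_target(reward, next_q, done, gamma=0.99):
    """One-step TD target: y = r + gamma * (1 - done) * Q'(s', mu'(s')).

    `next_q` is the target critic's value for the next state and the
    target actor's action; terminal transitions do not bootstrap.
    """
    return reward + gamma * (1.0 - float(done)) * next_q


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


y = td_target(reward=-1.0, next_q=-10.0, done=False, gamma=0.9)
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.5)
```

In a real implementation the parameters would be network tensors rather than lists of floats, but the arithmetic is the same.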


Twin-Delayed Deep Deterministic Policy Gradient (TD3)

Paper: Addressing Function Approximation Error in Actor-Critic Methods
Method: Off-Policy / Temporal-Difference / Actor-Critic / Model-Free
Action space: Continuous
Implementation: PyTorch / TensorFlow2
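
As a rough illustration of what TD3 changes relative to DDPG, the sketch below computes the critic target with target policy smoothing and clipped double-Q learning, as described in the paper. The function signature, the action bounds, and the stand-in critics are assumptions made for this example; the defaults for the noise scale and clip follow the values reported in the paper and may differ from this repository's settings.

```python
import random


def clip(x, low, high):
    """Clamp x to the interval [low, high]."""
    return max(low, min(high, x))


def td3_target(reward, done, next_action, q1_target, q2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               act_low=-1.0, act_high=1.0, rng=random):
    """TD3 critic target for a single transition with a scalar action.

    `next_action` is the target actor's output for the next state;
    `q1_target` and `q2_target` are callables mapping an action to a
    Q-value (the next state is assumed to be closed over).
    """
    # Target policy smoothing: add clipped Gaussian noise to the action.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    smoothed = clip(next_action + noise, act_low, act_high)
    # Clipped double-Q: bootstrap from the smaller of the two estimates.
    next_q = min(q1_target(smoothed), q2_target(smoothed))
    return reward + gamma * (1.0 - float(done)) * next_q


# Stand-in target critics, for illustration only.
q1 = lambda a: 2.0 * a
q2 = lambda a: 1.0 + a

# With noise_std=0.0 the smoothing noise vanishes and the result is deterministic.
y = td3_target(reward=1.0, done=False, next_action=0.5,
               q1_target=q1, q2_target=q2, gamma=0.5, noise_std=0.0)
```

The third TD3 ingredient, delayed policy updates, concerns the training schedule (the actor and target networks are updated less often than the critics) and is not shown here.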


Soft Actor-Critic (SAC)

Paper: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor / Soft Actor-Critic Algorithms and Applications
Method: Off-Policy / Temporal-Difference / Actor-Critic / Model-Free
Action space: Continuous
Implementation: PyTorch / TensorFlow2
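
SAC's maximum-entropy objective shows up in the critic target as an entropy bonus. The following sketch, under the assumption of scalar inputs for a single transition, computes that soft target. The function name and the default temperature `alpha` are illustrative and not taken from this repository.

```python
def sac_target(reward, done, next_q1, next_q2, next_log_prob,
               alpha=0.2, gamma=0.99):
    """Soft TD target: y = r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi).

    `next_q1` and `next_q2` are the target critics' values for an action
    sampled from the current policy at the next state, and `next_log_prob`
    is that action's log-probability under the policy.
    """
    soft_value = min(next_q1, next_q2) - alpha * next_log_prob
    return reward + gamma * (1.0 - float(done)) * soft_value


y = sac_target(reward=1.0, done=False, next_q1=3.0, next_q2=2.0,
               next_log_prob=-1.0, alpha=0.5, gamma=0.5)
```

A less likely action (more negative log-probability) raises the target, which is how the temperature `alpha` trades off reward against exploration.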


Maximum a Posteriori Policy Optimisation (MPO)

Paper: Maximum a Posteriori Policy Optimisation
Method: Off-Policy / Temporal-Difference / Actor-Critic / Model-Free
Action space: Continuous & Discrete
Implementation: not yet implemented


Hybrid Maximum a Posteriori Policy Optimization (Hybrid-MPO)

Paper: Continuous-Discrete Reinforcement Learning for Hybrid Control in Robotics
Method: Off-Policy / Temporal-Difference / Actor-Critic / Model-Free
Action space: Continuous & Discrete
Implementation: not yet implemented