My Reinforcement Learning Implementations

Currently focused on:

  1. DQN (and DDQN)
  2. REINFORCE with baseline
  3. PPO
  4. TRPO

I wanted to start with the easier/basic methods (as described in Reinforcement Learning: An Introduction). One issue I've run into, although I'm probably missing something, is that tabular value-function methods require a full representation of the state space, and it isn't obvious how to build one simply while still taking advantage of the environments provided in openai/gym. My plan for now is to start with the methods listed above, and then write some specific environments and representations for the other methods.
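
As a sketch of one possible workaround, here is a uniform binning of CartPole's continuous observation into a table. The bin count and clip ranges are hand-picked placeholders (not tuned values), and it assumes the classic gym API where reset() returns just the observation:

```python
import numpy as np
import gym  # assumes the classic openai/gym API (reset() returns only the observation)

N_BINS = 10
# Hand-picked clip ranges; CartPole's velocity bounds are nominally infinite,
# so out-of-range values just fall into the outermost bins.
LOW = np.array([-4.8, -3.0, -0.418, -3.5])
HIGH = np.array([4.8, 3.0, 0.418, 3.5])
EDGES = [np.linspace(lo, hi, N_BINS - 1) for lo, hi in zip(LOW, HIGH)]

def discretize(obs):
    """Map a continuous 4-d observation to a tuple of bin indices."""
    return tuple(int(np.digitize(x, edges)) for x, edges in zip(obs, EDGES))

env = gym.make("CartPole-v0")
Q = np.zeros((N_BINS,) * 4 + (env.action_space.n,))  # tabular Q(s, a)

obs = env.reset()
state = discretize(obs)
obs, reward, done, info = env.step(int(Q[state].argmax()))
```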

Tabular Methods

  1. Rollout (see the sketch after this list)
  2. Monte-Carlo Tree Search
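
For rollout, the idea is to pick actions by simulating ahead with a cheap default policy. A minimal sketch, assuming the environment can be deep-copied to act as a simulator (true for many gym classic-control environments, but not all):

```python
import copy

def rollout_action(env, n_rollouts=20, depth=50, gamma=0.99):
    """Pick an action by averaging returns of random rollouts from each action."""
    best_action, best_value = None, float("-inf")
    for a in range(env.action_space.n):
        total = 0.0
        for _ in range(n_rollouts):
            sim = copy.deepcopy(env)        # assumes the env supports deepcopy
            _, r, done, _ = sim.step(a)
            ret, discount = r, gamma
            for _ in range(depth):          # continue with a random rollout policy
                if done:
                    break
                _, r, done, _ = sim.step(sim.action_space.sample())
                ret += discount * r
                discount *= gamma
            total += ret
        if total / n_rollouts > best_value:
            best_action, best_value = a, total / n_rollouts
    return best_action
```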

Control Goals

  1. On-policy n-step SARSA (probably starting with 1-step; see the sketch after this list)
  2. Off-policy n-step SARSA
  3. n-step Tree Backup
  4. Q-learning
  5. n-step Q(sigma)
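
For reference, a minimal sketch of the 1-step on-policy case: tabular SARSA with epsilon-greedy exploration. FrozenLake is used only because its states are already discrete, and the hyperparameters are placeholders:

```python
import numpy as np
import gym  # classic openai/gym API assumed

env = gym.make("FrozenLake-v0")  # "FrozenLake-v1" on newer gym releases
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, eps = 0.1, 0.99, 0.1  # placeholder hyperparameters

def eps_greedy(s):
    if np.random.rand() < eps:
        return env.action_space.sample()
    return int(Q[s].argmax())

for episode in range(5000):
    s = env.reset()
    a = eps_greedy(s)
    done = False
    while not done:
        s2, r, done, _ = env.step(a)
        a2 = eps_greedy(s2)
        # On-policy target: bootstrap from the action actually taken next.
        Q[s, a] += alpha * (r + gamma * Q[s2, a2] * (not done) - Q[s, a])
        s, a = s2, a2
```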

TD(lambda)

On-Policy Estimation with Approximation

  1. Gradient Monte-Carlo
  2. Semi-Gradient TD(0) (see the sketch after this list)
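
A sketch of the semi-gradient TD(0) update with linear function approximation, v(s) = w . x(s), where `features` is any feature map you supply (the function name and defaults here are illustrative):

```python
import numpy as np

def semi_gradient_td0(env, features, n_features, alpha=0.01, gamma=0.99,
                      episodes=100):
    """Evaluate a random policy with a linear value function v(s) = w . x(s).

    Assumes the classic gym API where step() returns (obs, r, done, info).
    """
    w = np.zeros(n_features)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            s2, r, done, _ = env.step(env.action_space.sample())
            target = r + (0.0 if done else gamma * (w @ features(s2)))
            x = features(s)
            # "Semi-gradient": differentiate only through v(s), treating the
            # bootstrapped target as a constant.
            w += alpha * (target - w @ x) * x
            s = s2
    return w
```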

On-Policy Control with Approximation

  1. n-step semi-gradient SARSA
  2. n-step differential semi-gradient SARSA

Off-Policy Control with Approximation

  1. DQN (paper: https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf) and DDQN (paper: https://arxiv.org/pdf/1509.06461.pdf) -- see the sketch after this list
  2. GTD(0)
  3. Semi-Gradient TD(lambda)
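
The heart of DQN is a replay buffer plus a separate, slowly-updated target network. Here is a sketch of the update step (PyTorch is used purely as an illustration, and the network sizes and hyperparameters are placeholders), with a comment marking where DDQN differs:

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

buffer = deque(maxlen=100_000)  # holds (s, a, r, s2, done) tuples

q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net.load_state_dict(q_net.state_dict())  # re-sync periodically
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

def train_step(batch_size=32):
    batch = random.sample(buffer, batch_size)
    s, a, r, s2, done = (torch.tensor(np.array(x), dtype=torch.float32)
                         for x in zip(*batch))
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # DQN target: max over the target network's action values.
        target = r + gamma * target_net(s2).max(1).values * (1 - done)
        # DDQN instead selects the argmax action with q_net and evaluates it
        # with target_net, reducing overestimation bias:
        # a2 = q_net(s2).argmax(1, keepdim=True)
        # target = r + gamma * target_net(s2).gather(1, a2).squeeze(1) * (1 - done)
    loss = nn.functional.mse_loss(q, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```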

Policy Optimization Goals

  1. REINFORCE
  2. REINFORCE with Baseline (see the sketch after this list)
  3. PPO (paper: https://arxiv.org/abs/1707.06347)
  4. TRPO (paper: https://arxiv.org/abs/1502.05477)
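
A sketch of the REINFORCE-with-baseline loss (again PyTorch as an illustration; the names and shapes are mine): each action's log-probability is weighted by the return minus a learned state-value baseline, which lowers variance without biasing the gradient:

```python
import torch

def reinforce_with_baseline_loss(log_probs, values, rewards, gamma=0.99):
    """One episode's loss. log_probs/values: length-T tensors; rewards: floats.

    `values` would come from a separate state-value network acting as
    the baseline.
    """
    returns, G = [], 0.0
    for r in reversed(rewards):            # discounted return-to-go G_t
        G = r + gamma * G
        returns.append(G)
    returns = torch.tensor(list(reversed(returns)))
    # detach() keeps the policy term from training the baseline directly;
    # the baseline is fit separately by regression below.
    advantage = returns - values.detach()
    policy_loss = -(log_probs * advantage).sum()
    value_loss = ((values - returns) ** 2).sum()
    return policy_loss + value_loss
```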

Extra

  1. AlphaGo Zero (paper: https://www.nature.com/articles/nature24270)

Unless explicitly stated otherwise, these algorithms are my implementations of the material in Reinforcement Learning: An Introduction by Sutton and Barto.