/reinforcement-learning

Implementations of various RL algorithms

Primary language: Jupyter Notebook

Overview

Implementations of reinforcement learning algorithms.

Table of contents

  • Multi-armed Bandits
    • Bandits environment implementation (k-armed w/ optional non-stationarity)
    • ε-greedy policy
    • Upper Confidence Bound policy
    • Policy gradient
  • Dynamic Programming
    • Policy Iteration (Policy Evaluation + Policy Improvement)
    • Value Iteration
  • Monte-Carlo
    • MC on-policy value function estimation
    • MC on-policy first-visit ε-greedy
    • MC off-policy every-visit w/ weighted importance sampling
  • Temporal Difference
    • SARSA
    • Q-Learning
    • Expected SARSA
    • Double Q-Learning
  • n-step Bootstrapping (TODO)
  • Planning and Learning
    • Maze environment implementation
    • Dyna-Q
    • Dyna-Q w/ prioritized sweeping
    • Dyna-Q+
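
To give a flavor of the first topic listed above, here is a minimal sketch of an ε-greedy agent on a k-armed bandit with optional non-stationarity. It is not taken from the notebooks: the names `KArmedBandit` and `run_epsilon_greedy`, the Gaussian reward model, and the random-walk drift are all assumptions made for illustration.

```python
import numpy as np


class KArmedBandit:
    """Hypothetical k-armed bandit; illustrative only, not the repository's API."""

    def __init__(self, k=10, nonstationary=False, seed=0):
        self.rng = np.random.default_rng(seed)
        self.nonstationary = nonstationary
        # True mean reward of each arm (assumed Gaussian).
        self.q_true = self.rng.normal(0.0, 1.0, size=k)

    def pull(self, arm):
        # Reward is the arm's true mean plus unit-variance noise.
        reward = self.rng.normal(self.q_true[arm], 1.0)
        if self.nonstationary:
            # Assumed drift model: a small random walk on every arm's mean.
            self.q_true += self.rng.normal(0.0, 0.01, size=self.q_true.size)
        return reward


def run_epsilon_greedy(bandit, steps=1000, epsilon=0.1, seed=1):
    """Run an epsilon-greedy agent with sample-average value estimates."""
    rng = np.random.default_rng(seed)
    k = bandit.q_true.size
    q_est = np.zeros(k)               # estimated value of each arm
    counts = np.zeros(k, dtype=int)   # number of pulls per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = int(rng.integers(k))    # explore: uniformly random arm
        else:
            arm = int(np.argmax(q_est))   # exploit: current best estimate
        reward = bandit.pull(arm)
        counts[arm] += 1
        # Incremental sample-average update of the chosen arm's estimate.
        q_est[arm] += (reward - q_est[arm]) / counts[arm]
        total += reward
    return q_est, counts, total / steps


if __name__ == "__main__":
    q_est, counts, avg_reward = run_epsilon_greedy(KArmedBandit(k=10))
    print("pulls per arm:", counts)
    print("average reward:", round(avg_reward, 3))
```

For the non-stationary variant, a constant step size in place of `1 / counts[arm]` is a common choice, since it weights recent rewards more heavily; the sketch keeps the sample average for simplicity.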

Prerequisites

  • Conda

Installation

Resources

Authors