Reinforcement Learning: An Introduction

Kotlin implementation of algorithms, examples, and exercises from Sutton and Barto's Reinforcement Learning: An Introduction (2nd Edition). The purpose of this project is to make RL algorithms easier to understand and to experiment with.

Inspired by ShangtongZhang/reinforcement-learning-an-introduction (Python) and idsc-frazzoli/subare (Java 8)

Features:

  • Algorithms and problems are kept separate, so you can experiment with various combinations of <algorithm, problem> or <algorithm, function approximator, problem> (see the sketch after this list).
  • The implementations stay very close to the pseudocode in the book, so reading the source code will help you understand the original algorithms.
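
A minimal, self-contained sketch of what that separation can look like. The names here (TabularProblem, RandomWalk, qLearning) are illustrative assumptions made for this README, not the project's actual API:

```kotlin
import kotlin.random.Random

// Hypothetical sketch: names and structure are illustrative, not this project's API.

/** A tabular episodic problem: states are 0 until numStates, actions are 0 until numActions. */
interface TabularProblem {
    val numStates: Int
    val numActions: Int
    val startState: Int
    fun isTerminal(state: Int): Boolean
    /** Sample one transition, returning (nextState, reward). */
    fun step(state: Int, action: Int): Pair<Int, Double>
}

/** A random walk with terminal states at both ends; entering the rightmost state pays +1. */
class RandomWalk(override val numStates: Int = 7) : TabularProblem {
    override val numActions = 2                        // 0 = left, 1 = right
    override val startState = numStates / 2
    override fun isTerminal(state: Int) = state == 0 || state == numStates - 1
    override fun step(state: Int, action: Int): Pair<Int, Double> {
        val next = if (action == 0) state - 1 else state + 1
        val reward = if (next == numStates - 1) 1.0 else 0.0
        return next to reward
    }
}

/** Tabular Q-learning written against the interface, so any TabularProblem can be plugged in. */
fun qLearning(problem: TabularProblem, episodes: Int, alpha: Double = 0.1,
              gamma: Double = 1.0, epsilon: Double = 0.1, rng: Random = Random(0)): Array<DoubleArray> {
    val q = Array(problem.numStates) { DoubleArray(problem.numActions) }
    repeat(episodes) {
        var s = problem.startState
        while (!problem.isTerminal(s)) {
            val a = if (rng.nextDouble() < epsilon) rng.nextInt(problem.numActions)   // explore
                    else q[s].indices.maxByOrNull { q[s][it] }!!                      // exploit
            val (next, reward) = problem.step(s, a)
            val bootstrap = if (problem.isTerminal(next)) 0.0 else q[next].maxOrNull()!!
            q[s][a] += alpha * (reward + gamma * bootstrap - q[s][a])
            s = next
        }
    }
    return q
}

fun main() {
    val q = qLearning(RandomWalk(), episodes = 1_000)
    q.forEachIndexed { s, row -> println("state $s: " + row.joinToString { "%.2f".format(it) }) }
}
```

Because qLearning only depends on the TabularProblem interface, swapping in a different problem (or a different algorithm over the same problem) is a one-line change in main.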

Implemented algorithms:

  • Model-based (Dynamic Programming)
  • Monte Carlo (episode backup)
  • Temporal Difference (one-step backup)
  • n-step Temporal Difference (unifies MC and TD)
  • Dyna (integrates planning, acting, and learning)
  • On-policy Prediction with Function Approximation
  • On-policy Control with Function Approximation
  • Off-policy Methods with Approximation
  • Eligibility Traces
  • Policy Gradient Methods

Implemented problems:

Build

Built with Maven
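
Assuming the standard Maven lifecycle applies (a general Maven note, not a project-specific instruction), running `mvn package` from the project root should build the project, and `mvn test` should run the test cases listed below.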

Test cases

Try the test cases.

Figure 7.2

Figure 7.2: Performance of n-step TD methods as a function of α, for various values of n, on a 19-state random walk task
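
For reference, a minimal tabular sketch of the n-step TD prediction update behind this experiment, written for this README rather than taken from the project's source; the setup (19 non-terminal states, −1/+1 exit rewards, uniformly random steps) follows the book's random walk example:

```kotlin
import kotlin.math.pow
import kotlin.random.Random

// Illustrative sketch only, not the project's code.
fun nStepTd(n: Int, alpha: Double, episodes: Int, numStates: Int = 19,
            gamma: Double = 1.0, rng: Random = Random(0)): DoubleArray {
    val v = DoubleArray(numStates + 2)                    // indices 0 and numStates + 1 are terminal
    repeat(episodes) {
        val states = mutableListOf((numStates + 1) / 2)   // start in the center state
        val rewards = mutableListOf(0.0)                  // placeholder so that rewards[i] == R_i
        var terminalTime = Int.MAX_VALUE
        var t = 0
        var tau = -1
        while (tau < terminalTime - 1) {
            if (t < terminalTime) {
                val s = states[t]
                val next = if (rng.nextBoolean()) s + 1 else s - 1
                states.add(next)
                rewards.add(when (next) { 0 -> -1.0; numStates + 1 -> 1.0; else -> 0.0 })
                if (next == 0 || next == numStates + 1) terminalTime = t + 1
            }
            tau = t - n + 1                               // the time whose state estimate is updated
            if (tau >= 0) {
                var g = 0.0                               // n-step return from time tau
                for (i in tau + 1..minOf(tau + n, terminalTime))
                    g += gamma.pow(i - tau - 1) * rewards[i]
                if (tau + n < terminalTime) g += gamma.pow(n) * v[states[tau + n]]
                v[states[tau]] += alpha * (g - v[states[tau]])
            }
            t++
        }
    }
    return v
}

fun main() {
    val v = nStepTd(n = 4, alpha = 0.4, episodes = 10)
    println(v.slice(1..19).joinToString { "%.2f".format(it) })   // estimates for the 19 states
}
```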


Figure 10.1

Figure 10.1: The Mountain Car task and the cost-to-go function learned during one run


Figure 10.4

Figure 10.4: Effect of α and n on early performance of n-step semi-gradient Sarsa and tile-coding function approximation on the Mountain Car task
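
As a rough illustration, here is the semi-gradient Sarsa(0) update with a linear action-value approximation q̂(s, a, w) = w · x(s, a); the actual experiment uses n-step returns and tile coding on Mountain Car, which are not reproduced here, and all names are assumptions rather than the project's API:

```kotlin
/** Linear action-value estimate: q̂(s, a, w) = w · x(s, a). */
fun qHat(w: DoubleArray, x: DoubleArray): Double = w.indices.sumOf { w[it] * x[it] }

/**
 * One semi-gradient Sarsa(0) update:
 *   w += α [R + γ q̂(S', A', w) − q̂(S, A, w)] ∇q̂(S, A, w), and ∇q̂ = x(S, A) for a linear q̂.
 * xNext is null when S' is terminal, in which case the bootstrap term is 0.
 */
fun semiGradientSarsaUpdate(w: DoubleArray, x: DoubleArray, reward: Double,
                            xNext: DoubleArray?, alpha: Double, gamma: Double = 1.0) {
    val target = reward + if (xNext == null) 0.0 else gamma * qHat(w, xNext)
    val delta = target - qHat(w, x)
    for (i in w.indices) w[i] += alpha * delta * x[i]
}

fun main() {
    val w = DoubleArray(3)
    semiGradientSarsaUpdate(w, doubleArrayOf(1.0, 0.0, 1.0), reward = -1.0,
                            xNext = doubleArrayOf(0.0, 1.0, 1.0), alpha = 0.5)
    println(w.toList())    // weights after a single update
}
```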


Figure 12.3

Figure 12.3: 19-state Random walk results: Performance of the offline λ-return algorithm.


Figure 12.6

Figure 12.6: 19-state Random walk results: Performance of TD(λ).
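
For context, a minimal tabular TD(λ) sketch with accumulating eligibility traces on the same 19-state random walk; it is illustrative only and not the project's implementation:

```kotlin
import kotlin.random.Random

// Illustrative sketch only, not the project's code.
fun tdLambda(lambda: Double, alpha: Double, episodes: Int, numStates: Int = 19,
             gamma: Double = 1.0, rng: Random = Random(0)): DoubleArray {
    val v = DoubleArray(numStates + 2)           // indices 0 and numStates + 1 are terminal
    repeat(episodes) {
        val e = DoubleArray(numStates + 2)       // eligibility traces, reset each episode
        var s = (numStates + 1) / 2              // start in the center state
        while (s != 0 && s != numStates + 1) {
            val next = if (rng.nextBoolean()) s + 1 else s - 1
            val reward = when (next) { 0 -> -1.0; numStates + 1 -> 1.0; else -> 0.0 }
            val vNext = if (next == 0 || next == numStates + 1) 0.0 else v[next]
            val delta = reward + gamma * vNext - v[s]
            e[s] += 1.0                          // accumulating trace for the current state
            for (i in v.indices) {
                v[i] += alpha * delta * e[i]     // every state is updated in proportion to its trace
                e[i] *= gamma * lambda           // then all traces decay
            }
            s = next
        }
    }
    return v
}

fun main() {
    val v = tdLambda(lambda = 0.8, alpha = 0.1, episodes = 10)
    println(v.slice(1..19).joinToString { "%.2f".format(it) })
}
```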


Figure 12.8

Figure 12.8: 19-state Random walk results: Performance of online λ-return algorithms


Figure 12.10

Figure 12.10: Early performance on the Mountain Car task of Sarsa(λ) with replacing traces
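
The per-step update behind "replacing traces" can be sketched for binary features (the representation tile coding produces): traces of active features are set to 1 instead of incremented. This is illustrative only; the environment loop, the tile coder, and the project's actual API are omitted:

```kotlin
/**
 * One Sarsa(λ) update with replacing traces over binary features.
 * Illustrative sketch, not the project's API: `active` holds the indices of the features
 * active for (S, A), and `activeNext` those for (S', A'), or null if S' is terminal.
 */
fun sarsaLambdaReplacingStep(
    w: DoubleArray,            // weight vector, one weight per binary feature (tile)
    e: DoubleArray,            // eligibility trace vector, same size as w
    active: IntArray,
    reward: Double,
    activeNext: IntArray?,
    alpha: Double,
    gamma: Double = 1.0,
    lambda: Double = 0.9
) {
    val q = active.sumOf { w[it] }                      // q̂(S, A) under binary features
    val qNext = activeNext?.sumOf { w[it] } ?: 0.0      // q̂(S', A'), or 0 at terminal
    val delta = reward + gamma * qNext - q
    for (i in active) e[i] = 1.0                        // replacing traces: set to 1, do not accumulate
    for (i in w.indices) {
        w[i] += alpha * delta * e[i]
        e[i] *= gamma * lambda                          // decay every trace after the update
    }
}

fun main() {
    val w = DoubleArray(8)
    val e = DoubleArray(8)
    sarsaLambdaReplacingStep(w, e, active = intArrayOf(0, 3), reward = -1.0,
                             activeNext = intArrayOf(1, 4), alpha = 0.5)
    println(w.toList())
}
```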


Figure 12.11

Figure 12.11: Summary comparison of Sarsa(λ) algorithms on the Mountain Car task.