Implementation for Reinforcement Learning: An Introduction

This project provides a python implementation for all the figures and examples in the book - Reinforcement Learning: An Introduction (2nd Edition).

In the implementation process, some parameters are not mentioned in the book. For the consistency of the figure, these parameters refer to the code from ShangtongZhang.

Figure: the original (left) and implementation (right)

All codes are well organized and easy to understand, and can be easily read with formulas and algorithms:

# direct reinforcement learning
Q[S][A] += α * (R + γ * max(Q[S_]) - Q[S][A])

# model learning
self.t += 1
# Actions that had never been tried were allowed to be considered in the planning step
if κ != 0 and (S, A) not in M:
    for a in range(Maze.ACT_NUM):
        M[S, a] = (S, 0, 1)
M[S, A] = (S_, R, self.t)

# planning
for _ in range(self.n):
    S, A = random.choice(list(M.keys()))
    S_, R, t = M[S, A]
    if κ:
        τ = self.t - t
        R += κ * np.sqrt(τ)
    Q[S][A] += α * (R + γ * max(Q[S_]) - Q[S][A])

Urinx/RL_intro_code

Implementation for Reinforcement Learning: An Introduction