Reinforcement Learning: An Introduction

Python code for Sutton & Barto's book Reinforcement Learning: An Introduction (2nd Edition)

If you have any questions about the code or want to report a bug, please open an issue instead of emailing me directly.

Click to view the sample output

Chapter 1

Tic-Tac-Toe

Chapter 2

Chapter 3

Chapter 4

Chapter 5

Chapter 6

Chapter 7

Figure 7.2: Performance of n-step TD methods on 19-state random walk

Chapter 8

Chapter 9

Figure 9.1: Gradient Monte Carlo algorithm on the 1000-state random walk task
Figure 9.2: Semi-gradient n-step TD algorithm on the 1000-state random walk task
Figure 9.5: Fourier basis vs polynomials on the 1000-state random walk task
Figure 9.8: Example of feature width’s effect on initial generalization and asymptotic accuracy
Figure 9.10: Single tiling and multiple tilings on the 1000-state random walk task

Chapter 10

Chapter 11

Chapter 12

Environment

Python2 or Python3
Numpy
Matplotlib
Six
Seaborn
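
Before running any chapter script, it can help to confirm that the packages listed above are importable. The following is only a minimal sketch, not part of the repository: the helper name `missing_packages` is hypothetical, the module names are assumed to be the lowercase import names of the packages above, and it assumes Python 3 (`importlib.util.find_spec` is not available in Python 2).

```python
import importlib.util

# Import names assumed to correspond to the packages listed above.
REQUIRED = ("numpy", "matplotlib", "six", "seaborn")


def missing_packages(names=REQUIRED):
    """Return the names in `names` that cannot be located for import."""
    # find_spec returns None when a top-level module cannot be found.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Any package reported as missing can be installed with your usual package manager before running the scripts.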

Usage

git clone https://github.com/ShangtongZhang/reinforcement-learning-an-introduction.git
cd reinforcement-learning-an-introduction/chapterXX
python XXX.py

Contribution

This project contains almost all the programmable figures in the book. However, when I completed this project, the book was still in draft and some chapters were still incomplete. Furthermore, due to the limited computational capacity of my machine, I could only use a limited number of runs and episodes for some experiments, so the sample output is much less smooth than the figures in the book.
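
To illustrate why the number of runs affects smoothness, here is a small sketch using synthetic data only. It does not reproduce any experiment in this repository: the "true" curve, the Gaussian noise model, and the function names `noisy_curve`, `averaged_curve`, and `rms_deviation` are all assumptions made for illustration, and it assumes Python 3 (for the `statistics` module).

```python
import random
import statistics


def noisy_curve(true_values, noise_std, rng):
    """One simulated run: the true curve plus independent Gaussian noise."""
    return [v + rng.gauss(0.0, noise_std) for v in true_values]


def averaged_curve(true_values, noise_std, runs, rng):
    """Average `runs` independent noisy curves point by point."""
    curves = [noisy_curve(true_values, noise_std, rng) for _ in range(runs)]
    return [statistics.mean(points) for points in zip(*curves)]


def rms_deviation(curve, true_values):
    """Root-mean-square distance between a curve and the true curve."""
    squared = [(c - t) ** 2 for c, t in zip(curve, true_values)]
    return (sum(squared) / len(squared)) ** 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical learning curve: error decaying over 200 episodes.
    true_values = [1.0 / (1 + episode) for episode in range(200)]
    for runs in (5, 50, 500):
        curve = averaged_curve(true_values, 1.0, runs, rng)
        print(runs, "runs -> RMS deviation", rms_deviation(curve, true_values))
```

With independent noise, the deviation of the averaged curve shrinks roughly in proportion to one over the square root of the number of runs, which is why contributions with more runs tend to give smoother plots.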

If you want to contribute exercises from the book or missing examples, fix bugs in the existing code, provide higher-quality sample outputs, or add new interesting experiments related to RL, feel free to open an issue or make a pull request. I will appreciate it very much. Also, feel free to comment on the sample outputs; some of the curves are really interesting.

The following figures/examples are known to be missing:

Example 3.4: Pole-Balancing
Example 3.6: Draw Poker
Example 5.2: Soap Bubble
Example 8.5: Rod Maneuvering
Figure 12.14: The effect of λ (I don't have time to replicate it for now)
Chapters 14 & 15 are about psychology and neuroscience
Chapter 16: Backgammon, The Acrobot, Go

A Jupyter Notebook version is being developed by Kulbear; completed chapters are available in the notebook branch.
