Reinforcement Learning: An Introduction

Python implementation of Sutton & Barto's Reinforcement Learning: An Introduction (2nd Edition)

Note: Most of the code is adapted from ShangtongZhang's repository and rewritten to make it easier to understand. Besides the code for the figures, I have also completed some of the exercises in the book.

Chapter 2 Multi-armed Bandits

Figure 2.1: An example bandit problem from the 10-armed testbed.
Figure 2.2: Average performance of epsilon-greedy action-value methods on the 10-armed testbed.
Figure 2.3: The effect of optimistic initial action-value estimates on the 10-armed testbed.
Figure 2.4: Average performance of UCB action selection on the 10-armed testbed.
Figure 2.5: Average performance of the gradient bandit algorithm.
Figure 2.6: A parameter study of the various bandit algorithms.
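Exercise 2.5: Sample-average versus constant step-size action-value methods on a nonstationary 10-armed testbed.
Exercise 2.11: A parameter study analogous to Figure 2.6 for the nonstationary case of Exercise 2.5.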
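The bandit figures above share one experimental loop. As a minimal sketch (not the repository's actual code; the functions `run_bandit` and `average_over_runs` are hypothetical), an epsilon-greedy agent with incremental action-value updates might look like this. It assumes the book's 10-armed testbed (q*(a) drawn from N(0, 1), rewards drawn from N(q*(a), 1)) and also covers the nonstationary random-walk variant from Exercise 2.5, with either sample-average or constant step sizes.

```python
import numpy as np


def run_bandit(steps=1000, k=10, epsilon=0.1, step_size=None,
               nonstationary=False, seed=None):
    """Simulate one run of an epsilon-greedy agent on a k-armed testbed.

    step_size=None -> sample-average estimates (alpha = 1 / N(a));
    a float        -> constant step size alpha.
    nonstationary=True -> all q*(a) start at 0 and take independent random
    walks with N(0, 0.01^2) increments (assumed setup of Exercise 2.5).
    Returns (rewards, optimal) arrays of length `steps`.
    """
    rng = np.random.default_rng(seed)
    q_true = np.zeros(k) if nonstationary else rng.normal(0.0, 1.0, k)
    q_est = np.zeros(k)        # action-value estimates Q(a)
    counts = np.zeros(k)       # N(a): how often each action was taken
    rewards = np.zeros(steps)
    optimal = np.zeros(steps, dtype=bool)

    for t in range(steps):
        if rng.random() < epsilon:
            action = int(rng.integers(k))                  # explore uniformly
        else:
            greedy = np.flatnonzero(q_est == q_est.max())
            action = int(rng.choice(greedy))               # exploit, random tie-break
        reward = rng.normal(q_true[action], 1.0)           # R ~ N(q*(a), 1)
        optimal[t] = action == np.argmax(q_true)

        counts[action] += 1
        alpha = 1.0 / counts[action] if step_size is None else step_size
        q_est[action] += alpha * (reward - q_est[action])  # incremental update
        rewards[t] = reward

        if nonstationary:
            q_true += rng.normal(0.0, 0.01, k)             # random walk of q*(a)
    return rewards, optimal


def average_over_runs(runs=200, **kwargs):
    """Average reward and fraction of optimal actions per step over runs."""
    results = [run_bandit(seed=i, **kwargs) for i in range(runs)]
    rewards = np.mean([r for r, _ in results], axis=0)
    optimal = np.mean([o for _, o in results], axis=0)
    return rewards, optimal
```

Calling `average_over_runs(runs=2000, steps=1000, epsilon=0.1)` roughly mirrors the setup behind Figure 2.2 (2000 runs of 1000 steps each). Passing `nonstationary=True` with `step_size=0.1` versus the default `step_size=None` gives the comparison asked for in Exercise 2.5.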

Chapter 3 Finite Markov Decision Processes

Figure 3.2: Gridworld example.
Figure 3.5: Optimal solutions to the gridworld example.
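For the gridworld, a sketch along the following lines should reproduce both value tables, assuming the book's setup: a 5x5 grid, discount 0.9, reward -1 for moves that would leave the grid, and the special states A (+10, jump to A') and B (+5, jump to B'). The names `step` and `evaluate` are hypothetical. Sweeping the Bellman expectation equation under the equiprobable random policy targets Figure 3.2; sweeping the Bellman optimality equation (value iteration) targets Figure 3.5.

```python
import numpy as np

SIZE = 5
GAMMA = 0.9
A_POS, A_PRIME, A_REWARD = (0, 1), (4, 1), 10.0
B_POS, B_PRIME, B_REWARD = (0, 3), (2, 3), 5.0
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # north, south, west, east


def step(state, action):
    """Deterministic gridworld dynamics: returns (next_state, reward)."""
    if state == A_POS:
        return A_PRIME, A_REWARD      # every action from A jumps to A'
    if state == B_POS:
        return B_PRIME, B_REWARD      # every action from B jumps to B'
    i, j = state[0] + action[0], state[1] + action[1]
    if 0 <= i < SIZE and 0 <= j < SIZE:
        return (i, j), 0.0
    return state, -1.0                # off-grid move: stay put, reward -1


def evaluate(optimal=False, theta=1e-6):
    """Sweep the Bellman expectation equation (random policy) or the
    Bellman optimality equation (max over actions) until the largest
    change in a sweep is below theta. Updates are done in place."""
    v = np.zeros((SIZE, SIZE))
    while True:
        delta = 0.0
        for i in range(SIZE):
            for j in range(SIZE):
                backups = []
                for a in ACTIONS:
                    (ni, nj), r = step((i, j), a)
                    backups.append(r + GAMMA * v[ni, nj])
                new = max(backups) if optimal else float(np.mean(backups))
                delta = max(delta, abs(new - v[i, j]))
                v[i, j] = new
        if delta < theta:
            return v
```

`np.round(evaluate(), 1)` should print the random-policy values (about 8.8 at A), and `np.round(evaluate(optimal=True), 1)` the optimal values (about 24.4 at A). In-place sweeps converge to the same fixed point as synchronous updates, usually in fewer sweeps.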

Environment

Python 3.6
numpy
matplotlib
tqdm
seaborn

Reference

GitHub: ShangtongZhang/reinforcement-learning-an-introduction

Feel free to contact me if you have any questions! (Homepage: http://guohai.tech, Email: xuguohai7@163.com)
