Lectures, Books, Surveys and Theses on Reinforcement Learning
An Outsider’s Tour of Reinforcement Learning
Now working on Chapter 7 (Eligibility Traces) of Reinforcement Learning: An Introduction
- Images and colored text may not display correctly on GitHub. Please copy the above link into nbviewer for a correct view.
NxN GridWorld Code (only contains one-step policy evaluation)
NxN GridWorld by Policy Iteration
Taxi v3 problem by SARSA and Q-Learning (Temporal Difference)
CarRental Policy Iteration (unfinished)
- Study notes for Reinforcement Learning: An Introduction. Contents of the .md and .ipynb files are identical.
N-Armed-Bandit.ipynb now includes all the algorithms for this classic problem.
- 4 action-selection algorithms: epsilon-greedy, softmax, upper confidence bound (UCB) and gradient ascent (preference estimation).
- 2 data-generation methods: stationary and nonstationary.
- 2 initial-value setups: adding a baseline and setting a burn-in period.
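As an illustration of the first of these action-selection rules, here is a minimal, self-contained epsilon-greedy sketch on a stationary testbed (the function name and parameters are illustrative, not taken from the notebook):

```python
import numpy as np

rng = np.random.default_rng(0)

def run_epsilon_greedy(n_arms=10, n_steps=1000, epsilon=0.1):
    """Epsilon-greedy action selection with incremental sample-average
    value estimates on a stationary n-armed bandit."""
    true_values = rng.normal(0.0, 1.0, n_arms)   # fixed reward means
    q_est = np.zeros(n_arms)                     # estimated action values
    counts = np.zeros(n_arms, dtype=int)         # pulls per arm
    rewards = np.empty(n_steps)
    for t in range(n_steps):
        if rng.random() < epsilon:
            a = int(rng.integers(n_arms))        # explore
        else:
            a = int(np.argmax(q_est))            # exploit
        r = rng.normal(true_values[a], 1.0)
        counts[a] += 1
        q_est[a] += (r - q_est[a]) / counts[a]   # incremental mean update
        rewards[t] = r
    return q_est, rewards
```

Swapping the if/else branch for a softmax or UCB rule changes only the action-selection line; the value-estimate update stays the same.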
Future work on this script will focus on optimizing performance and fixing potential bugs.
Solutions using Gym will be added later.
- The size of the GridWorld can be changed at will. To reproduce the results in Reinforcement Learning: An Introduction, set n=4.
- GridWorld_DP.ipynb only contains policy evaluation.
- GridWorld_by_PolicyIteration.ipynb contains the complete policy iteration procedure. Value iteration is a special case of policy iteration and can be obtained by adapting the code.
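For reference, the policy evaluation step for the equiprobable random policy on the book's 4x4 GridWorld can be sketched as follows (a minimal illustration assuming reward -1 per move and terminal states in opposite corners, as in the book's Figure 4.1; it does not reproduce the notebook's code):

```python
import numpy as np

def policy_evaluation(n=4, theta=1e-4, gamma=1.0):
    """In-place iterative policy evaluation for the equiprobable random
    policy on an n x n GridWorld with terminals at the two opposite
    corners and reward -1 per move."""
    V = np.zeros((n, n))
    terminals = {(0, 0), (n - 1, n - 1)}
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    while True:
        delta = 0.0
        for i in range(n):
            for j in range(n):
                if (i, j) in terminals:
                    continue
                v_new = 0.0
                for di, dj in moves:
                    # moves off the grid leave the state unchanged
                    ni = min(max(i + di, 0), n - 1)
                    nj = min(max(j + dj, 0), n - 1)
                    v_new += 0.25 * (-1.0 + gamma * V[ni, nj])
                delta = max(delta, abs(v_new - V[i, j]))
                V[i, j] = v_new
        if delta < theta:
            return V
```

With n=4 this converges to the familiar values 0, -14, -20, -22 along the top row. Full policy iteration wraps this evaluation in a greedy policy-improvement loop; value iteration replaces the expectation over actions with a max.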
Future work will focus on optimizing efficiency and adding visualization.
Solutions using Gym will be added later.
- The Blackjack problem is solved with the Monte Carlo method.
- The policy evaluation part is finished; policy improvement is in progress.
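The evaluation step can be sketched generically as first-visit Monte Carlo prediction (a hypothetical helper, not the notebook's code; each episode is a list of (state, reward) pairs where the reward is the one received after leaving that state):

```python
from collections import defaultdict

def first_visit_mc(episodes, gamma=1.0):
    """First-visit Monte Carlo prediction: average the return that
    follows the first occurrence of each state across episodes."""
    returns = defaultdict(list)
    for episode in episodes:
        # compute returns backwards: G_t = r_t + gamma * G_{t+1}
        G = 0.0
        G_at = [0.0] * len(episode)
        for t in range(len(episode) - 1, -1, -1):
            G = episode[t][1] + gamma * G
            G_at[t] = G
        # record only the return following the *first* visit to a state
        seen = set()
        for t, (s, _) in enumerate(episode):
            if s not in seen:
                seen.add(s)
                returns[s].append(G_at[t])
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}
```

For Blackjack the states would be (player sum, dealer card, usable ace) tuples and the episodes would come from simulated hands.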
- Solutions using Gym will be added later.
- The Taxi_v3 problem is solved with temporal-difference methods.
- The code contains the on-policy method SARSA and the off-policy method Q-Learning.
- The Gym API is used to create the environment.
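The two methods differ only in their bootstrap target; a minimal sketch of the update rules (names and hyperparameters are illustrative — in the notebook the (s, a, r, s_next) transitions would come from the Gym environment's step function):

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy TD update: bootstrap from the action actually taken next."""
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Off-policy TD update: bootstrap from the greedy action in s_next,
    regardless of which action the behavior policy actually takes."""
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
```

Because Q-Learning bootstraps from the max, it learns the greedy policy's values even while exploring; SARSA learns the values of the exploring policy itself.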