leafsigh/Reinforcement-Learning

Notes and projects are stored in this repo.


Reinforcement Learning Resources

Lectures, Books, Surveys and Theses on Reinforcement Learning

An Outsider’s Tour of Reinforcement Learning

Reinforcement Learning

强化学习从入门到放弃 (Reinforcement Learning: From Getting Started to Giving Up)

OpenAI DeepRL Courses

Dynamic Programming Problems

Study Notes and Codes

Currently working on Chapter 7, Eligibility Traces, of Reinforcement Learning: An Introduction.

Images and colored text do not render correctly on GitHub. To view them properly, copy the link above into nbviewer.

N-Armed-Bandit Code

NxN GridWorld Code (contains only one-step policy evaluation)

NxN GridWorld by Policy Iteration

BlackJack by Monte Carlo

Taxi v3 problem by SARSA and Q-Learning (Temporal Difference)

CarRental Policy Iteration (unfinished)

Study Note

Study notes for Reinforcement Learning: An Introduction. The .md and .ipynb versions have the same contents.

N-Armed-Bandit Problem

N-Armed-Bandit.ipynb now includes all of the algorithms for this problem:

4 action-selection algorithms: epsilon-greedy, softmax, upper confidence bound (UCB) and gradient ascent (preference estimation).
2 data generation methods: stationary and nonstationary.
2 initial value setup methods: adding a baseline and setting up a burn-in period.
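As a rough illustration of two of the action-selection rules listed above, here is a minimal sketch of epsilon-greedy and UCB on a stationary bandit. It is not taken from N-Armed-Bandit.ipynb: the function names, the unit-variance Gaussian rewards, the sample-average value update and the exploration constant c=2.0 are all assumptions made for this example.

```python
import math
import random


def epsilon_greedy(q, epsilon, rng):
    """With probability epsilon pick a random arm, otherwise the greedy arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=lambda a: q[a])


def ucb(q, counts, t, c=2.0):
    """Try every arm once, then maximise estimate plus an exploration bonus."""
    for a, n in enumerate(counts):
        if n == 0:
            return a
    return max(
        range(len(q)),
        key=lambda a: q[a] + c * math.sqrt(math.log(t) / counts[a]),
    )


def run_bandit(true_means, steps=2000, method="epsilon_greedy",
               epsilon=0.1, seed=0):
    """Play a stationary bandit (assumed Gaussian rewards, variance 1)."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k        # estimated value of each arm
    counts = [0] * k     # number of pulls of each arm
    for t in range(1, steps + 1):
        if method == "ucb":
            a = ucb(q, counts, t)
        else:
            a = epsilon_greedy(q, epsilon, rng)
        reward = rng.gauss(true_means[a], 1.0)
        counts[a] += 1
        # Sample-average update: q <- q + (reward - q) / n
        q[a] += (reward - q[a]) / counts[a]
    return q, counts
```

For example, run_bandit([0.0, 0.5, 2.0], method="ucb") returns the value estimates and pull counts for a three-armed problem. A nonstationary variant would typically replace the 1/n step size with a constant step size so that recent rewards weigh more.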

Future work on this notebook will focus on improving performance and fixing potential bugs. Gym-based solutions will be added later.

GridWorld Problem

The size of the GridWorld can be changed at will. To reproduce the result in Reinforcement Learning: An Introduction, set n=4.
GridWorld_DP.ipynb contains only the policy evaluation step.
GridWorld_by_PolicyIteration.ipynb contains the complete policy iteration procedure. Value iteration is a special case of policy iteration in which policy evaluation is stopped after a single sweep, and the code can be adapted to perform it.
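The policy evaluation step can be sketched as follows. This is a minimal example, not the notebook's code: it assumes the setup of the book's 4x4 example (terminal states in the top-left and bottom-right corners, a reward of -1 per step, an equiprobable random policy, no discounting), and the function name and parameters are chosen for illustration only.

```python
def evaluate_random_policy(n=4, gamma=1.0, theta=1e-6):
    """Iterative policy evaluation for the equiprobable random policy."""
    terminals = {(0, 0), (n - 1, n - 1)}          # assumed terminal corners
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right
    v = [[0.0] * n for _ in range(n)]
    while True:
        delta = 0.0
        new_v = [row[:] for row in v]
        for r in range(n):
            for c in range(n):
                if (r, c) in terminals:
                    continue
                total = 0.0
                for dr, dc in moves:
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < n and 0 <= nc < n):
                        nr, nc = r, c  # moving off the grid leaves the state unchanged
                    # Each action has probability 1/4 and reward -1.
                    total += 0.25 * (-1.0 + gamma * v[nr][nc])
                new_v[r][c] = total
                delta = max(delta, abs(total - v[r][c]))
        v = new_v
        if delta < theta:
            return v


if __name__ == "__main__":
    for row in evaluate_random_policy(4):
        print([round(x) for x in row])
```

With n=4 the rounded values match the book's figure for the random policy (first row 0, -14, -20, -22). Replacing the average over actions with a maximum over actions would turn each sweep into a value iteration update.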

Future work will focus on improving efficiency and adding visualization. Gym-based solutions will be added later.

BlackJack

The BlackJack problem is solved with the Monte Carlo method.
The policy evaluation part is finished; policy improvement is in progress.
Gym-based solutions will be added later.
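A minimal sketch of Monte Carlo policy evaluation for blackjack is shown below. It is an illustration under stated assumptions rather than the repo's implementation: cards come from an infinite deck, the player follows a fixed policy of sticking on 20 or 21, the dealer sticks on 17 or more, naturals get no special treatment, and all function names are hypothetical.

```python
import random
from collections import defaultdict


def draw(rng):
    """Infinite deck: ace is 1, face cards count as 10."""
    return min(rng.randint(1, 13), 10)


def hand_value(cards):
    """Return (total, usable_ace); one ace counts as 11 if that does not bust."""
    total = sum(cards)
    if 1 in cards and total + 10 <= 21:
        return total + 10, True
    return total, False


def play_episode(rng, stick_at=20):
    """Play one game; return the visited player states and the final reward."""
    player = [draw(rng), draw(rng)]
    dealer = [draw(rng), draw(rng)]
    states = []
    while True:
        total, usable = hand_value(player)
        if total > 21:
            return states, -1.0              # player busts
        if total >= 12:                      # below 12 a hit can never bust
            states.append((total, dealer[0], usable))
        if total >= stick_at:
            break
        player.append(draw(rng))
    while hand_value(dealer)[0] < 17:        # assumed dealer rule
        dealer.append(draw(rng))
    p, d = hand_value(player)[0], hand_value(dealer)[0]
    if d > 21 or p > d:
        return states, 1.0
    return states, (0.0 if p == d else -1.0)


def mc_prediction(episodes=50000, seed=0):
    """Average the return observed after each state (first-visit)."""
    rng = random.Random(seed)
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for _ in range(episodes):
        states, g = play_episode(rng)
        # Undiscounted, reward only at the end: every state's return is g.
        for s in set(states):
            returns_sum[s] += g
            returns_count[s] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}
```

Each state is a tuple (player sum, dealer's showing card, usable ace). Calling mc_prediction() returns a dictionary of estimated state values under the fixed policy; policy improvement would additionally need action-value estimates.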

Taxi_v3 Problem

The Taxi_v3 problem is solved with temporal-difference methods.
The code contains the on-policy method SARSA and the off-policy method Q-Learning.
The environment is created through the Gym API.
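The difference between the two updates can be sketched on a small stand-in problem. To keep the example runnable without Gym, it uses a hypothetical six-state corridor instead of the Taxi environment; the environment, the function names and the hyperparameters are assumptions made for illustration and are not taken from the repo.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Random action with probability epsilon, else greedy with random ties."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def corridor_step(state, action, n_states):
    """Hypothetical toy environment: action 1 moves right, action 0 moves left.

    Every step costs -1; reaching the rightmost state ends the episode.
    """
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    return nxt, -1.0, nxt == n_states - 1


def train(method="q_learning", n_states=6, episodes=500,
          alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    """Tabular SARSA or Q-learning on the toy corridor."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(q[s], epsilon, rng)
        done = False
        while not done:
            s2, r, done = corridor_step(s, a, n_states)
            a2 = epsilon_greedy(q[s2], epsilon, rng)   # next behaviour action
            if done:
                target = r
            elif method == "sarsa":
                # On-policy: bootstrap from the action that will be taken.
                target = r + gamma * q[s2][a2]
            else:
                # Off-policy: bootstrap from the greedy action.
                target = r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s, a = s2, a2
    return q
```

The only difference between the two methods is the bootstrap target. With a Gym environment, corridor_step would be replaced by the environment's own step call and the table would have one row per discrete state.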