Reinforcement-Learning

Notes and projects are stored in this repo.

Primary language: Jupyter Notebook

Reinforcement Learning Resources

Lectures, Books, Surveys and Thesis of Reinforcement Learning

An Outsider’s Tour of Reinforcement Learning

Reinforcement Learning

强化学习从入门到放弃 (Reinforcement Learning: From Getting Started to Giving Up)

OpenAI DeepRL Courses

Dynamic Programming Problems

Study Notes and Code

Currently working on Chapter 7, Eligibility Traces, of Reinforcement Learning: An Introduction.

Study Notes

  • Images and colored text do not render correctly on GitHub. Paste the link above into nbviewer to view the notes correctly.

N-Armed-Bandit Code

NxN GridWorld Code (contains only one-step policy evaluation)

NxN GridWorld by Policy Iteration

BlackJack by Monte Carlo

Taxi v3 problem by SARSA and Q-Learning (Temporal Difference)

CarRental Policy Iteration (unfinished)

Study Note

  • Study notes on Reinforcement Learning: An Introduction. The .md and .ipynb versions have the same content.

N-Armed-Bandit Problem

N-Armed-Bandit.ipynb now includes all of the algorithms for this problem:

  • 4 action-selection algorithms: epsilon-greedy, softmax, upper confidence bound (UCB) and gradient ascent (preference estimation).
  • 2 data generation methods: stationary and nonstationary.
  • 2 initial value setup methods: adding a baseline and setting up a burn-in period.
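The four action-selection rules above can be sketched on a k-armed testbed as follows. This is a minimal illustration, not the notebook's code: the function names, the N(0, 1) arm means with unit-variance rewards, the 0.01 random-walk drift used for the nonstationary case, and the default parameters (`epsilon=0.1`, `c=2.0`, `tau=0.1`, `alpha=0.1`) are all assumptions. The baseline in `gradient_bandit` is a running average of the rewards seen so far.

```python
import math
import random


def epsilon_greedy(q, counts, t, rng, epsilon=0.1):
    """Explore uniformly with probability epsilon, otherwise pick the arm with the largest estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return max(range(len(q)), key=lambda a: q[a])


def ucb(q, counts, t, rng, c=2.0):
    """Upper confidence bound: try every arm once, then maximize Q + c * sqrt(ln t / N)."""
    for a, n in enumerate(counts):
        if n == 0:
            return a
    return max(range(len(q)), key=lambda a: q[a] + c * math.sqrt(math.log(t) / counts[a]))


def softmax(q, counts, t, rng, tau=0.1):
    """Boltzmann exploration: sample an arm with probability proportional to exp(Q / tau)."""
    m = max(q)  # subtract the max for numerical stability
    weights = [math.exp((v - m) / tau) for v in q]
    return rng.choices(range(len(q)), weights=weights)[0]


def run_bandit(select, k=10, steps=1000, step_size=None, nonstationary=False, seed=0):
    """Run one k-armed testbed with a value-based selection rule; return (average reward, estimates)."""
    rng = random.Random(seed)
    true_means = [rng.gauss(0, 1) for _ in range(k)]  # assumed testbed: arm means drawn from N(0, 1)
    q, counts, total = [0.0] * k, [0] * k, 0.0
    for t in range(1, steps + 1):
        a = select(q, counts, t, rng)
        reward = rng.gauss(true_means[a], 1)
        counts[a] += 1
        # Sample average by default; a constant step size suits the nonstationary case.
        alpha = step_size if step_size is not None else 1 / counts[a]
        q[a] += alpha * (reward - q[a])
        total += reward
        if nonstationary:
            # Every true mean takes an independent random-walk step.
            true_means = [m + rng.gauss(0, 0.01) for m in true_means]
    return total / steps, q


def gradient_bandit(true_means, steps=1000, alpha=0.1, use_baseline=True, seed=0):
    """Gradient ascent on action preferences H; return the final softmax policy over arms."""
    rng = random.Random(seed)
    k = len(true_means)
    H, baseline = [0.0] * k, 0.0
    for t in range(1, steps + 1):
        m = max(H)
        exp_h = [math.exp(h - m) for h in H]
        z = sum(exp_h)
        pi = [e / z for e in exp_h]
        a = rng.choices(range(k), weights=pi)[0]
        reward = rng.gauss(true_means[a], 1)
        if use_baseline:
            baseline += (reward - baseline) / t  # running average of rewards so far, including this one
        for b in range(k):
            # Raise the preference of the chosen arm if the reward beats the baseline, lower the others.
            H[b] += alpha * (reward - baseline) * ((1.0 if b == a else 0.0) - pi[b])
    m = max(H)
    exp_h = [math.exp(h - m) for h in H]
    z = sum(exp_h)
    return [e / z for e in exp_h]
```

For example, `run_bandit(ucb, step_size=0.1, nonstationary=True)` combines UCB with a constant step size on the drifting testbed, and `gradient_bandit([-1.0, 1.0])` should end with almost all probability on the second arm.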

Future work on this notebook will focus on improving performance and fixing potential bugs. Solutions using Gym will be added later.

GridWorld Problem

  • The size of the GridWorld can be changed freely. To reproduce the results in Reinforcement Learning: An Introduction, set n=4.
  • GridWorld_DP.ipynb contains only policy evaluation.
  • GridWorld_by_PolicyIteration.ipynb contains the complete policy iteration procedure. Value iteration is a special case of policy iteration in which policy evaluation is stopped after a single sweep, so the code can be adapted to it.
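A minimal sketch of both procedures on the n x n GridWorld, assuming the setup of the book's 4x4 example (the two opposite corners are terminal, every move has reward -1, moves off the grid leave the state unchanged, gamma = 1). The helper names `make_gridworld` and `sweep` are hypothetical. Policy evaluation averages the one-step returns of the equiprobable random policy; value iteration backs up the best action instead, which is exactly evaluation truncated to one sweep followed by greedy improvement.

```python
def make_gridworld(n=4):
    """n x n GridWorld: opposite corners are terminal, each move costs -1,
    and a move off the grid leaves the agent where it is."""
    states = [(r, c) for r in range(n) for c in range(n)]
    terminals = {(0, 0), (n - 1, n - 1)}
    actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    def step(state, action):
        r, c = state[0] + action[0], state[1] + action[1]
        if not (0 <= r < n and 0 <= c < n):
            r, c = state  # bumping into the wall keeps the state unchanged
        return (r, c), -1.0

    return states, terminals, actions, step


def sweep(V, states, terminals, actions, step, gamma, backup):
    """One in-place sweep; `backup` combines the one-step returns of all actions."""
    delta = 0.0
    for s in states:
        if s in terminals:
            continue  # terminal states keep value 0
        returns = [r + gamma * V[s2] for s2, r in (step(s, a) for a in actions)]
        new_v = backup(returns)
        delta = max(delta, abs(new_v - V[s]))
        V[s] = new_v
    return delta


def policy_evaluation(n=4, gamma=1.0, theta=1e-8):
    """Iterative evaluation of the equiprobable random policy."""
    states, terminals, actions, step = make_gridworld(n)
    V = {s: 0.0 for s in states}
    mean = lambda returns: sum(returns) / len(returns)
    while sweep(V, states, terminals, actions, step, gamma, mean) >= theta:
        pass
    return V


def value_iteration(n=4, gamma=1.0, theta=1e-8):
    """Back up the best action in every sweep (evaluation truncated to one sweep)."""
    states, terminals, actions, step = make_gridworld(n)
    V = {s: 0.0 for s in states}
    while sweep(V, states, terminals, actions, step, gamma, max) >= theta:
        pass
    return V
```

With n=4, `round(policy_evaluation()[(0, 1)])` gives -14, the value shown for the state next to a terminal corner in the book's figure, while `value_iteration()` gives minus the number of steps to the nearest terminal corner.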

Future work will focus on improving efficiency and adding visualization. Solutions using Gym will be added later.

BlackJack

  • The BlackJack problem is solved with the Monte Carlo method.
  • The policy evaluation part is finished; policy improvement is in progress.
  • Solutions using Gym will be added later.
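The policy evaluation part can be sketched as first-visit Monte Carlo prediction for the fixed policy "stick on 20 or 21, otherwise hit". This is a hedged sketch, not the notebook's implementation: it assumes an infinite deck, a dealer who hits below 17, no special payout for naturals, and gamma = 1. All function names are hypothetical. States are (player sum, dealer's showing card, usable ace).

```python
import random
from collections import defaultdict


def draw_card(rng):
    """Draw from an infinite deck (an assumption): J, Q, K count as 10 and an ace is drawn as 1."""
    return min(rng.randint(1, 13), 10)


def hand_value(cards):
    """Return (total, usable_ace): one ace counts as 11 if that does not bust the hand."""
    total = sum(cards)
    if 1 in cards and total + 10 <= 21:
        return total + 10, True
    return total, False


def play_episode(policy, rng):
    """Play one hand; return the player's visited states and the final reward (+1, 0 or -1)."""
    player = [draw_card(rng), draw_card(rng)]
    dealer = [draw_card(rng), draw_card(rng)]
    states = []
    while True:
        total, usable = hand_value(player)
        state = (total, dealer[0], usable)  # (player sum, dealer's showing card, usable ace)
        states.append(state)
        if policy(state) == "stick":
            break
        player.append(draw_card(rng))
        if hand_value(player)[0] > 21:
            return states, -1  # player busts and loses immediately
    while hand_value(dealer)[0] < 17:  # dealer hits below 17, sticks otherwise
        dealer.append(draw_card(rng))
    player_total, dealer_total = hand_value(player)[0], hand_value(dealer)[0]
    if dealer_total > 21 or player_total > dealer_total:
        return states, 1
    if player_total == dealer_total:
        return states, 0
    return states, -1


def stick_on_20(state):
    """Fixed policy to evaluate: stick on 20 or 21, otherwise hit."""
    return "stick" if state[0] >= 20 else "hit"


def mc_policy_evaluation(policy, num_episodes, seed=0):
    """First-visit Monte Carlo prediction of V(s) with incremental averaging."""
    rng = random.Random(seed)
    value = defaultdict(float)
    visits = defaultdict(int)
    for _ in range(num_episodes):
        states, reward = play_episode(policy, rng)
        # With gamma = 1 and a single terminal reward, every visited state's return equals the reward.
        for state in set(states):  # count each state at most once per episode (first visit)
            visits[state] += 1
            value[state] += (reward - value[state]) / visits[state]
    return value
```

Because the only reward arrives at the end of the hand, the incremental update `value += (G - value) / N` is all that first-visit averaging needs here; a policy-improvement step would additionally require action values Q(s, a).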

Taxi_v3 Problem

  • The Taxi_v3 problem is solved with temporal-difference (TD) methods.
  • The code contains the on-policy method SARSA and the off-policy method Q-learning.
  • The environment is created with the Gym API.
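The difference between the two TD methods lies entirely in the bootstrap target. The sketch below uses a tiny hypothetical `Corridor` environment in place of Taxi-v3 so that it runs without Gym. In Gym the environment would typically come from `gym.make("Taxi-v3")`, but `reset`/`step` return signatures differ between Gym and Gymnasium versions, so the stand-in uses a simplified `step` that returns `(state, reward, done)`. The hyperparameters (`alpha=0.1`, `epsilon=0.1`, 2000 episodes) are assumptions.

```python
import random


class Corridor:
    """Hypothetical stand-in for Taxi-v3 so the sketch runs without Gym.
    States 0..n-1, start at 0, goal at n-1; action 0 = left, 1 = right; reward -1 per step."""

    def __init__(self, n=6):
        self.n, self.n_actions = n, 2

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n - 1)  # walls keep the agent in bounds
        return self.state, -1.0, self.state == self.n - 1


def epsilon_greedy(Q, state, epsilon, rng):
    """Random action with probability epsilon, otherwise the greedy one (ties go to the lower index)."""
    if rng.random() < epsilon:
        return rng.randrange(len(Q[state]))
    return max(range(len(Q[state])), key=lambda a: Q[state][a])


def train(env, method, episodes=2000, alpha=0.1, gamma=1.0, epsilon=0.1, seed=0):
    """Tabular TD control; method is "sarsa" (on-policy) or "q_learning" (off-policy)."""
    rng = random.Random(seed)
    Q = [[0.0] * env.n_actions for _ in range(env.n)]
    for _ in range(episodes):
        s = env.reset()
        a = epsilon_greedy(Q, s, epsilon, rng)
        done = False
        while not done:
            s2, r, done = env.step(a)
            if method == "sarsa":
                # SARSA bootstraps from the action the behaviour policy will actually take next.
                a2 = epsilon_greedy(Q, s2, epsilon, rng)
                target = r if done else r + gamma * Q[s2][a2]
            else:
                # Q-learning bootstraps from the greedy action, whatever is taken next.
                target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            if method != "sarsa":
                a2 = epsilon_greedy(Q, s2, epsilon, rng)  # choose after the update, as in Q-learning
            s, a = s2, a2
    return Q
```

In this corridor both methods learn to move right in every non-terminal state. On a problem with risky states (such as the cliff-walking example in the book), SARSA's on-policy target makes it prefer safer paths than Q-learning.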