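Theory and implementation of reinforcement learning algorithms, from classical reinforcement learning to DQN
Author: Wonseok Jung (정원석)
The code is written in Python.
The deep learning framework is TensorFlow or Keras.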
Theory: https://wonseokjung.github.io//reinforcementlearning/update/RL-RL1/
1. Introduction 2. Reinforcement Learning 3. Examples of Reinforcement Learning 4. Elements of Reinforcement Learning
https://wonseokjung.github.io//reinforcementlearning/update/RL-RL2/
Multi-armed Bandits: 1. A k-armed Bandit Problem 2. Action-value Methods 3. The 10-armed Testbed 4. Incremental Implementation 5. Tracking a Nonstationary Problem 6. Optimistic Initial Values 7. Upper-Confidence-Bound Action Selection 8. Gradient Bandit Algorithms
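
The incremental implementation and epsilon-greedy action selection listed above can be sketched as follows. This is a minimal illustration, not the code from this repository: the function name `run_bandit`, the Gaussian rewards with unit variance, and the default parameters are all assumptions made for the example.

```python
import random


def run_bandit(true_means, steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a k-armed bandit (hypothetical helper).

    Rewards are assumed to be Gaussian with unit variance around
    each arm's true mean.
    """
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k  # action-value estimates Q(a)
    n = [0] * k    # number of times each arm was pulled

    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                      # explore
        else:
            a = max(range(k), key=lambda i: q[i])     # exploit (ties: lowest index)
        reward = rng.gauss(true_means[a], 1.0)
        n[a] += 1
        # Incremental sample average: Q <- Q + (R - Q) / N
        q[a] += (reward - q[a]) / n[a]
    return q, n
```

Calling `run_bandit([0.0, 1.0, 5.0], steps=2000)` returns the value estimates and pull counts; with enough steps, the arm with the highest true mean should receive most of the pulls.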
Practice:
- OpenAI Gym tutorial
https://wonseokjung.github.io//reinforcementlearning/update/openai-gym/
Theory:
Practice:
Theory:
- Dynamic programming: Policy Evaluation
- Dynamic programming: Policy Iteration
- Dynamic programming: Value Iteration
Practice:
- policy iteration - grid world
- value iteration - grid world
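
As a rough idea of what value iteration on a grid world involves, here is a minimal sketch. It is not the repository's implementation: it assumes the 4x4 grid world from Sutton and Barto (terminal states in two opposite corners, reward of -1 per step, moves off the grid leave the state unchanged), and the function name `value_iteration` and its parameters are hypothetical.

```python
def value_iteration(size=4, gamma=1.0, theta=1e-6):
    """Value iteration on a deterministic grid world (assumed layout)."""
    terminals = {(0, 0), (size - 1, size - 1)}
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    v = {(r, c): 0.0 for r in range(size) for c in range(size)}

    while True:
        delta = 0.0
        for s in v:
            if s in terminals:
                continue  # terminal values stay at 0
            best = float("-inf")
            for dr, dc in moves:
                # Clamp to the grid: bumping into a wall keeps the agent in place.
                nr = min(max(s[0] + dr, 0), size - 1)
                nc = min(max(s[1] + dc, 0), size - 1)
                # Bellman optimality backup with a reward of -1 per step.
                best = max(best, -1.0 + gamma * v[(nr, nc)])
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < theta:
            return v
```

Under these assumptions, each state's value converges to minus the number of steps to the nearest terminal corner.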
Theory:
- https://wonseokjung.github.io//reinforcementlearning/update/MonteCarlomethod/
- https://wonseokjung.github.io//reinforcementlearning/update/MC2/
- https://wonseokjung.github.io//reinforcementlearning/update/RL-MC3/
- https://wonseokjung.github.io//reinforcementlearning/update/RL-MC4/
Practice:
- Get familiar with the Blackjack environment (Blackjack-v0)
- Monte Carlo Prediction to estimate state-action values
- On-policy first-visit Monte Carlo Control algorithm
- Off-policy every-visit Monte Carlo Control using the Weighted Importance Sampling algorithm
Theory:
- one-step TD
- https://wonseokjung.github.io//reinforcementlearning/update/RL-TD1/
- https://wonseokjung.github.io//reinforcementlearning/update/RL-TD2/
- n-step bootstrapping:
- https://wonseokjung.github.io//reinforcementlearning/update/RL-NTD1/
- https://wonseokjung.github.io//reinforcementlearning/update/RL-NTD2/
- https://wonseokjung.github.io//reinforcementlearning/update/RL-NTD3/
- Eligibility Traces
Practice:
- Get familiar with the Windy Gridworld Playground
- Implement SARSA
- Get familiar with the Cliff Environment Playground
- Implement Q-Learning in Python
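
For orientation, the tabular Q-learning update can be sketched on a toy problem. This is an illustrative example only, not the repository's Cliff environment code: the five-state chain, the function name `q_learning_chain`, and all hyperparameters are assumptions.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.1,
                     gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain.

    Actions: 0 = left, 1 = right. The rightmost state is terminal and
    entering it yields reward 1; every other transition yields 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy behaviour policy (ties go to "right").
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(s - 1, 0) if a == 0 else s + 1
            reward = 1.0 if s_next == goal else 0.0
            # Off-policy target uses the greedy value of the next state.
            # The terminal row is never updated, so its max stays 0.
            target = reward + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q
```

After training, `q[s][1]` should exceed `q[s][0]` in every non-terminal state, meaning the greedy policy walks right toward the goal.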
Theory:
- On-policy Prediction with Approximation
- On-policy Control with Approximation
Practice:
- Get familiar with the Mountain Car Playground
- Q-Learning with Value Function Approximation
Theory:
- DQN
- DDQN
- Prioritized Experience Replay
Practice:
- Get familiar with the OpenAI Gym Atari Environment Playground
- Deep Q-Learning for Atari Games
- Double Q-Learning
- Prioritized Experience Replay
- SuperMario-DQN
- Using Keras and Deep Q-Network to Play FlappyBird
https://yanpanlau.github.io/2016/07/10/FlappyBird-Keras.html
Theory:
Practice:
- REINFORCE with Baseline
- Actor-Critic with Baseline
- Actor-Critic with Baseline for Continuous Action Spaces
- Deterministic Policy Gradients for Continuous Action Spaces (WIP)
- Deep Deterministic Policy Gradients (WIP)
- Asynchronous Advantage Actor-Critic (A3C)
- Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, Second Edition, in progress, MIT Press, Cambridge, MA, 2017
- Denny Britz, https://github.com/dennybritz