This is my Python library and notes for reinforcement learning. I hope to understand these algorithms completely.
-
Bellman Equations
Q(s,a) is the action-value function and V(s) is the state-value function.
-
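To make the relation between Q(s,a) and V(s) in the Bellman equations concrete, here is a minimal sketch of iterative policy evaluation on a tiny tabular MDP. The MDP, the policy `pi`, the discount `GAMMA`, and the helper names are all assumptions made up for illustration; they are not part of this library.

```python
# Hypothetical 2-state, 2-action MDP; every number here is made up.
GAMMA = 0.9

# P[s][a] is a list of (probability, next_state, reward) transitions.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}

# Uniform random policy: pi[s][a] is the probability of action a in state s.
pi = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}


def q_from_v(V, s, a):
    """Bellman equation for Q: expected reward plus discounted next-state value."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def evaluate_policy(tol=1e-10):
    """Repeat the Bellman expectation backup until V stops changing."""
    V = {s: 0.0 for s in P}
    while True:
        new_V = {s: sum(pi[s][a] * q_from_v(V, s, a) for a in P[s]) for s in P}
        if max(abs(new_V[s] - V[s]) for s in P) < tol:
            return new_V
        V = new_V


V = evaluate_policy()
Q = {s: {a: q_from_v(V, s, a) for a in P[s]} for s in P}
```

At the fixed point, V(s) equals the policy-weighted average of Q(s,a) over actions, which is the identity the two Bellman equations share.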
Advantage Functions
-
Policy Gradient
-
Q-Learning
-
Double Deep Q-Learning
Use different networks to choose the action and to estimate its action-value: the online network selects the greedy next action and the target network evaluates it.
-
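The Double Deep Q-Learning idea of separating action selection from action evaluation can be sketched as a target computation. This is a minimal sketch using plain lists of Q-values instead of real networks; the function names, argument names, and the default `gamma` are assumptions for illustration.

```python
def dqn_target(reward, next_q_target, done, gamma=0.99):
    """Standard DQN target: the target network both selects and evaluates."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double DQN target: online network selects, target network evaluates."""
    if done:
        return reward
    # Greedy action according to the online network's Q-values.
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # Value of that action according to the target network.
    return reward + gamma * next_q_target[best_action]


# Made-up Q-values for one next state with three actions.
online_q = [1.0, 2.0, 0.5]
target_q = [3.0, 1.5, 0.2]
y_double = double_dqn_target(1.0, online_q, target_q, done=False, gamma=0.9)
y_single = dqn_target(1.0, target_q, done=False, gamma=0.9)
```

In this example the Double DQN target is lower than the standard one, because the target network's largest estimate is not used unless the online network also prefers that action. This is the mechanism that reduces overestimation.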
Dueling Deep Q-Learning
-
A2C / A3C
-
TD3 (Twin Delayed DDPG)
-
TRPO
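Relate the expected return of a new policy to that of the old policy, then maximize the improvement while keeping the KL divergence between the two policies small.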
-
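The two quantities TRPO works with, the surrogate objective and the mean KL divergence between the old and new policy, can be sketched for discrete action spaces as follows. This only evaluates the quantities on a batch; it does not perform the constrained optimization step itself. The function name, the data layout (one list of action probabilities per sampled state), and the threshold `delta` are assumptions for illustration.

```python
import math


def surrogate_and_kl(old_probs, new_probs, actions, advantages):
    """Return (surrogate objective, mean KL(old || new)) over a batch.

    old_probs[i] and new_probs[i] are the action distributions of the old
    and new policy in the i-th sampled state; actions[i] is the action that
    was taken there and advantages[i] its estimated advantage.
    """
    n = len(actions)
    # Importance-weighted advantage: ratio pi_new / pi_old times advantage.
    surrogate = sum(
        new_probs[i][actions[i]] / old_probs[i][actions[i]] * advantages[i]
        for i in range(n)
    ) / n
    # KL divergence from the old policy to the new one, averaged over states.
    kl = sum(
        sum(p * math.log(p / q) for p, q in zip(old_probs[i], new_probs[i]) if p > 0)
        for i in range(n)
    ) / n
    return surrogate, kl


# Made-up batch of two states with two actions each.
old = [[0.5, 0.5], [0.8, 0.2]]
new = [[0.6, 0.4], [0.7, 0.3]]
acts = [0, 1]
advs = [1.0, -0.5]

delta = 0.05  # hypothetical trust-region size
L, kl = surrogate_and_kl(old, new, acts, advs)
within_trust_region = kl <= delta
```

A candidate update is acceptable in the TRPO sense when it increases the surrogate while the mean KL stays within `delta`.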
PPO
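- replace TRPO's hard KL constraint with a penalty term in the objective
- clip the probability ratio so that each policy update stays small
- make optimization easier: plain first-order gradient methods are enough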
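
The PPO points above (KL penalty and clipping) can be sketched as two objective functions over a batch of probability ratios and advantages. This is a minimal sketch in plain Python; the function names and the defaults `eps=0.2` and `beta=1.0` are assumptions for illustration, not values taken from this library.

```python
def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Clipped surrogate: mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = max(1.0 - eps, min(1.0 + eps, r))
        # Taking the minimum removes the incentive to push r outside the clip range.
        total += min(r * a, clipped * a)
    return total / len(ratios)


def ppo_penalty_objective(ratios, advantages, mean_kl, beta=1.0):
    """KL-penalty surrogate: mean(r * A) minus beta times the mean KL divergence."""
    surrogate = sum(r * a for r, a in zip(ratios, advantages)) / len(ratios)
    return surrogate - beta * mean_kl


# A ratio above 1 + eps with positive advantage is cut back to 1 + eps.
clipped_gain = ppo_clip_objective([1.5], [1.0])
# A ratio above 1 + eps with negative advantage is NOT clipped (pessimistic bound).
unclipped_loss = ppo_clip_objective([1.5], [-1.0])
```

Both objectives are maximized with ordinary gradient ascent, which is why PPO is simpler to optimize than the constrained TRPO step.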