Implementation of Reinforcement Learning Algorithms

This is my Python library and notes for reinforcement learning. I hope to understand these algorithms completely.

Key Concepts

  • Bellman Equations

    CodeCogsEqn

    Q(s,a) is the action-value function and V(s) is the state-value function.
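For reference, the Bellman expectation equations can be written as below in one common notation. The symbols π (policy), P (transition probability), r (reward) and γ (discount factor) are assumptions of this sketch and may not match the notation used elsewhere in these notes.

```latex
\begin{aligned}
V^{\pi}(s)   &= \sum_{a} \pi(a \mid s)\, Q^{\pi}(s,a) \\
Q^{\pi}(s,a) &= \sum_{s'} P(s' \mid s,a)\,\bigl[\, r(s,a,s') + \gamma\, V^{\pi}(s') \,\bigr]
\end{aligned}
```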

  • Advantage Functions

    advantage
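A standard way to write the advantage function, reusing Q and V from the Bellman equations (the superscript π for the policy is an assumed notation):

```latex
A^{\pi}(s,a) = Q^{\pi}(s,a) - V^{\pi}(s),
\qquad
\mathbb{E}_{a \sim \pi(\cdot \mid s)}\bigl[ A^{\pi}(s,a) \bigr] = 0
```

It measures how much better action a is than the policy's average behaviour in state s.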

  • Policy Gradient

    policy_gradient
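As a rough illustration of the policy gradient estimator (the average of grad log π(a) times the advantage), here is a minimal NumPy sketch for a softmax policy over a single state. The function names and the single-state setup are assumptions made for this example, not part of the library.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def grad_log_softmax(logits, action):
    """Gradient of log pi(action) with respect to the logits: one_hot - probs."""
    probs = softmax(logits)
    one_hot = np.zeros_like(probs)
    one_hot[action] = 1.0
    return one_hot - probs

def policy_gradient_estimate(logits, actions, advantages):
    """Monte Carlo estimate: mean over samples of grad log pi(a_i) * A_i."""
    grads = [grad_log_softmax(logits, a) * adv
             for a, adv in zip(actions, advantages)]
    return np.mean(grads, axis=0)

# Hypothetical samples: action 0 had positive advantage, action 1 negative.
logits = np.array([0.0, 0.0])
grad = policy_gradient_estimate(logits, actions=[0, 1], advantages=[1.0, -1.0])
```

Taking a gradient ascent step along `grad` raises the logit of the action with positive advantage and lowers the other one.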

Kinds of Algorithms

rl_algorithms_9_15

  • Q-Learning

    Q-learning
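A minimal sketch of the tabular Q-learning update, assuming the Q-table is a NumPy array indexed by `[state, action]`. The function name and the default `alpha` and `gamma` values are illustrative choices only.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step; updates Q in place and returns the TD error."""
    # Bootstrap from the greedy value of the next state unless the episode ended.
    target = r if done else r + gamma * np.max(Q[s_next])
    td_error = target - Q[s, a]
    Q[s, a] += alpha * td_error
    return td_error

# Hypothetical environment with 3 states and 2 actions.
Q = np.zeros((3, 2))
q_learning_update(Q, s=0, a=1, r=1.0, s_next=2, done=False)
```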

  • Double Deep Q-Learning

    ddqn

    Use different networks to choose the action and to estimate its action-value.
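The idea of separating action selection from value estimation can be sketched as follows. This assumes the next-state Q-values of both networks are already available as NumPy arrays of shape `(batch, n_actions)`; the argument names are made up for the example.

```python
import numpy as np

def double_dqn_target(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Double DQN target: the online net picks the action, the target net scores it."""
    best_actions = np.argmax(q_online_next, axis=1)            # selection
    batch_index = np.arange(len(best_actions))
    next_values = q_target_next[batch_index, best_actions]     # evaluation
    # (1 - dones) removes the bootstrap term for terminal transitions.
    return rewards + gamma * (1.0 - dones) * next_values
```

Plain DQN would instead take the maximum over `q_target_next` directly, which tends to overestimate action values.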

  • Dueling Deep Q-Learning

    duelingdqn
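In the dueling architecture, the network outputs a state value and per-action advantages, which are then combined into Q-values. A minimal sketch of that combination step, assuming the mean-subtracted form and NumPy arrays standing in for the two network heads:

```python
import numpy as np

def dueling_q_values(value, advantages):
    """Combine V(s) of shape (batch, 1) and A(s, a) of shape (batch, n_actions).

    Subtracting the mean advantage makes the split into V and A identifiable.
    """
    return value + advantages - advantages.mean(axis=1, keepdims=True)

# Hypothetical head outputs for a batch of one state with three actions.
q = dueling_q_values(np.array([[1.0]]), np.array([[1.0, 2.0, 3.0]]))
```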

  • A2C / A3C

    A2C
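A rough sketch of the quantities an advantage actor-critic update usually needs: bootstrapped discounted returns, and the actor and critic losses built from them. It works on plain NumPy arrays, so there is no automatic differentiation here; the function names and loss forms are assumptions for illustration.

```python
import numpy as np

def discounted_returns(rewards, bootstrap_value, gamma=0.99):
    """n-step returns R_t = r_t + gamma * R_{t+1}, seeded with V(s_T)."""
    returns = np.zeros(len(rewards))
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def a2c_losses(log_probs, values, returns):
    """Actor loss -E[log pi * A] and critic loss E[(R - V)^2]."""
    advantages = returns - values          # treated as constants for the actor
    actor_loss = -np.mean(log_probs * advantages)
    critic_loss = np.mean(advantages ** 2)
    return actor_loss, critic_loss
```

A3C uses the same losses but computes them in several asynchronous workers.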

  • TD3 (Twin Delayed DDPG)

    td3_1

    td3_2

    td3_3
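Two of the ingredients TD3 adds on top of DDPG, the clipped double-Q target and target policy smoothing, can be sketched like this. The third ingredient, delayed policy updates, is a training-loop detail and is not shown. All names and default values here are assumptions for the example.

```python
import numpy as np

def smoothed_target_action(target_action, rng, noise_std=0.2,
                           noise_clip=0.5, max_action=1.0):
    """Target policy smoothing: add clipped Gaussian noise to the target action."""
    noise = rng.normal(0.0, noise_std, size=target_action.shape)
    noise = np.clip(noise, -noise_clip, noise_clip)
    return np.clip(target_action + noise, -max_action, max_action)

def td3_target(q1_next, q2_next, rewards, dones, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller of the two critics."""
    return rewards + gamma * (1.0 - dones) * np.minimum(q1_next, q2_next)

rng = np.random.default_rng(0)
noisy_action = smoothed_target_action(np.array([0.9, -0.9]), rng)
```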

  • TRPO

    TRPO

    TRPO_1

    Find the relation between the expected returns of two policies.

    TRPO_2
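The relation usually used at this step is the following identity, written here in an assumed notation: η is the expected discounted return, π the old policy, π̃ the new policy and A_π the advantage function of the old policy.

```latex
\eta(\tilde{\pi}) = \eta(\pi)
  + \mathbb{E}_{s_0, a_0, \ldots \sim \tilde{\pi}}
    \left[ \sum_{t=0}^{\infty} \gamma^{t} A_{\pi}(s_t, a_t) \right]
```

So the new policy improves on the old one whenever the expected advantage along its own trajectories is non-negative.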

    • Trick 1

      TRPO_3

    • Tricks....

      Prove an inequality that gives a lower bound on the new policy's return, then raise that lower bound at every update.

    • Process

      Compute the search direction with the conjugate gradient method, then do a line search along that direction.
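The process above can be sketched with two small helpers: a conjugate gradient solver that only needs matrix-vector products (in TRPO these would be Fisher-vector products), and a backtracking line search. This is a simplified sketch; a real TRPO line search would also reject steps that violate the KL constraint, and the names `matvec` and `objective` are hypothetical.

```python
import numpy as np

def conjugate_gradient(matvec, b, iters=10, tol=1e-10):
    """Approximately solve A x = b given only the product matvec(v) = A v."""
    x = np.zeros_like(b)
    r = b.copy()          # residual b - A x, with x = 0
    p = b.copy()          # current search direction
    rs_old = r @ r
    for _ in range(iters):
        Ap = matvec(p)
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

def backtracking_line_search(objective, x, full_step, max_backtracks=10, shrink=0.5):
    """Return the first shrunken step that improves objective, else x unchanged."""
    base = objective(x)
    for k in range(max_backtracks):
        candidate = x + (shrink ** k) * full_step
        if objective(candidate) > base:
            return candidate
    return x
```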

  • PPO

    PPO_1

    • Change the KL constraint to a penalty
    • Add clipping to keep each policy update small
    • Make the optimization easier
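The clipping idea can be sketched as the clipped surrogate objective below, where `ratio` is the probability ratio between the new and old policy for each sampled action. The function name and the default `clip_eps=0.2` are assumptions for the example.

```python
import numpy as np

def ppo_clip_objective(ratio, advantages, clip_eps=0.2):
    """Clipped surrogate: mean of min(ratio * A, clip(ratio, 1-eps, 1+eps) * A)."""
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Taking the minimum removes any incentive to push the ratio outside the clip range.
    return np.mean(np.minimum(unclipped, clipped))

# Hypothetical batch: one ratio above the clip range, one below it.
value = ppo_clip_objective(np.array([1.5, 0.5]), np.array([1.0, -1.0]))
```

The penalty variant instead subtracts a coefficient times the KL divergence between the old and new policies from the unclipped surrogate.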