RL-Theory-book (rus)

Status: main topics covered; revision and editing still required

  • Ch. 1: Introduction
  • Ch. 2: Meta-heuristics
    • NEAT, WANN
    • CEM, OpenAI-ES, CMA-ES
  • Ch. 3: Classic theory
    • Bellman equations
  • RPI, policy improvement theorem
    • Value Iteration, Policy Iteration
    • Temporal Difference, Q-learning, SARSA
  • Eligibility Traces, TD(λ)
  • Ch. 4: Value-based
    • DQN
    • Double DQN, Dueling DQN, PER, Noisy DQN, Multi-step DQN
  • C51, QR-DQN, IQN, Rainbow DQN
  • Ch. 5: Policy Gradient
    • REINFORCE, A2C, GAE
    • TRPO, PPO
  • Ch. 6: Continuous Control
    • DDPG, TD3
    • SAC
  • Ch. 7: Model-based
    • Bandits
    • MCTS, AlphaZero, MuZero
    • LQR
  • Ch. 8: Next Stage
    • Imitation Learning / Inverse Reinforcement Learning
    • Intrinsic Motivation
    • Multi-Task and Hindsight
    • Hierarchical RL
  • Partial Observability
    • Multi-Agent RL