Status: main topics covered; still requires revision and editing
- Ch. 1: Introduction
- Ch. 2: Meta-heuristics (CEM sketch below)
- NEAT, WANN
- CEM, OpenAI-ES, CMA-ES
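A minimal cross-entropy method (CEM) sketch, assuming a black-box score function `f` to maximize; the names `cem`, `pop`, and `elite_frac` are illustrative, not from the chapter:

```python
import numpy as np

def cem(f, dim, iters=50, pop=64, elite_frac=0.2, seed=0):
    # Fit a diagonal Gaussian to the top-scoring samples each generation.
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        samples = rng.normal(mu, sigma, size=(pop, dim))
        scores = np.array([f(x) for x in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]  # best n_elite samples
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu

# Toy check: maximize -||x - 3||^2; mu should converge near 3.
best = cem(lambda x: -np.sum((x - 3.0) ** 2), dim=5)
```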
- Ch. 3: Classic theory (value-iteration sketch below)
- Bellman equations
  - RPI, policy improvement theorem
- Value Iteration, Policy Iteration
- Temporal Difference, Q-learning, SARSA
  - Eligibility Traces, TD(λ)
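A minimal value-iteration sketch for a tabular MDP, assuming the dynamics are given as dense arrays; the array layout is an assumption made for illustration:

```python
import numpy as np

def value_iteration(P, R, gamma=0.99, tol=1e-8):
    # P: transition probabilities, shape (S, A, S); R: rewards, shape (S, A).
    # Repeatedly applies the Bellman optimality backup until convergence.
    S, A, _ = P.shape
    V = np.zeros(S)
    while True:
        # Q(s, a) = R(s, a) + gamma * sum_s' P(s' | s, a) * V(s')
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # optimal values, greedy policy
        V = V_new
```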
- Ch. 4: Value-based (Double-DQN target sketch below)
- DQN
- Double DQN, Dueling DQN, PER, Noisy DQN, Multi-step DQN
  - C51, QR-DQN, IQN, Rainbow DQN
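A sketch of the Double DQN target computation, assuming next-state Q-values from the online and target networks are already available as arrays; names are illustrative:

```python
import numpy as np

def double_dqn_targets(r, done, q_next_online, q_next_target, gamma=0.99):
    # Double DQN: the online net selects the argmax action, the target net
    # evaluates it, reducing the max-operator's overestimation bias.
    # q_next_*: arrays of shape (batch, n_actions) for the next states.
    a_star = q_next_online.argmax(axis=1)                   # select (online)
    q_eval = q_next_target[np.arange(len(a_star)), a_star]  # evaluate (target)
    return r + gamma * (1.0 - done) * q_eval

# Vanilla DQN would instead use q_next_target.max(axis=1) for both steps.
```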
- Ch. 5: Policy Gradient (PPO clipped-loss sketch below)
- REINFORCE, A2C, GAE
- TRPO, PPO
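A sketch of PPO's clipped surrogate loss, assuming importance ratios and advantage estimates are precomputed; `eps` is the usual clip range (0.2 in the PPO paper):

```python
import numpy as np

def ppo_clip_loss(ratio, adv, eps=0.2):
    # ratio = pi_new(a|s) / pi_old(a|s); clipping removes the incentive
    # to move the policy too far in a single update.
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    return -np.mean(np.minimum(unclipped, clipped))  # negated: minimize to ascend
```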
- Ch. 6: Continuous Control (TD3 target sketch below)
- DDPG, TD3
- SAC
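A sketch of TD3's two key target tricks (target policy smoothing and clipped double-Q), assuming the target actor's next actions and both target critics' values are given; names are illustrative:

```python
import numpy as np

def smoothed_target_action(pi_next, rng, noise_std=0.2, noise_clip=0.5,
                           low=-1.0, high=1.0):
    # Target policy smoothing: clipped Gaussian noise on the target action.
    noise = np.clip(rng.normal(0.0, noise_std, size=pi_next.shape),
                    -noise_clip, noise_clip)
    return np.clip(pi_next + noise, low, high)

def td3_targets(r, done, q1_next, q2_next, gamma=0.99):
    # Clipped double-Q: min of two target critics curbs overestimation.
    return r + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)
```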
- Ch. 7: Model-based (UCB1 sketch below)
- Bandits
- MCTS, AlphaZero, MuZero
- LQR
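A minimal UCB1 sketch for the stochastic bandit setting (the same optimism-under-uncertainty idea reappears in MCTS as UCT), assuming a `pull(arm)` callable returning rewards in [0, 1]; names are illustrative:

```python
import numpy as np

def ucb1(pull, n_arms, horizon=1000):
    # Play the arm maximizing empirical mean + sqrt(2 ln t / n_pulls).
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)
    for t in range(1, horizon + 1):
        if t <= n_arms:
            a = t - 1  # play each arm once to initialize
        else:
            means = sums / counts
            bonus = np.sqrt(2.0 * np.log(t) / counts)
            a = int(np.argmax(means + bonus))
        r = pull(a)
        counts[a] += 1
        sums[a] += r
    return sums.sum()  # cumulative reward over the horizon
```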
- Ch. 8: Next Stage (hindsight-relabeling sketch below)
- Imitation Learning / Inverse Reinforcement Learning
- Intrinsic Motivation
- Multi-Task and Hindsight
- Hierarchical RL
  - Partial Observability
- Multi-Agent RL
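A minimal hindsight-relabeling sketch in the spirit of HER's final-goal strategy, assuming discrete goals and sparse 0/1 rewards; the tuple layout is an assumption made for illustration:

```python
def her_relabel(episode):
    # Final-goal strategy: pretend the goal actually achieved at the end of
    # the episode was the commanded goal, turning failures into successes.
    # episode: list of (state, action, achieved_goal) tuples from one rollout.
    final_goal = episode[-1][2]
    relabeled = []
    for state, action, achieved in episode:
        reward = 1.0 if achieved == final_goal else 0.0  # sparse 0/1 reward
        relabeled.append((state, action, final_goal, reward))
    return relabeled
```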