• Intro to RL • Intro to MDP • Q-Learning
- Agents
- Environment
- States
- Rewards – Win (+1), Loss (-1), Draw (0)
Markov Reward Process – A Markov chain with rewards attached; applies to any process that is sequential in nature
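State-value function and action-value function
The action-value function takes both the state and the action into account, so it can be used to decide the best action.
Bellman Optimality Equation – In any MDP, the optimal value functions satisfy a recursive relation: the value of a state is expressed in terms of the values of its successor states.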
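For reference, the recursive relation can be written out in its usual textbook form. The notation is an assumption, since the notes do not define any symbols: P(s' | s, a) is the transition probability, R(s, a, s') the reward, and γ a discount factor between 0 and 1.

```latex
% Bellman optimality equation for the state-value function
V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma\, V^{*}(s') \bigr]

% Bellman optimality equation for the action-value function
Q^{*}(s, a) = \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma \max_{a'} Q^{*}(s', a') \bigr]
```

The two are linked by V*(s) = max over a of Q*(s, a), which is why the action-value is enough to pick the best action.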
- Episodic tasks (end in a terminal state)
- Continuing tasks (no terminal state)
Q represents how useful a given action is for gaining future reward. Q-learning is an off-policy algorithm.
Greedy Action
Non-Greedy Action
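As a rough illustration of how Q-values, greedy actions and non-greedy (exploratory) actions fit together, here is a minimal tabular Q-learning sketch. Everything specific in it is an assumption made for the example and not taken from the notes: the five-state chain environment, the helper names `step`, `choose_action` and `train`, and the values of `alpha`, `gamma` and `epsilon`. The update uses the best next action in the target regardless of which action is actually taken next, which is what makes the method off-policy.

```python
import random

N_STATES = 5      # hypothetical chain: states 0..4, state 4 is the terminal "win"
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment (an assumption): returns (next_state, reward, done)."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0  # +1 for reaching the win state, else 0
    return next_state, reward, done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection over the Q-table row for this state."""
    if rng.random() < epsilon:
        # Exploratory choice: uniform over all actions (may be non-greedy).
        return rng.choice(ACTIONS)
    # Greedy choice: highest Q-value, ties broken at random.
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:  # episodic task: loop until the terminal state
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Off-policy target: bootstrap from the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy action in each non-terminal state after training.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a])
                 for s in range(N_STATES - 1)]
```

After training, `greedy_policy` holds the greedy action for each non-terminal state, and `q_table[s][a]` estimates how useful action `a` is in state `s`. Setting `epsilon` to 0 would make every choice greedy, at the cost of no exploration.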