RL: A Python repository from abhay-lal

Reinforcement learning

• Intro to RL
• Intro to MDP
• Q-Learning

Markov Reward Process – A Markov chain with a reward attached to each state. It can be applied to anything that is sequential in nature.

State-value
Action-value

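The action-value takes both the state and the action into account, which makes it possible to decide which action is best.

Bellman Optimality Equation – The optimal value functions satisfy a recursive relation for any MDP: the value of a state is expressed in terms of the values of its successor states.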
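As a sketch of that recursion, the Bellman optimality equations are commonly written as below. The notation is an assumption, since these notes do not fix one: P(s' | s, a) is the transition probability, R(s, a, s') the reward, and γ a discount factor between 0 and 1.

```latex
\begin{aligned}
V^{*}(s)   &= \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma\, V^{*}(s') \bigr] \\
Q^{*}(s,a) &= \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma\, \max_{a'} Q^{*}(s', a') \bigr]
\end{aligned}
```

The two forms are linked by V*(s) = max over a of Q*(s, a), which is why the best action can be read directly from the action-values.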

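Episodic tasks – tasks that end in a terminal state, after which a new episode starts
Continuing tasks – tasks that go on without a terminal state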

Q represents how useful a given action is, in a given state, for gaining future reward. Q-learning is an off-policy algorithm.
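A minimal sketch of the tabular Q-learning update, to show what "off-policy" means in practice: the target always uses the best next action, whatever action the agent actually takes next. The function name `q_learning_update`, the state and action labels, and the values of `alpha` (learning rate) and `gamma` (discount factor) are illustrative assumptions, not taken from this repository.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9, done=False):
    """Apply one tabular Q-learning update in place and return the new Q(s, a)."""
    # Off-policy target: bootstrap from the best next action,
    # regardless of which action the behaviour policy takes next.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha towards the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action problem; unseen (state, action) pairs default to 0.0.
q = defaultdict(float)
actions = ["left", "right"]
q[("s1", "right")] = 1.0

new_value = q_learning_update(q, "s0", "right", reward=0.5,
                              next_state="s1", actions=actions)
terminal_value = q_learning_update(q, "s1", "left", reward=2.0,
                                   next_state="s1", actions=actions, done=True)
print(new_value, terminal_value)
```

When `done` is true the bootstrap term is dropped, so the target is just the immediate reward. This is how the end of an episode is usually handled.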

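Greedy Action – the action with the highest estimated Q-value in the current state (exploitation)
Non-Greedy Action – any other action, taken in order to explore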
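One common way to mix greedy and non-greedy actions is an epsilon-greedy rule, sketched below. The function name `epsilon_greedy`, the example Q-values, and the default `epsilon` are assumptions for illustration; these notes do not say which exploration strategy the repository uses.

```python
import random


def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """Pick an action index from a list of Q-values for the current state."""
    if rng.random() < epsilon:
        # Exploratory choice: uniform over all actions
        # (this may still happen to pick the greedy one).
        return rng.randrange(len(q_values))
    # Greedy choice: the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


q_values = [0.2, 0.8, 0.5]  # hypothetical Q(s, a) for three actions

greedy_action = epsilon_greedy(q_values, epsilon=0.0)   # never explores
explore_rng = random.Random(0)                           # seeded for repeatability
random_action = epsilon_greedy(q_values, epsilon=1.0, rng=explore_rng)  # always explores
print(greedy_action, random_action)
```

With `epsilon=0.0` the rule is purely greedy, and with `epsilon=1.0` it is purely random. Values in between trade exploitation against exploration.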