/RL

Primary language: Python. License: MIT.

Reinforcement learning

• Intro to RL • Intro to MDP • Q-Learning

Components in RL:

  1. Agents
  2. Environment
  3. States
  4. Rewards – Win (+1), Loss (-1), Draw (0)
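The four components above interact in a loop: the agent observes a state, picks an action, and the environment answers with a new state and a reward. A minimal sketch of that loop follows. `CoinGuessEnv`, `RandomAgent`, and `run_episode` are hypothetical names made up for illustration and are not part of this repository; the environment is a one-step coin-guessing game that uses only the win (+1) and loss (-1) rewards.

```python
import random


class CoinGuessEnv:
    """Hypothetical one-step environment: guess the outcome of a coin flip."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def reset(self):
        # Return the initial state.
        return "start"

    def step(self, action):
        # Flip the coin and reward a correct guess with +1, a wrong one with -1.
        coin = self.rng.choice(["heads", "tails"])
        reward = 1 if action == coin else -1
        done = True  # the episode ends after a single guess
        return "end", reward, done


class RandomAgent:
    """Hypothetical agent that ignores the state and guesses at random."""

    def __init__(self, seed=1):
        self.rng = random.Random(seed)

    def act(self, state):
        return self.rng.choice(["heads", "tails"])


def run_episode(env, agent):
    """Run one episode and return the total reward collected."""
    state = env.reset()
    total_reward = 0
    done = False
    while not done:
        action = agent.act(state)
        state, reward, done = env.step(action)
        total_reward += reward
    return total_reward
```

Calling `run_episode(CoinGuessEnv(), RandomAgent())` returns the reward of a single game, either +1 or -1.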

Markov Decision Process:

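Markov Reward Process (MRP) – A Markov chain in which every state transition yields a reward. It applies to any problem that is sequential in nature. An MDP extends an MRP by letting the agent choose an action in each state.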

Value function: Estimates the expected future reward from a state, which tells us whether transitioning from one state to another is worthwhile.

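  1. State-value function – the expected return when starting from a given state
  2. Action-value function – the expected return after taking a given action in a given state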

Optimal Value function:

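The optimal value function gives the highest value achievable under any policy. It is defined for both state values and action values, and the best action in a state is the one with the highest optimal action value.

Bellman Optimality Equation – For any MDP, the optimal values satisfy a recursive relation: the optimal value of a state equals the best expected immediate reward plus the discounted optimal value of the state that follows.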
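For reference, the Bellman optimality equations can be written out as below. The notation is an assumption that follows a common textbook convention and does not come from this repository: $p(s', r \mid s, a)$ is the probability of reaching state $s'$ with reward $r$ after taking action $a$ in state $s$, and $\gamma \in [0, 1]$ is the discount factor.

```latex
\begin{align}
v_*(s)    &= \max_{a} \sum_{s', r} p(s', r \mid s, a)\,\bigl[\, r + \gamma\, v_*(s') \,\bigr] \\
q_*(s, a) &= \sum_{s', r} p(s', r \mid s, a)\,\bigl[\, r + \gamma \max_{a'} q_*(s', a') \,\bigr]
\end{align}
```

The two forms are linked by $v_*(s) = \max_a q_*(s, a)$, which is why acting greedily with respect to $q_*$ gives an optimal policy.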

Tasks in RL:

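  1. Episodic tasks – the interaction breaks into episodes that end in a terminal state (for example, one game that ends in a win, loss, or draw)
  2. Continuing (continuous) tasks – the interaction goes on without a terminal state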
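In both kinds of task the agent tries to maximise the return, usually a discounted sum of rewards. The sketch below computes that sum for one finished episode. The function name `discounted_return` and the default `gamma=0.9` are illustrative assumptions, not values taken from this repository.

```python
def discounted_return(rewards, gamma=0.9):
    """Return r_0 + gamma * r_1 + gamma^2 * r_2 + ... for a list of rewards."""
    g = 0.0
    # Work backwards so each step is a single multiply-and-add.
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For an episodic task the reward list is finite, so the sum is always defined. For a continuing task the same formula needs `gamma < 1` to keep the infinite sum bounded.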

Q-Learning:

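Q represents how useful a given action is, in a given state, for gaining future reward. Q-learning is an off-policy algorithm: it learns the value of the greedy policy while the agent is free to follow a different, more exploratory policy.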
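A minimal sketch of the tabular Q-learning update, assuming discrete states and actions. The names `make_q_table` and `q_learning_update`, and the defaults `alpha=0.5` (learning rate) and `gamma=0.9` (discount factor), are illustrative assumptions rather than code from this repository.

```python
from collections import defaultdict


def make_q_table(n_actions):
    """Q-table mapping state -> list of action values, all starting at 0."""
    return defaultdict(lambda: [0.0] * n_actions)


def q_learning_update(q, state, action, reward, next_state, done,
                      alpha=0.5, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q(state, action)."""
    # Off-policy target: bootstrap from the best action in the next state,
    # regardless of which action the agent will actually take there.
    best_next = 0.0 if done else max(q[next_state])
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha towards the target.
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]
```

The `max` over next-state values is what makes the update off-policy: the target always assumes greedy behaviour from the next state onward.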

Exploration Exploitation Tradeoff:

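Greedy Action – exploitation: pick the action with the highest current value estimate
Non-Greedy Action – exploration: pick a different action to improve its value estimate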
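One common way to balance the two kinds of action is an epsilon-greedy rule, sketched below. This is a standard technique named here as an assumption; the repository does not say which exploration strategy it uses. The function name `epsilon_greedy` and its parameters are hypothetical.

```python
import random


def epsilon_greedy(action_values, epsilon, rng=random):
    """Pick an action index from a list of value estimates.

    With probability epsilon, pick uniformly at random (explore);
    otherwise pick the action with the highest estimate (exploit).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(action_values))
    return max(range(len(action_values)), key=lambda a: action_values[a])
```

With `epsilon=0` the rule is purely greedy, and with `epsilon=1` it is purely random. Note that the random branch can still land on the greedy action by chance.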