Reinforcement Learning
- Actual Inequalities:
- UCB Intuition:
- Upper Bound Derivation:
- Thompson Sampling Intuition:
- Multi-Armed Bandit Problem to Reinforcement Learning:
- MDP Types:
- Bellman Optimality Equation:
- Policy Improvement:
- Policy Improvement Pseudocode:
- Value Iteration:
- Summary:
- Temporal Difference Learning:
- SARSA Pseudocode:
- Q-Learning Pseudocode:
- Q-Learning vs SARSA:
- RL Steps: