Stanford-CS-234-RL-2022

Solutions to the assignments of the Stanford CS234 Reinforcement Learning course (2022).

Assignment 1

Frozen Lake Markov Decision Process using Value Iteration and Policy Iteration

Policy Iteration	Value Iteration
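
For readers unfamiliar with value iteration, the following is a minimal sketch, not the assignment solution itself. It assumes a transition table in the Gym-style layout `P[s][a] = [(prob, next_state, reward, done), ...]` and runs on a hypothetical three-state chain rather than on Frozen Lake; the names `value_iteration` and `chain` are illustrative only.

```python
def value_iteration(P, n_states, n_actions, gamma=0.9, tol=1e-8):
    """Return (V, policy) for a tabular MDP given as P[s][a] = [(p, s2, r, done), ...]."""

    def q_value(V, s, a):
        # Expected one-step return; terminal transitions contribute no bootstrap value.
        return sum(p * (r + gamma * V[s2] * (not done)) for p, s2, r, done in P[s][a])

    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            best = max(q_value(V, s, a) for a in range(n_actions))
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel style) Bellman backup
        if delta < tol:
            break

    # Greedy policy with respect to the converged values (ties go to the lowest action index).
    policy = [max(range(n_actions), key=lambda a: q_value(V, s, a)) for s in range(n_states)]
    return V, policy


# Hypothetical chain: states 0 -> 1 -> 2, state 2 is terminal.
# Action 0 moves left (or stays at state 0), action 1 moves right; reaching state 2 pays 1.
chain = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}

V, policy = value_iteration(chain, n_states=3, n_actions=2)
print(V, policy)
```

On this toy chain the greedy policy moves right from states 0 and 1. Policy iteration would reach the same policy by alternating full policy evaluation with greedy improvement.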

Tabular Q Learning and Deep Q Learning

Learning Curve on the test environment:

Policy Gradient Methods and REINFORCE

Learning Curve of the REINFORCE algorithm on CartPole-v0:

Application of Bandit Algorithms in a medical setting

Comparison of different Bandit Algorithms:

Application of Upper Confidence Bound (UCB) Bandits in personalized Recommendation Systems

Comparison of different arm update strategies:
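
As background for the bandit assignments, here is a minimal sketch of the standard UCB1 rule on simulated Bernoulli arms. It is not the recommendation-system code from the assignment and does not reproduce its arm update strategies; the arm means, the function name `ucb1`, and the seed are assumptions chosen for illustration.

```python
import math
import random


def ucb1(arm_means, n_rounds, seed=0):
    """Run UCB1 on simulated Bernoulli arms; return pull counts and mean-reward estimates."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms    # number of pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm

    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            # Pull every arm once so each count is non-zero.
            arm = t - 1
        else:
            # Pick the arm with the largest mean estimate plus exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda a: values[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
            )

        # Simulated Bernoulli reward (hypothetical environment).
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0

        # Incremental update of the running mean.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]

    return counts, values


counts, values = ucb1([0.2, 0.5, 0.8], n_rounds=2000)
print(counts, values)
```

The exploration bonus shrinks as an arm is pulled more often, so pulls tend to concentrate on the arm with the highest estimated mean while the others are still sampled occasionally.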