Reinforcement-Learning-Algorithms

These implementations demonstrate the convergence and performance of the policy iteration and value iteration algorithms, and how their convergence to the optimal value function depends on the number of iterations used. I have also implemented the on-policy SARSA and off-policy Q-learning algorithms and show how their performance depends on the exploration-exploitation tradeoff and on the learning rate. The experiments were run on benchmark reinforcement learning tasks, namely smallworld, gridworld, and cliffworld MDPs, to analyze the performance of the algorithms.
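
To make the description above concrete, here is a minimal Python sketch of the ideas it names. It is not the repository's MATLAB code. The five-state chain environment, the function names (`step`, `value_iteration`, `epsilon_greedy`, `td_control`), and all parameter values are assumptions chosen for illustration. It shows how value iteration depends on the number of sweeps, and how the SARSA and Q-learning update targets differ under an epsilon-greedy policy with learning rate `alpha`.

```python
import random

# Hypothetical 1-D gridworld (an assumption, not one of the repo's tasks):
# states 0..4, state 4 is the terminal goal, every step costs -1.
# Actions: 0 = left, 1 = right.
N_STATES, GOAL, ACTIONS, GAMMA = 5, 4, (0, 1), 0.9


def step(state, action):
    """Deterministic transition: return (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, -1.0, nxt == GOAL


def value_iteration(n_sweeps):
    """Apply n_sweeps synchronous Bellman optimality backups from V = 0."""
    v = [0.0] * N_STATES
    for _ in range(n_sweeps):
        new_v = v[:]
        for s in range(GOAL):  # the terminal state keeps value 0
            backups = []
            for a in ACTIONS:
                nxt, r, done = step(s, a)
                backups.append(r + (0.0 if done else GAMMA * v[nxt]))
            new_v[s] = max(backups)
        v = new_v
    return v


def epsilon_greedy(q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise act greedily on q."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[state][a])


def td_control(method, episodes=500, alpha=0.5, epsilon=0.1, seed=0):
    """Tabular SARSA (method="sarsa") or Q-learning (any other value)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        a = epsilon_greedy(q, s, epsilon, rng)
        while not done:
            s2, r, done = step(s, a)
            a2 = epsilon_greedy(q, s2, epsilon, rng)
            if done:
                target = r
            elif method == "sarsa":
                # On-policy: bootstrap from the action actually taken next.
                target = r + GAMMA * q[s2][a2]
            else:
                # Off-policy: bootstrap from the greedy action.
                target = r + GAMMA * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s, a = s2, a2
    return q
```

On this chain, `value_iteration(k)` moves closer to the optimal values as `k` grows and stops changing after four sweeps, at approximately `[-3.439, -2.71, -1.9, -1.0, 0.0]`. Calling `td_control("sarsa", ...)` and `td_control("q_learning", ...)` with different `epsilon` and `alpha` values is one way to explore the exploration-exploitation and learning-rate effects mentioned above.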

Primary Language: MATLAB
