SARSA and Q-Learning Algorithms in a Deterministic Grid World Environment

This report compares SARSA and Q-learning in a 9×10 deterministic grid world. SARSA, an on-policy method, converges quickly and earns the highest reward at a lower exploration rate (ε = 0.2). Q-learning, an off-policy method, performs best at a higher rate (ε = 0.4): because its update bootstraps from the greedy action rather than the one actually taken, it tolerates more exploration while still learning the exploitative policy.
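The one-line difference between the two update rules can be sketched as follows. This is a minimal illustration, not the repository's notebook: the grid layout, start/goal cells, step reward of −1, and hyperparameters (α, γ, ε) are assumptions chosen for the example.

```python
import numpy as np

# Assumed 9x10 deterministic grid: start top-left, goal bottom-right, -1 per step.
ROWS, COLS, N_ACTIONS = 9, 10, 4
GOAL = (ROWS - 1, COLS - 1)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    r, c = state
    dr, dc = MOVES[action]
    nxt = (min(max(r + dr, 0), ROWS - 1), min(max(c + dc, 0), COLS - 1))
    return nxt, -1.0, nxt == GOAL  # next state, reward, done

def epsilon_greedy(Q, state, epsilon, rng):
    if rng.random() < epsilon:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[state]))

def train(update, episodes=500, alpha=0.5, gamma=0.99, epsilon=0.2, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((ROWS, COLS, N_ACTIONS))
    for _ in range(episodes):
        state = (0, 0)
        action = epsilon_greedy(Q, state, epsilon, rng)
        done = False
        while not done:
            nxt, reward, done = step(state, action)
            nxt_action = epsilon_greedy(Q, nxt, epsilon, rng)
            if update == "sarsa":
                # On-policy: bootstrap from the action the behavior policy took.
                target = reward + gamma * Q[nxt][nxt_action] * (not done)
            else:
                # Off-policy (Q-learning): bootstrap from the greedy action.
                target = reward + gamma * np.max(Q[nxt]) * (not done)
            Q[state][action] += alpha * (target - Q[state][action])
            state, action = nxt, nxt_action
    return Q
```

A greedy rollout of either learned Q-table should reach the goal; varying `epsilon` in `train` reproduces the comparison described above.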

Primary language: Jupyter Notebook
