Brief Introduction

In our project, we are tasked with training an agent to traverse a frozen lake without falling into the water. The agent learns by trial and error, adjusting the actions it takes based on the rewards it has received in the past.
We use the Q-learning algorithm. This algorithm builds a table, called the Q-table, that maps every pair of state and possible action to a value. The agent chooses which action to take in a given state based on the values in this table.
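To make the Q-table idea concrete, here is a minimal sketch of tabular Q-learning on a small frozen lake. It is not the project's code: the 4x4 layout, the deterministic (non-slippery) moves, the reward of 1 for reaching the goal, the function names and the default parameter values are all assumptions chosen for illustration.

```python
import random

# Hypothetical 4x4 layout: S = start, F = frozen, H = hole, G = goal.
LAKE = ["SFFF", "FHFH", "FFFH", "HFFG"]
SIZE = 4
MOVES = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # left, down, right, up as (row, col) deltas


def step(state, action):
    """Deterministic move (assumed: no slipping); returns (next_state, reward, done)."""
    row, col = divmod(state, SIZE)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), SIZE - 1)  # the border keeps the agent on the grid
    col = min(max(col + d_col, 0), SIZE - 1)
    tile = LAKE[row][col]
    return row * SIZE + col, (1.0 if tile == "G" else 0.0), tile in "HG"


def train(episodes=3000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Fill a Q-table (one row per state, one column per action) by trial and error."""
    rng = random.Random(seed)
    q_table = [[0.0] * len(MOVES) for _ in range(SIZE * SIZE)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on steps per episode
            if rng.random() < epsilon:
                action = rng.randrange(len(MOVES))  # explore: random action
            else:
                best = max(q_table[state])  # exploit: best known action, ties broken randomly
                action = rng.choice([a for a, q in enumerate(q_table[state]) if q == best])
            next_state, reward, done = step(state, action)
            # Q-learning update: move the estimate toward reward + discounted best next value.
            target = reward + (0.0 if done else gamma * max(q_table[next_state]))
            q_table[state][action] += alpha * (target - q_table[state][action])
            state = next_state
            if done:
                break
    return q_table


def greedy_path(q_table, max_steps=50):
    """Follow the highest-valued action from the start and return the visited states."""
    state, path = 0, [0]
    for _ in range(max_steps):
        action = max(range(len(MOVES)), key=lambda a: q_table[state][a])
        state, _reward, done = step(state, action)
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

Calling `train()` with different `epsilon`, `gamma` and `alpha` values and comparing the resulting tables is one simple way to probe the effect of each parameter in this toy setting.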

How does the behavior of the agent differ when using a high or a low value for the exploration-exploitation parameter (ε)?
Does the discount factor (γ) have a noticeable impact on the score achieved by the agent?
Does the learning rate (α) have a noticeable impact on the score achieved by the agent?
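
For reference, the three parameters in these questions appear in the textbook formulation of Q-learning as follows. The notation (s_t for the state, a_t for the action, r_{t+1} for the reward, 𝒜 for the action set) is the standard one and is assumed here rather than taken from the report.

```latex
% Update rule: \alpha scales the size of each correction,
% \gamma weights the best estimated future value against the immediate reward.
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]

% \varepsilon-greedy action selection: explore with probability \varepsilon,
% otherwise exploit the current table.
a_t =
\begin{cases}
  \text{a uniformly random action in } \mathcal{A} & \text{with probability } \varepsilon \\
  \arg\max_{a} Q(s_t, a) & \text{with probability } 1 - \varepsilon
\end{cases}
```

With α = 0 the table never changes, with γ = 0 only the immediate reward counts, and with ε = 1 the agent acts purely at random.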

Full report

4rn3/KTA_q_learning
