[HANDS-ON BUG] Mean_rewards value is always 1.0 for FrozenLake-v1 without slippery
philippejuhel opened this issue · 1 comment
Describe the bug
For the Unit 2 hands-on, in the evaluate_agent function, env.reset()
is called inside the for episode loop. For the FrozenLake-v1 environment with is_slippery=False, the initial state and the subsequent action-state sequence are therefore always identical, so total_rewards_ep
is always 1.0.
Just before this line:
mean_reward = np.mean(episode_rewards)
I added this line:
print(episode_rewards)
and it always shows :
[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, ..., 1.0]
Mean_reward=1.00 +/- 0.00
So, we need another metric.
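To make the reported behavior concrete, here is a minimal sketch of the evaluation pattern described above. It is not the notebook's exact code: ToyDeterministicEnv, qtable, and greedy_policy are illustrative stand-ins for FrozenLake-v1 and the trained Q-table, using a Gymnasium-style reset/step API. With a deterministic env and a greedy policy, every episode yields the same reward.

```python
# Illustrative sketch (NOT the course notebook's code): a deterministic toy
# env evaluated the same way evaluate_agent does it, with env.reset() inside
# the episode loop.

class ToyDeterministicEnv:
    """4-state chain; action 1 moves right, reaching state 3 gives reward 1."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state = min(self.state + 1, 3) if action == 1 else max(self.state - 1, 0)
        reward = 1.0 if self.state == 3 else 0.0
        done = self.state == 3
        return self.state, reward, done

# Hypothetical trained Q-table: the greedy action is always 1 ("right").
qtable = {s: [0.0, 1.0] for s in range(4)}

def greedy_policy(state):
    return max(range(2), key=lambda a: qtable[state][a])

def evaluate_agent(env, n_eval_episodes):
    episode_rewards = []
    for _ in range(n_eval_episodes):
        state = env.reset()  # reset at the start of every episode
        total_rewards_ep, done = 0.0, False
        while not done:
            state, reward, done = env.step(greedy_policy(state))
            total_rewards_ep += reward
        episode_rewards.append(total_rewards_ep)
    return episode_rewards

rewards = evaluate_agent(ToyDeterministicEnv(), 10)
print(rewards)  # every episode is identical: [1.0, 1.0, ..., 1.0]
```

Since nothing in the loop is stochastic, the standard deviation of these rewards is exactly 0, which is why the notebook prints "Mean_reward=1.00 +/- 0.00".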
Material
- Did you use Google Colab? Yes
Hello,
I've only taken a quick look at the handbook and the code so far, but I wonder if this might be the desired behavior: we do need to reset the environment at the start of every evaluation episode, and if the environment is deterministic (non-slippery), a deterministic policy will always produce the same trajectory and the same reward (evaluation uses the greedy policy, which always selects the best action).
While evaluating over multiple episodes doesn't change the result in this deterministic case (the number of eval episodes is configurable), the same evaluation code has to work for the slippery version, where episode rewards do vary.
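To illustrate that last point, here is the same sketch with a stochastic ("slippery") toy env. ToySlipperyEnv and its slip_prob parameter are hypothetical, not part of the course code: each step has some chance of slipping into a hole and ending the episode with reward 0, loosely mimicking is_slippery=True. The identical evaluation loop now produces varying episode rewards, so the mean +/- std statistic becomes informative.

```python
# Illustrative sketch (NOT the course notebook's code): the same greedy
# evaluation loop on a stochastic toy env, where rewards vary across episodes.
import random

class ToySlipperyEnv:
    """4-state chain; any step may slip into a hole (reward 0, episode ends)."""
    def __init__(self, slip_prob=0.3, seed=0):
        self.slip_prob = slip_prob
        self.rng = random.Random(seed)  # seeded for reproducibility

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        if self.rng.random() < self.slip_prob:
            return self.state, 0.0, True  # slipped into a hole
        self.state = min(self.state + 1, 3) if action == 1 else max(self.state - 1, 0)
        return self.state, (1.0 if self.state == 3 else 0.0), self.state == 3

def evaluate_agent(env, n_eval_episodes):
    episode_rewards = []
    for _ in range(n_eval_episodes):
        state = env.reset()
        total_rewards_ep, done = 0.0, False
        while not done:
            # The greedy action here is always 1 ("right"), as in the
            # deterministic case; only the env's transitions are random.
            state, reward, done = env.step(1)
            total_rewards_ep += reward
        episode_rewards.append(total_rewards_ep)
    return episode_rewards

rewards = evaluate_agent(ToySlipperyEnv(), 100)
mean_reward = sum(rewards) / len(rewards)
print(f"Mean_reward={mean_reward:.2f}")
```

With stochastic transitions, some episodes reach the goal (reward 1.0) and some do not (reward 0.0), so the mean reward reflects the policy's actual success rate rather than being pinned at 1.0.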