huggingface/deep-rl-class

[HANDS-ON BUG] Mean_rewards value is always 1.0 for FrozenLake-v1 without slippery

philippejuhel opened this issue · 1 comment

Describe the bug

In the Unit 2 hands-on, the `env.reset()` call in the `evaluate_agent` function is inside the `for episode` loop. For the FrozenLake-v1 environment with `is_slippery=False`, this means every episode starts from the same initial state and follows the same state-action sequence, so `total_rewards_ep` is always 1.0.

Just before this line:
`mean_reward = np.mean(episode_rewards)`
I added this line:
`print(episode_rewards)`

and it always shows:

[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, ..., 1.0]
Mean_reward=1.00 +/- 0.00

So, we need another metric.
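The behavior described above can be sketched without Gym at all. Below is a minimal, hypothetical stand-in: `DeterministicCorridor` is a toy deterministic environment (not the actual FrozenLake-v1 code), and `evaluate_agent` mirrors the hands-on structure with `env.reset()` inside the episode loop and a greedy action choice. Every episode plays out identically, so the spread is zero:

```python
import numpy as np

# Toy stand-in for a deterministic (non-slippery) environment:
# a 1-D corridor where reaching the last cell yields reward 1.
class DeterministicCorridor:
    def __init__(self, n_states=4):
        self.n_states = n_states

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # action 1 = right, 0 = stay
        if action == 1:
            self.state += 1
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done

def evaluate_agent(env, q_table, n_eval_episodes):
    episode_rewards = []
    for _ in range(n_eval_episodes):
        state = env.reset()  # reset inside the loop: same start state each time
        total_rewards_ep = 0.0
        done = False
        while not done:
            action = int(np.argmax(q_table[state]))  # greedy: same action each time
            state, reward, done = env.step(action)
            total_rewards_ep += reward
        episode_rewards.append(total_rewards_ep)
    return np.mean(episode_rewards), np.std(episode_rewards)

env = DeterministicCorridor()
q_table = np.tile([0.0, 1.0], (env.n_states, 1))  # "right" is always the best action
mean_reward, std_reward = evaluate_agent(env, q_table, n_eval_episodes=10)
print(f"Mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")  # Mean_reward=1.00 +/- 0.00
```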

Material

  • Did you use Google Colab? Yes

Hello,

I've only taken a quick look at the hands-on notebook and the code so far, but I wonder if this might be the desired behavior. We do need to reset the agent at every evaluation episode, and since the environment is deterministic (non-slippery) and evaluation uses a deterministic greedy policy (always selecting the best action), we would expect the same result/reward every time.

While evaluating over multiple episodes might not change the result in this case (and the number of eval episodes is configurable), the same evaluation code has to work for the slippery version as well.