huggingface/deep-rl-class

[HANDS-ON BUG] Mean_rewards value is always 1.0 for FrozenLake-v1 without slippery

philippejuhel opened this issue · 1 comment

Describe the bug

In the Unit 2 hands-on, the `env.reset()` call in the `evaluate_agent` function is inside the `for episode` loop. For the FrozenLake-v1 environment with `is_slippery=False`, this means every episode starts from the same initial state and follows the same state-action sequence, so `total_rewards_ep` is always 1.0.

Just before this line:
`mean_reward = np.mean(episode_rewards)`
I added this line:
`print(episode_rewards)`

and it always shows:

[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, ..., 1.0]
Mean_reward=1.00 +/- 0.00

So, we need another metric.
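The behavior described above can be sketched without Gym at all. Below is a minimal, hypothetical stand-in: `DeterministicCorridor` is a toy deterministic environment (not the actual FrozenLake-v1 code), and `evaluate_agent` mirrors the hands-on structure with `env.reset()` inside the episode loop and a greedy action choice. Every episode plays out identically, so the spread is zero:

```python
import numpy as np

# Toy stand-in for a deterministic (non-slippery) environment:
# a 1-D corridor where reaching the last cell yields reward 1.
class DeterministicCorridor:
    def __init__(self, n_states=4):
        self.n_states = n_states

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # action 1 = right, 0 = stay
        if action == 1:
            self.state += 1
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done

def evaluate_agent(env, q_table, n_eval_episodes):
    episode_rewards = []
    for _ in range(n_eval_episodes):
        state = env.reset()  # reset inside the loop: same start state each time
        total_rewards_ep = 0.0
        done = False
        while not done:
            action = int(np.argmax(q_table[state]))  # greedy: same action each time
            state, reward, done = env.step(action)
            total_rewards_ep += reward
        episode_rewards.append(total_rewards_ep)
    return np.mean(episode_rewards), np.std(episode_rewards)

env = DeterministicCorridor()
q_table = np.tile([0.0, 1.0], (env.n_states, 1))  # "right" is always the best action
mean_reward, std_reward = evaluate_agent(env, q_table, n_eval_episodes=10)
print(f"Mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")  # Mean_reward=1.00 +/- 0.00
```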

Material

  • Did you use Google Colab? Yes

Hello,

I've only taken a quick look at the hands-on notebook and the code so far, but I wonder if this might be the desired behavior. We do need to reset the agent at every evaluation episode, and since the environment is deterministic (non-slippery) and evaluation uses a deterministic greedy policy (always selecting the best action), we would expect the same result/reward every time.

While evaluating over multiple episodes might not change the result in this case (and the number of eval episodes is configurable), the same evaluation code has to work for the slippery version as well.