simoninithomas/Deep_reinforcement_learning_Course

Q Learning with FrozenLake Step 4: The Q learning algorithm

dawn2034 opened this issue · 8 comments

I have trouble with the code at line 41, "episode += 1" — why does episode need to be incremented here?

episode += 1 increases the episode count from 0 to whatever episode the agent is currently at. This is used to calculate a new epsilon value one line below. Increasing the episode count decreases epsilon, which increases the likelihood of action selection via the greedy policy.
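For context, the decay step one line below looks roughly like this; take it as a minimal sketch using the notebook's variable names (min_epsilon, max_epsilon, decay_rate), not a verbatim copy:

```python
import numpy as np

min_epsilon = 0.01  # exploration floor
max_epsilon = 1.0   # start fully exploratory
decay_rate = 0.005  # decay speed (illustrative value)

for episode in range(3):
    # Larger episode -> smaller epsilon -> more greedy (exploiting) actions.
    epsilon = min_epsilon + (max_epsilon - min_epsilon) * np.exp(-decay_rate * episode)
    print(episode, round(epsilon, 4))
```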

Is it normal that the agent does not learn how to finish the game using the hyperparameters it uses?

I have the same issue as @rscova

The algorithm with the current settings does not learn to move to the goal. Sometimes I get a Q-table full of zeros after training for 50,000 episodes. Other times I get non-zero values, but the agent moves very inefficiently and never reaches the goal.
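A quick way to check for that all-zeros symptom is to inspect the table directly after training (a sketch: qtable here is a fresh FrozenLake-shaped array, in the notebook it would be the trained one):

```python
import numpy as np

qtable = np.zeros((16, 4))  # FrozenLake 4x4: 16 states x 4 actions

# After training, non-zero entries mean some reward signal propagated back.
print("non-zero entries:", np.count_nonzero(qtable))
print("max Q-value:", qtable.max())
```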

@CarterEllsworth Isn't this a loop? for episode in range(total_episodes) means that episode is automatically incremented so there wouldn't be a need to manually increase episode.
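To illustrate with a toy loop (not the notebook code): Python rebinds the loop variable at the top of each iteration, so a manual episode += 1 never carries over to the next pass; its only effect is to shift the value seen by code later in the same iteration, such as the epsilon update:

```python
for episode in range(3):
    print("loop gives:", episode)
    episode += 1  # redundant: only affects the rest of THIS iteration
    print("after manual increment:", episode)
# Prints 0/1, 1/2, 2/3 -- the loop resets the variable every iteration.
```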

@WillKoehrsen Yes, the problem is the loop. @CarterEllsworth

Indeed, the problem comes from the loop. Thanks for figuring it out! 👍

Consequently, I've just modified the notebook (see the sketch of the corrected loop after this list):

  • Modified the decay_rate to 0.005 (thanks to @lukewys)
  • Removed episode += 1 (thanks to all 👍)
  • Modified the "watch our agent play" section to print only the last state (to see whether we reach the goal or not) and how many steps it took.

I've just credited you in the commit.

About the slipperiness: that is indeed normal, we are in a stochastic environment. I didn't want to make it deterministic because it would be too simple for a Q-learning problem. But you can make it deterministic if you want.
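For anyone who does want the deterministic variant for comparison, gym's FrozenLake accepts an is_slippery flag at construction (sketch using the old FrozenLake-v0 id):

```python
import gym

# Default: slippery (stochastic) -- the agent can slide sideways from its action.
env = gym.make("FrozenLake-v0")

# Deterministic variant: moves always succeed, making Q-learning much easier.
env_det = gym.make("FrozenLake-v0", is_slippery=False)
```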

Thanks @simoninithomas. Does it mean that in Taxi-v2 we have to remove "episode += 1" from the loop too?