Can this code reproduce the performance of the baseline?
xfdywy opened this issue · 3 comments
Can this code reproduce the performance of the baseline without any changes to the hyperparameters? I tried to run the code in DQN mode on other Atari games such as "Qbert" and "WizardOfWor", but I cannot get the results reported by other papers.
Hey,
I didn't keep the hyperparameters consistent with the original papers, and most DeepMind RL papers use RMSProp instead of the Adam used in this repo. Also, as for DQN: in the original code published by DeepMind, the experience replay implementation samples a random batch with history length 4 and has no mechanism preventing a sampled stack from spanning two different episodes, whereas in the experience replay here, every 4 consecutive frames come from the same episode. Other than those, I would say it is mostly consistent :)
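To illustrate that second difference, here is a minimal sketch of a replay buffer that re-draws any sample whose 4-frame history would cross an episode boundary. The class and method names are hypothetical, not the repo's actual implementation:

```python
import numpy as np

class Replay:
    """Hypothetical ring-buffer replay that stacks 4 consecutive frames.

    When sampling, any index whose 4-frame history would straddle an
    episode boundary is rejected and re-drawn, so every stacked state
    comes from a single episode.
    """
    def __init__(self, capacity=100000, history=4):
        self.capacity, self.history = capacity, history
        self.frames, self.dones = [], []

    def store(self, frame, done):
        self.frames.append(frame)
        self.dones.append(done)
        if len(self.frames) > self.capacity:
            self.frames.pop(0)
            self.dones.pop(0)

    def sample_state(self):
        while True:
            end = np.random.randint(self.history - 1, len(self.frames))
            start = end - self.history + 1
            # reject if any frame before the last one ends an episode
            if not any(self.dones[start:end]):
                return np.stack(self.frames[start:end + 1])
```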
Thanks for your reply.
I can't find reward clipping in your code. Also, the end-of-life = end-of-episode strategy and the fire-reset strategy seem not to be implemented.
refer to https://github.com/ShangtongZhang/DeepRL/blob/master/component/atari_wrapper.py
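For reference, those three pieces typically look like the following — a condensed sketch in the style of the standard OpenAI baselines Atari wrappers (old gym 4-tuple step API assumed; not the exact code in atari_wrapper.py):

```python
import gym
import numpy as np

class ClipRewardEnv(gym.RewardWrapper):
    # Clip rewards to {-1, 0, +1} by sign, as in the Nature DQN paper.
    def reward(self, reward):
        return np.sign(reward)

class FireResetEnv(gym.Wrapper):
    # Press FIRE after reset for games that need it to start.
    def reset(self, **kwargs):
        self.env.reset(**kwargs)
        obs, _, done, _ = self.env.step(1)  # action 1 is FIRE
        if done:
            obs = self.env.reset(**kwargs)
        return obs

class EpisodicLifeEnv(gym.Wrapper):
    # Treat a lost life as end-of-episode for training, but only
    # truly reset the underlying env when the game is over.
    def __init__(self, env):
        super().__init__(env)
        self.lives = 0
        self.was_real_done = True

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.was_real_done = done
        lives = self.env.unwrapped.ale.lives()
        if 0 < lives < self.lives:
            done = True
        self.lives = lives
        return obs, reward, done, info

    def reset(self, **kwargs):
        if self.was_real_done:
            obs = self.env.reset(**kwargs)
        else:
            # a no-op step continues from the current state after a lost life
            obs, _, _, _ = self.env.step(0)
        self.lives = self.env.unwrapped.ale.lives()
        return obs
```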