qfettes/DeepRL-Tutorials

About code

N-Kingsley opened this issue · 6 comments

In DRQN.ipynb, if config.NSTEP is equal to 1, is the step 'non_final_next_states = torch.cat([batch_state[non_final_mask, 1:, :], non_final_next_states], dim=1)' redundant?

Yes, it is redundant. In fact, if config.NSTEP > 1, it may break this code.

Yes, I think it's incorrect if config.NSTEP > 1, too.
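
For illustration, here is a minimal, self-contained sketch of what that torch.cat line computes. The shapes (batch, sequence, features) and variable names mirror DRQN.ipynb, but the values and dimensions are assumptions, not the repo's exact code:

```python
import torch

# Illustrative shapes, not the notebook's real configuration.
batch_size, seq_len, n_feat = 4, 8, 16

batch_state = torch.randn(batch_size, seq_len, n_feat)    # s_{t-L+1} .. s_t for each transition
non_final_mask = torch.tensor([True, True, False, True])  # transitions that did not end the episode
non_final_next_states = torch.randn(int(non_final_mask.sum()), 1, n_feat)  # stored next state

# Drop the oldest frame of each non-final sequence and append the stored next
# state, producing the "next" sequence s_{t-L+2} .. s_{t+1}.
next_seq = torch.cat([batch_state[non_final_mask, 1:, :], non_final_next_states], dim=1)
print(next_seq.shape)  # torch.Size([3, 8, 16])

# With NSTEP == 1 the stored next state is exactly one step after the last
# frame, so this shift-and-append is consistent (and unnecessary if the buffer
# already stores the full next-state sequence). With NSTEP > 1 the stored
# "next state" is N steps ahead, so appending it to a one-step-shifted window
# leaves a gap in the sequence, which is the breakage noted above.
```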

And if I change config.SEQUENCE_LENGTH to 10, will the code still work?

Yes, config.SEQUENCE_LENGTH can be increased to 10. However, note that this may require tuning other hyperparameters as well to maintain reasonable performance.
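
For example, something along these lines; the attribute names mirror the config object used in the notebooks, but the surrounding fields and values are illustrative assumptions, not tuned recommendations:

```python
# Stand-in for the repo's config object, just to show the kind of change discussed.
class Config:
    pass

config = Config()
config.SEQUENCE_LENGTH = 10   # longer recurrent unroll for the DRQN

# Longer sequences increase memory use per batch and change how far credit is
# propagated through the recurrent state, so hyperparameters like these may
# need retuning as well (values here are placeholders):
config.BATCH_SIZE = 32
config.LR = 1e-4
config.TARGET_NET_UPDATE_FREQ = 1000
```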

Thanks.
In 'wrap_deepmind.py', done is set to True when a life is lost, and the episode reward is then reset to 0. In a multi-life environment, will this lead to lower rewards?

Should we use 'wrap_deepmind(env, episode_life=False)' during testing?

@N-Kingsley setting episode_life=True could affect what the agent learns during training; however, in Pong it shouldn't prevent the agent from learning the optimal policy. During evaluation, it shouldn't matter whether episode_life is True or False.
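
For reference, here is a condensed sketch of how the episodic-life wrapper in the standard DeepMind-style Atari wrappers (which wrap_deepmind.py follows) typically behaves; it is not copied verbatim from this repo, and details may differ slightly:

```python
import gym

class EpisodicLifeEnvSketch(gym.Wrapper):
    """Sketch of the usual EpisodicLifeEnv behavior (old gym 4-tuple step API)."""

    def __init__(self, env):
        super().__init__(env)
        self.lives = 0
        self.was_real_done = True

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.was_real_done = done
        lives = self.env.unwrapped.ale.lives()
        if 0 < lives < self.lives:
            # Signal an episode end on life loss. The per-step reward is passed
            # through unchanged -- nothing is subtracted or zeroed here.
            done = True
        self.lives = lives
        return obs, reward, done, info

    def reset(self, **kwargs):
        if self.was_real_done:
            # Only a true game over triggers a full environment reset.
            obs = self.env.reset(**kwargs)
        else:
            # After a mere life loss, advance one no-op step and keep playing.
            obs, _, _, _ = self.env.step(0)
        self.lives = self.env.unwrapped.ale.lives()
        return obs
```

So the per-step reward the agent receives is not reduced; what gets reset to 0 on a life loss is only the per-episode return counter, which makes logged training returns look shorter. For evaluation, constructing the environment with wrap_deepmind(env, episode_life=False) makes a reported episode span the full game, though as noted above it generally shouldn't change the policy's score.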