qfettes/DeepRL-Tutorials

About code

N-Kingsley opened this issue · 6 comments

In DRQN.ipynb, if config.NSTEP is equal to 1, is the step 'non_final_next_states = torch.cat([batch_state[non_final_mask, 1:, :], non_final_next_states], dim=1)' redundant?

Yes, it is redundant. In fact, if config.NSTEP > 1, it may break this code.

Yes, I think it's incorrect if config.NSTEP > 1, too.
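
For illustration, here is a minimal, self-contained sketch of what that torch.cat line computes. The shapes (batch, sequence, features) and variable names mirror DRQN.ipynb, but the values and dimensions are assumptions, not the repo's exact code:

```python
import torch

# Illustrative shapes, not the notebook's real configuration.
batch_size, seq_len, n_feat = 4, 8, 16

batch_state = torch.randn(batch_size, seq_len, n_feat)    # s_{t-L+1} .. s_t for each transition
non_final_mask = torch.tensor([True, True, False, True])  # transitions that did not end the episode
non_final_next_states = torch.randn(int(non_final_mask.sum()), 1, n_feat)  # stored next state

# Drop the oldest frame of each non-final sequence and append the stored next
# state, producing the "next" sequence s_{t-L+2} .. s_{t+1}.
next_seq = torch.cat([batch_state[non_final_mask, 1:, :], non_final_next_states], dim=1)
print(next_seq.shape)  # torch.Size([3, 8, 16])

# With NSTEP == 1 the stored next state is exactly one step after the last
# frame, so this shift-and-append is consistent (and unnecessary if the buffer
# already stores the full next-state sequence). With NSTEP > 1 the stored
# "next state" is N steps ahead, so appending it to a one-step-shifted window
# leaves a gap in the sequence, which is the breakage noted above.
```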

And if I change config.SEQUENCE_LENGTH to 10, will the code still work?

Yes, config.SEQUENCE_LENGTH can be increased to 10. However, note that this may require tuning other hyperparameters as well to maintain reasonable performance.
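
For example, something along these lines; the attribute names mirror the config object used in the notebooks, but the surrounding fields and values are illustrative assumptions, not tuned recommendations:

```python
# Stand-in for the repo's config object, just to show the kind of change discussed.
class Config:
    pass

config = Config()
config.SEQUENCE_LENGTH = 10   # longer recurrent unroll for the DRQN

# Longer sequences increase memory use per batch and change how far credit is
# propagated through the recurrent state, so hyperparameters like these may
# need retuning as well (values here are placeholders):
config.BATCH_SIZE = 32
config.LR = 1e-4
config.TARGET_NET_UPDATE_FREQ = 1000
```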

Thanks.
In 'wrap_deepmind.py', done is set to True when a life is lost, and the episode reward is then reset to 0. In a multi-life environment, will this lead to lower rewards?

Should we use 'wrap_deepmind(env, episode_life=False)' during testing?

@N-Kingsley setting episode_life=True could affect what the agent learns during training; however, in Pong it shouldn't prevent the agent from learning the optimal policy. During evaluation, it shouldn't matter whether episode_life is True or False.
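
For reference, here is a condensed sketch of how the episodic-life wrapper in the standard DeepMind-style Atari wrappers (which wrap_deepmind.py follows) typically behaves; it is not copied verbatim from this repo, and details may differ slightly:

```python
import gym

class EpisodicLifeEnvSketch(gym.Wrapper):
    """Sketch of the usual EpisodicLifeEnv behavior (old gym 4-tuple step API)."""

    def __init__(self, env):
        super().__init__(env)
        self.lives = 0
        self.was_real_done = True

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.was_real_done = done
        lives = self.env.unwrapped.ale.lives()
        if 0 < lives < self.lives:
            # Signal an episode end on life loss. The per-step reward is passed
            # through unchanged -- nothing is subtracted or zeroed here.
            done = True
        self.lives = lives
        return obs, reward, done, info

    def reset(self, **kwargs):
        if self.was_real_done:
            # Only a true game over triggers a full environment reset.
            obs = self.env.reset(**kwargs)
        else:
            # After a mere life loss, advance one no-op step and keep playing.
            obs, _, _, _ = self.env.step(0)
        self.lives = self.env.unwrapped.ale.lives()
        return obs
```

So the per-step reward the agent receives is not reduced; what gets reset to 0 on a life loss is only the per-episode return counter, which makes logged training returns look shorter. For evaluation, constructing the environment with wrap_deepmind(env, episode_life=False) makes a reported episode span the full game, though as noted above it generally shouldn't change the policy's score.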