Denys88/rl_games

Continuing training from checkpoint

cvoelcker opened this issue · 3 comments

Hi, on the computing infrastructure I am using I regularly need to continue interrupted training. I have been trying to use the checkpointing utility (for PPO, but I think these issues appear for all algorithms) to reload the checkpoints, but training does not actually continue from those checkpoints. I believe that is because other important state, such as the optimizer, is not stored in the checkpoints (please correct me if I am wrong).

In the image below, I interrupted two runs with the same seed at two different points and continued training from the latest checkpoint.
[image: reward curves for the two interrupted and resumed runs]

Would it be possible to checkpoint all components of the algorithm to enable continuing training from a checkpoint?
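For reference, a minimal sketch of what a "full" training checkpoint could contain for a PyTorch-based PPO trainer. The `agent`, `optimizer`, and `scheduler` names here are hypothetical, not rl_games internals:

```python
import torch

def save_full_checkpoint(path, agent, optimizer, epoch, scheduler=None):
    # Persist everything needed to resume training, not just the weights.
    state = {
        "epoch": epoch,
        "model": agent.state_dict(),
        "optimizer": optimizer.state_dict(),  # includes Adam moments and per-group lr
    }
    if scheduler is not None:
        state["scheduler"] = scheduler.state_dict()
    torch.save(state, path)

def load_full_checkpoint(path, agent, optimizer, scheduler=None):
    # Restore model, optimizer, and (optionally) LR schedule, and return the epoch.
    state = torch.load(path, map_location="cpu")
    agent.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    if scheduler is not None and "scheduler" in state:
        scheduler.load_state_dict(state["scheduler"])
    return state["epoch"]
```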

I do save the optimizer state, but the adaptive LR could break something; I need to double-check whether I save the latest learning rate.
It could also be related to how I gather statistics:
once you restart training, the first envs with done == True have low scores because they finished earlier. I can save the statistics state and it may improve the reported numbers, BUT when you restart training all envs start playing from scratch anyway, so it might still impact the rewards.
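A hedged sketch of the two points above: pushing the last saved learning rate back into the optimizer after loading (in case an adaptive-LR schedule changed it), and restoring the running reward/observation statistics alongside the weights. The `last_lr` and `running_mean_std` keys and the normalizer object are assumptions for illustration, not the actual rl_games checkpoint layout:

```python
import torch

def restore_training_state(path, model, optimizer, running_mean_std=None):
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])

    # If an adaptive-LR scheme updated the learning rate during training,
    # explicitly write the saved value back into every param group.
    if "last_lr" in state:
        for group in optimizer.param_groups:
            group["lr"] = state["last_lr"]

    # Restore normalization / reward statistics so the reported numbers are
    # not skewed right after the restart (assumed key and object name).
    if running_mean_std is not None and "running_mean_std" in state:
        running_mean_std.load_state_dict(state["running_mean_std"])
```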

Ah, thanks for the clarification. I am going to add an independent "test" run every couple of iterations to see if this is indeed only a reporting artifact. However, the performance collapses several times, even though I only interrupted the training once, so I think something else might also be going on?

You can try it with a learning rate of 0. It should return to the same numbers pretty quickly.
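A minimal sketch of that sanity check, assuming you can reach the optimizer after loading the checkpoint and before resuming the training loop (illustrative only, not an rl_games API):

```python
def zero_learning_rate(optimizer):
    # Freeze the policy: with lr == 0 no update is applied, so if the reward
    # curve climbs back to the old numbers after a restart, the dip was a
    # statistics/reporting artifact rather than broken checkpoint restoration.
    for group in optimizer.param_groups:
        group["lr"] = 0.0
```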