No last.ckpt
Hi Lars
Another question regarding checkpoint settings. When I set save_top_k=1, every_n_epochs=1 and save_last=True, training works perfectly. But if I change to, for example, save_top_k=1, every_n_epochs=5 and save_last=True, training fails after 2 epochs because it cannot find last.ckpt. I can see that last.ckpt is created after the first epoch and stays there until the first validation. The second epoch then finishes (last.ckpt gets deleted somewhere along the way), but the second validation reports that there is no last.ckpt.
I looked for a reason why PyTorch Lightning might delete last.ckpt, but found no answer. Is there any place where lightning_trainable deletes it before every_n_epochs is reached?
Thanks
JB
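To make the two configurations easier to compare, here is a small stdlib-only sketch of which epochs would produce a checkpoint. It does not call Lightning. It assumes (labeled here as an assumption, not taken from Lightning's source) that a periodic checkpoint is written only when (epoch + 1) % every_n_epochs == 0, with epochs counted from 0. The function name save_epochs is hypothetical.

```python
from typing import List


def save_epochs(max_epochs: int, every_n_epochs: int) -> List[int]:
    """Return the 0-based epochs at which a periodic checkpoint would be written.

    ASSUMPTION: a save happens only when (epoch + 1) % every_n_epochs == 0.
    """
    return [e for e in range(max_epochs) if (e + 1) % every_n_epochs == 0]


print(save_epochs(10, 1))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(save_epochs(10, 5))  # [4, 9]
print(save_epochs(2, 5))   # []
```

Under this assumption, with every_n_epochs=5 no periodic checkpoint exists during the first two epochs, which is the window in which the missing last.ckpt is reported.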
Hi JB, thank you for the question. Please check out Lightning's ModelCheckpoint callback for info on checkpointing. lightning-trainable does not interact with checkpoints beyond configuring this callback for you. As per the docs, last.ckpt is a symbolic link on a local filesystem, so in your case it may point to nothing before epoch 5.
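A symbolic link that "points to nothing" still exists as a link, but anything that tries to open it fails. The stdlib-only sketch below reproduces that situation without Lightning: it creates a dummy checkpoint, points a last.ckpt symlink at it, then removes the target. The file names and the function dangling_link_demo are hypothetical, and it assumes a POSIX filesystem where os.symlink is permitted (on Windows it may require extra privileges).

```python
import os
import tempfile
from typing import Tuple


def dangling_link_demo() -> Tuple[bool, bool, bool]:
    """Return (exists_before, islink_after, exists_after) for a simulated last.ckpt."""
    with tempfile.TemporaryDirectory() as ckpt_dir:
        # Hypothetical checkpoint file name, for illustration only.
        target = os.path.join(ckpt_dir, "epoch=0-step=100.ckpt")
        last = os.path.join(ckpt_dir, "last.ckpt")

        # Write a dummy checkpoint and make last.ckpt a symlink to it.
        with open(target, "wb") as f:
            f.write(b"dummy weights")
        os.symlink(target, last)
        exists_before = os.path.exists(last)  # follows the link: True

        # Remove the target; the link itself stays behind, now dangling.
        os.remove(target)
        return exists_before, os.path.islink(last), os.path.exists(last)


print(dangling_link_demo())  # (True, True, False)
```

Because os.path.exists follows the link while os.path.islink does not, checking both is a quick way to tell "last.ckpt was deleted" apart from "last.ckpt is a link whose target is gone" when inspecting a checkpoint directory.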