LarsKue/lightning-trainable

No last.ckpt

Closed · 1 comment

Hi Lars

Another question regarding the checkpoint settings. When I set save_top_k=1, every_n_epochs=1 and save_last=True, training works perfectly. But if I change to, for example, save_top_k=1, every_n_epochs=5 and save_last=True, then after 2 epochs the training process cannot find last.ckpt. I can see that last.ckpt is created after the first epoch and is still there through the first validation. The second epoch then finishes (last.ckpt gets deleted somewhere around here), but the second validation reports that there is no last.ckpt.

I tried to find out whether PyTorch Lightning deletes last.ckpt somewhere, but found no answer. Is there any place where lightning_trainable deletes it before every_n_epochs is reached?

Thanks

JB

Hi JB, thank you for the question. Please check out Lightning's ModelCheckpoint callback for info on checkpointing.

lightning-trainable does not interact with checkpoints beyond configuring this callback for you. As per the docs, last.ckpt is a symbolic link on a local filesystem, so it may point to nothing before epoch 5 in your case.
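
To illustrate what "point to nothing" means, here is a minimal sketch using only the Python standard library. It is not Lightning's actual checkpointing code. The file name `epoch=0.ckpt` and the helper `inspect_link` are made up for the example, and it assumes a filesystem that allows creating symlinks. The link itself is still there after its target is removed, but anything that follows the link behaves as if the file were missing.

```python
import os
import tempfile


def inspect_link(path):
    """Return (is_symlink, target_exists) for the given path."""
    # os.path.islink looks at the link itself;
    # os.path.exists follows the link and is False when the target is gone.
    return os.path.islink(path), os.path.exists(path)


with tempfile.TemporaryDirectory() as ckpt_dir:
    # Hypothetical checkpoint file name, chosen only for this example.
    target = os.path.join(ckpt_dir, "epoch=0.ckpt")
    last = os.path.join(ckpt_dir, "last.ckpt")

    with open(target, "wb") as f:
        f.write(b"checkpoint bytes")

    # last.ckpt is only a pointer to the real checkpoint file.
    os.symlink(target, last)
    healthy = inspect_link(last)

    # Removing the real file leaves last.ckpt dangling.
    os.remove(target)
    dangling = inspect_link(last)

print("with target:   ", healthy)
print("without target:", dangling)
```

If you see the same pattern in your checkpoint directory (`os.path.islink` is true for last.ckpt but `os.path.exists` is false), then the link is dangling rather than deleted.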