[Question] Resume training from a checkpoint
sevashasla opened this issue · 2 comments
Hello! Thank you for the great work.
I had been training a model for approximately 20 hours when my computer crashed and training stopped. Is there a way to resume training from a checkpoint? I saw the line TODO: Add midpoint loading and the commented-out code after it. I could try to implement it myself; could you please point out the potential problems?
Hi,
I will not have time to look into it this week.
In addition to loading the checkpoints (see here), we need to handle the training dataloader train_dataset
properly so that it provides the images that are currently being optimized. I would use local_tensorfs.blending_weights[:, -1] > 0
to determine which frames should be activated / deactivated in the training dataset.
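As a rough illustration of that idea, here is a minimal sketch of a resume helper. Only `local_tensorfs.blending_weights[:, -1] > 0` comes from the comment above; the checkpoint keys (`"local_tensorfs"`, `"global_step"`) and the `activate_frames` dataset hook are assumptions for the sketch, not the repository's actual API.

```python
import torch

def resume_from_checkpoint(ckpt_path, local_tensorfs, train_dataset, device="cuda"):
    """Sketch: restore model state and re-activate the frames that were
    being trained when the checkpoint was saved. Key names and the
    dataset method are hypothetical placeholders."""
    ckpt = torch.load(ckpt_path, map_location=device)

    # Restore the local tensorfs state (assumes it was saved as a state_dict
    # under a "local_tensorfs" key).
    local_tensorfs.load_state_dict(ckpt["local_tensorfs"])

    # Frames with a non-zero blending weight in the last (current) local
    # radiance field are the ones that should still be active.
    active_mask = local_tensorfs.blending_weights[:, -1] > 0
    active_frame_ids = torch.nonzero(active_mask, as_tuple=False).squeeze(-1).tolist()

    # Hypothetical hook: make the training dataset serve exactly these frames,
    # so the dataloader matches the state at save time.
    train_dataset.activate_frames(active_frame_ids)

    # Resume the optimization step counter if it was stored.
    return ckpt.get("global_step", 0)
```

The main pitfall the sketch glosses over is keeping the dataset's notion of active frames consistent with the blending weights; if they disagree, the loss is computed on images the current local model was never meant to cover.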
Thank you for your fast answer!