facebookresearch/localrf

[Question] Resume training from a checkpoint

sevashasla opened this issue · 2 comments

Hello! Thank you for the great work.
I had been training a model for approximately 20 hours when an error occurred on my computer, causing the training to stop. Is there a way to resume training from a checkpoint? I saw the line TODO: Add midpoint loading and the commented code after it. I could try to implement it myself; could you please point out the potential problems?

Hi,
I will not have time to look into it this week.
In addition to loading checkpoints (see here), we need to handle the training dataloader train_dataset properly so that it provides the images currently being trained on. I would use local_tensorfs.blending_weights[:, -1] > 0 to determine which frames should be activated / deactivated in the training dataset.
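
For reference, here is a minimal sketch of what resuming could look like. It assumes the checkpoint is a plain dict with "model", "optimizer", and "iteration" entries, and `activate_frames` is a hypothetical helper on the dataset; only `local_tensorfs`, `train_dataset`, and the blending-weights mask come from the comment above, the rest is illustrative.

```python
import torch

def resume_training(ckpt_path, local_tensorfs, optimizer, train_dataset, device="cuda"):
    # Assumed checkpoint layout: {"model": ..., "optimizer": ..., "iteration": ...}.
    ckpt = torch.load(ckpt_path, map_location=device)

    # Restore model and optimizer state saved at the checkpoint.
    local_tensorfs.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])

    # A frame is considered active if its blending weight for the most
    # recent local radiance field is positive.
    active_mask = local_tensorfs.blending_weights[:, -1] > 0  # shape: [num_frames]
    active_frame_ids = torch.nonzero(active_mask, as_tuple=False).squeeze(-1).tolist()

    # Placeholder for whatever mechanism train_dataset uses to select
    # which frames it serves during training.
    train_dataset.activate_frames(active_frame_ids)

    # Continue the training loop from the saved iteration.
    return ckpt.get("iteration", 0)
```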

Thank you for your fast answer!