Add example on how to resume model training
Modify existing code samples and configs to include the option of resuming training.
For example: given a checkpoint from epoch 100, run the training for 50 more epochs.
Restart where training left off.
Update: Dutch_F3 notebook
Re-use min_epoch param in the config
Param for the weights path to resume from: double-check which param we reuse
Need more discussion offline, but right now suggest you use:
TRAIN.BEGIN_EPOCH - epoch used to resume training from
TEST.MODEL_PATH - path to model you want to resume from (however, we need to discuss this - I don't know if there's a better mechanism to specify which model to resume from).
For example, the following config uses the same setup we do: https://github.com/leoxiaobin/deep-high-resolution-net.pytorch/blob/master/experiments/coco/hrnet/w32_256x192_adam_lr1e-3.yaml
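To make the suggestion concrete, here is a rough sketch of how TRAIN.BEGIN_EPOCH and TEST.MODEL_PATH could drive the "checkpoint from epoch 100, 50 more epochs" case. The plain dict standing in for our config object, the helper name `epochs_to_run`, the `extra_epochs` argument, and the placeholder checkpoint path are all assumptions for illustration, not what the repo currently does.

```python
def epochs_to_run(begin_epoch, extra_epochs):
    """Return the epoch indices a resumed run would cover.

    begin_epoch is the number of epochs already completed (so the first
    epoch of the resumed run has that index); extra_epochs is how many
    additional epochs to train.
    """
    if begin_epoch < 0 or extra_epochs < 0:
        raise ValueError("begin_epoch and extra_epochs must be non-negative")
    return range(begin_epoch, begin_epoch + extra_epochs)


# Plain dict standing in for the project's config object (hypothetical values).
cfg = {
    "TRAIN": {"BEGIN_EPOCH": 100},
    "TEST": {"MODEL_PATH": "path/to/checkpoint.pth"},  # placeholder path
}

epochs = epochs_to_run(cfg["TRAIN"]["BEGIN_EPOCH"], extra_epochs=50)

for epoch in epochs:
    # A real training loop would first load weights from
    # cfg["TEST"]["MODEL_PATH"], then train one epoch per iteration here.
    pass
```

With these values the loop covers epochs 100 through 149, i.e. 50 additional epochs on top of the checkpoint.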
Thoughts on how to specify model name to resume from? Suggest we add TRAIN.MODEL_RESUME or something of that nature, which technically renders BEGIN_EPOCH useless
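One possible shape for that, as a sketch only: if the checkpoint itself records how many epochs were completed, a single TRAIN.MODEL_RESUME path is enough and BEGIN_EPOCH can be derived from it. Everything named here (`save_checkpoint`, `load_resume_state`, the `epochs_completed` key) is hypothetical, and `pickle` is used only so the snippet runs without dependencies; in our PyTorch code `torch.save` / `torch.load` would play that role.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, epochs_completed, model_state, optimizer_state):
    """Write a checkpoint that remembers how far training got."""
    checkpoint = {
        "epochs_completed": epochs_completed,
        "model_state": model_state,
        "optimizer_state": optimizer_state,
    }
    with open(path, "wb") as f:
        pickle.dump(checkpoint, f)


def load_resume_state(model_resume):
    """Return (begin_epoch, checkpoint) for a hypothetical TRAIN.MODEL_RESUME.

    An empty path means a fresh run starting at epoch 0.
    """
    if not model_resume:
        return 0, None
    with open(model_resume, "rb") as f:
        checkpoint = pickle.load(f)
    return checkpoint["epochs_completed"], checkpoint


# Demo with made-up state: save after 100 epochs, then resume from the file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "epoch_100.pkl")
    save_checkpoint(
        path,
        epochs_completed=100,
        model_state={"w": [0.1, 0.2]},
        optimizer_state={"lr": 0.001},
    )
    begin_epoch, checkpoint = load_resume_state(path)
    fresh_epoch, fresh_checkpoint = load_resume_state("")
```

Saving the optimizer state alongside the weights matters for "restart where training left off": without it, the learning-rate schedule and momentum buffers would start over even though the weights do not.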