[Question] Extending the number of training epochs
Rajesh-ParaxialTech opened this issue · 3 comments
Hello
Suppose I have started training an nnDetection model with the number of epochs fixed to 100. If I later want to continue training beyond 100 epochs, can I update the code and resume training with the option mode=resume, without losing the weights the model learned during the first 100 epochs?
Thanking you
Rajesh
Hey @Rajesh-ParaxialTech ,
yes, it is possible to resume training by switching the mode to resume and specifying the new number of epochs. There are a few caveats, though:
(1) The learning rate schedule depends on the total number of epochs, so overwriting the number of epochs will change the schedule. For example, the learning rate may already have been quite low towards the end of the first training run, but the second run will start with a higher learning rate that then decays towards the end again.
(2) The training ends with SWA (stochastic weight averaging), which periodically increases and decreases the learning rate before averaging the model weights. If you restart after SWA, the resulting model will also differ from one produced by a single long training run.
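To illustrate caveat (1), here is a minimal sketch that assumes a simple polynomial ("poly") learning-rate decay with a base LR of 0.01 and exponent 0.9. These values and the `poly_lr` helper are hypothetical choices for illustration; the schedule nnDetection actually uses (warmup, exact decay form) may differ.

```python
def poly_lr(epoch: int, max_epochs: int, base_lr: float = 0.01, exponent: float = 0.9) -> float:
    """Hypothetical poly schedule: LR decays from base_lr towards 0 at max_epochs."""
    return base_lr * (1 - epoch / max_epochs) ** exponent


# Run 1: planned for 100 epochs, so the LR is close to its minimum at the last epoch.
lr_end_run1 = poly_lr(99, max_epochs=100)

# Run 2: resumed at epoch 100 with max_epochs raised to 200.
# The schedule is recomputed from the new total, so the LR jumps back up.
lr_start_run2 = poly_lr(100, max_epochs=200)

print(f"last LR of run 1:  {lr_end_run1:.6f}")
print(f"first LR of run 2: {lr_start_run2:.6f}")
```

Under these assumed settings the LR goes from roughly 0.00016 at the end of run 1 to roughly 0.0054 at the start of run 2. A single 200-epoch run would also be at about 0.0054 at epoch 100, but it would never have decayed to near zero in between, which is why the resumed model is not equivalent to one long run.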
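For caveat (2), the core of SWA is a running mean of weight snapshots collected during the cyclic-LR phase. The sketch below uses plain lists as stand-in "weights" and made-up snapshot values; it only illustrates the averaging step and is not nnDetection's implementation (PyTorch offers `torch.optim.swa_utils.AveragedModel` for real models).

```python
from typing import List


def swa_update(avg: List[float], new: List[float], n_averaged: int) -> List[float]:
    """Running mean: avg_{n+1} = avg_n + (new - avg_n) / (n + 1)."""
    return [a + (w - a) / (n_averaged + 1) for a, w in zip(avg, new)]


# Hypothetical weight snapshots taken at the end of each cyclic-LR cycle.
snapshots = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]

# Start from the first snapshot and fold in the rest one by one.
swa_weights = list(snapshots[0])
for n, snap in enumerate(snapshots[1:], start=1):
    swa_weights = swa_update(swa_weights, snap, n)

print(swa_weights)  # the element-wise mean of all snapshots
```

The averaged weights (`[3.0, 2.0]` here) differ from the last snapshot (`[5.0, 0.0]`), so resuming from a post-SWA checkpoint starts from a point that a single uninterrupted run would never pass through mid-training.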
Best,
Michael
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.