nagadomi/waifu2x

[Question] Incremental training

Closed this issue · 3 comments

I was wondering whether it is currently possible to interrupt a training run and pick up where it left off. I noticed from the example scripts that there’s a --resume switch, which apparently allows extending existing models, e.g. adding denoising on top of superscaling. I was wondering whether this also allows “pausing” the training phase and picking it up later. I did try using --resume like this, but it started counting the epochs from 0 again, and moreover the old .t7 model files from previous epochs were being overwritten. Would using --resume 5 times, with 10 epochs each, produce a model equivalent to a single training run over 50 epochs?

Thank you kindly for making waifu2x!

It is not fully supported. Training starts from the trained model specified by the -resume option, but the learning rate is reset. So you need to specify both -resume and -learning_rate. The learning rate is displayed in the console output for each epoch.

# 2
learning rate: 0.00024853029126955
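
For reference, a resumed run might look something like the sketch below. Only -resume and -learning_rate are the flags discussed here; the model filename, model directory, and other options are illustrative placeholders, so check train.lua's option list for the exact names and defaults in your branch.

# resume from the previously saved model, restoring the last reported learning rate
th train.lua -method scale \
  -model_dir models/my_model \
  -resume models/my_model/scale2.0x_model.t7 \
  -learning_rate 0.00024853029126955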

In addition, if you use train.lua, I recommend the dev branch; it is faster than the master branch.

Thanks for the pointer!