facebookresearch/multipathnet

How to resume from a checkpoint?

Closed this issue · 4 comments

Hello, I want to resume training from a checkpoint. I tried setting opt.checkpoint=true, but then I got this error:

/root/torch/install/bin/luajit: train.lua:227: attempt to index global 'checkpoint' (a nil value)
stack traceback:
train.lua:227: in function 'hooks'
./engines/fboptimengine.lua:50: in function 'train'
train.lua:363: in main chunk
[C]: in function 'dofile'
/root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x004064f0

Not sure checkpointing works correctly right now; please use the retrain option while we fix checkpointing.
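For reference, a minimal sketch of what the retrain path amounts to conceptually, assuming the model file was written with torch.save; this is not multipathnet's actual loading code, just the general Torch pattern:

```lua
-- Load a previously saved network and keep training it from its current
-- weights; unlike a full checkpoint, the optimizer state is not restored.
require 'torch'
require 'nn'
require 'cunn'   -- needed to deserialize a model that was saved on the GPU

local model = torch.load('model_500.t7')
model:training()  -- switch back to training mode (affects dropout/batchnorm)
-- hand `model` to the usual training loop from here
```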

@szagoruyko, thank you.
I notice that in the logs dir there are:
transformer.t7, optimState_500.t7, model_500.t7

So I set:
retrain=model_500.t7
transformer=transformer.t7

But where do I set optimState_500.t7?

@northeastsquare there is no such option; the momentum will be reset.
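If keeping the momentum buffers really mattered, one could reload the saved optimizer state by hand. A minimal sketch (there is no multipathnet option for this), assuming optimState_500.t7 was written with torch.save and holds the table that optim.sgd expects (learningRate, momentum, dfdx, ...):

```lua
-- Reload the optimizer state saved alongside the model checkpoint.
require 'torch'
require 'optim'

local optimState = torch.load('optimState_500.t7')
print(optimState)   -- inspect learningRate, momentum, evalCounter, ...

-- Inside the training loop this table would then be passed to the optimizer,
-- e.g. optim.sgd(feval, params, optimState), where `feval` and `params` stand
-- for the loss closure and flattened parameters the training engine builds.
```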