about train

Question

Closed this issue 3 years ago · 4 comments

I can't find the trained model after running train.py. Here is the output:

```
Number of training data: 1485
Model restore success!
Start training for phase Decom, with start epoch 100 start iter 9200 :
Finished training for phase Decom.
Model restore success!
Start training for phase Relight, with start epoch 100 start iter 9200 :
Finished training for phase Relight.

Process finished with exit code 0
```

Answer 1 · 2021-04-30T07:09:16.000Z

I also encountered this problem. How did you solve it?

Answer 2 · 2021-04-30T07:25:53.000Z

Sorry, not yet.

------------------ Original message ------------------ Sent: Friday, 30 April 2021, 3:09 PM. Subject: Re: [aasharma90/RetinexNet_PyTorch] about train (#2)

Answer 3 · 2021-04-30T07:26:56.000Z

Hi,

Can you please change the checkpoint directory name and try again? For example, run train.py with `--ckpts_dir ./ckpts_train/`.

By default, the code uses the directory `ckpts` for both training and testing. When you train, the code finds the pretrained checkpoints already stored in that directory. Those checkpoints correspond to the final iteration (epoch 100, iter 9200), so the model is restored at the end of its schedule and no further training takes place. The script still prints "Finished training" for each phase, but nothing new is written.
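To make the resume behaviour concrete, here is a minimal sketch of a "restore then continue" training loop. It is not the RetinexNet_PyTorch code: the helpers `load_checkpoint`, `save_checkpoint` and `train_phase`, the marker file `latest.txt`, and `iters_per_epoch=92` (simply 9200 / 100, chosen to match the log) are all hypothetical. It shows why a loop that resumes at epoch 100 out of 100 runs zero epochs, while pointing it at an empty directory starts from epoch 0.

```python
import os
import tempfile


def load_checkpoint(ckpt_dir):
    """Hypothetical: return (epoch, iteration) of the saved checkpoint, or None."""
    marker = os.path.join(ckpt_dir, "latest.txt")  # assumed file name, not the repo's
    if not os.path.isfile(marker):
        return None
    with open(marker) as f:
        epoch, iteration = map(int, f.read().split())
    return epoch, iteration


def save_checkpoint(ckpt_dir, epoch, iteration):
    """Hypothetical: record the epoch/iteration reached in ckpt_dir."""
    os.makedirs(ckpt_dir, exist_ok=True)
    with open(os.path.join(ckpt_dir, "latest.txt"), "w") as f:
        f.write(f"{epoch} {iteration}")


def train_phase(phase, ckpt_dir, num_epochs=100, iters_per_epoch=92):
    """Resume from ckpt_dir if possible; return how many epochs were actually run."""
    restored = load_checkpoint(ckpt_dir)
    if restored is None:
        start_epoch, start_iter = 0, 0
    else:
        start_epoch, start_iter = restored
        print("Model restore success!")
    print(f"Start training for phase {phase}, "
          f"with start epoch {start_epoch} start iter {start_iter} :")
    epochs_run, iteration = 0, start_iter
    # If start_epoch == num_epochs, this range is empty and nothing is trained.
    for _ in range(start_epoch, num_epochs):
        iteration += iters_per_epoch  # stand-in for the real per-batch updates
        epochs_run += 1
    if epochs_run:
        save_checkpoint(ckpt_dir, num_epochs, iteration)
    print(f"Finished training for phase {phase}.")
    return epochs_run


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        default_dir = os.path.join(root, "ckpts")
        save_checkpoint(default_dir, 100, 9200)  # simulate the shipped final checkpoint
        print(train_phase("Decom", default_dir))  # resumes at the end: 0 epochs run
        fresh_dir = os.path.join(root, "ckpts_train")
        print(train_phase("Decom", fresh_dir))  # empty directory: trains all 100 epochs
```

The design point is that a resuming trainer treats any checkpoint it finds as authoritative, so a fresh checkpoint directory (or deleting the old checkpoints) is the simplest way to force training from scratch.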

Answer 4 · 2021-05-06T07:08:06.000Z

Hi @Zruto @LvShuaiChao

Did the suggestion above resolve the issue? If so, could you please accept the solution and close the issue? If not, could you please share your full training log file?

Thanks,
Aashish