aasharma90/RetinexNet_PyTorch

about train

Closed this issue · 4 comments

Zruto commented

I can't find the trained model after running train.py. Here is the output:

```
Number of training data: 1485
Model restore success!
Start training for phase Decom, with start epoch 100 start iter 9200 :
Finished training for phase Decom.
Model restore success!
Start training for phase Relight, with start epoch 100 start iter 9200 :
Finished training for phase Relight.

Process finished with exit code 0
```

I also encountered this problem. How did you solve it?

Zruto commented

Hi,

Can you please change the checkpoint directory name and try again? For example, run train.py with `--ckpts_dir ./ckpts_train/`.

The code uses the `ckpts` directory by default, for both training and testing. During training, it finds that this directory already contains pretrained checkpoints for the final iteration (epoch=100, iter=9200), restores them, and therefore does not proceed with any further training.
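To illustrate the resume behavior described above, here is a minimal Python sketch; it is not the repository's actual code. The checkpoint filename pattern `<phase>_epoch<E>_iter<I>.tar` and the helpers `find_latest_checkpoint` and `epochs_to_run` are hypothetical. Only the default directory name `ckpts`, the phase names, and the final epoch/iteration (100, 9200) come from this thread. The sketch shows why a directory that already holds the final checkpoint leaves no epochs to train, while an empty or new directory starts from epoch 0.

```python
import os
import re
import tempfile

# HYPOTHETICAL naming scheme "<phase>_epoch<E>_iter<I>.tar"; the real
# RetinexNet_PyTorch code may store its checkpoints differently.
CKPT_PATTERN = re.compile(r"^(?P<phase>\w+?)_epoch(?P<epoch>\d+)_iter(?P<iter>\d+)\.tar$")


def find_latest_checkpoint(ckpts_dir, phase):
    """Return (epoch, iter) of the newest checkpoint for `phase`, or (0, 0) if none exists."""
    if not os.path.isdir(ckpts_dir):
        return 0, 0
    best = (0, 0)
    for name in os.listdir(ckpts_dir):
        m = CKPT_PATTERN.match(name)
        if m and m.group("phase") == phase:
            # Tuples compare by epoch first, then by iteration.
            best = max(best, (int(m.group("epoch")), int(m.group("iter"))))
    return best


def epochs_to_run(ckpts_dir, phase, total_epochs):
    """Resume from the newest checkpoint and return the epochs still left to train."""
    start_epoch, _start_iter = find_latest_checkpoint(ckpts_dir, phase)
    # If start_epoch already equals total_epochs, this range is empty,
    # so training "finishes" immediately, as in the log in this issue.
    return list(range(start_epoch, total_epochs))


with tempfile.TemporaryDirectory() as ckpts:
    # Simulate a default directory that already holds the final-iteration checkpoint.
    open(os.path.join(ckpts, "Decom_epoch100_iter9200.tar"), "w").close()
    print(epochs_to_run(ckpts, "Decom", total_epochs=100))  # [] -> nothing to train
    # A fresh directory has no checkpoints, so all 100 epochs would run.
    fresh = os.path.join(ckpts, "ckpts_train")
    print(len(epochs_to_run(fresh, "Decom", total_epochs=100)))  # 100
```

Under this assumed logic, pointing `--ckpts_dir` at an empty directory is the simplest way to force a fresh run; moving the existing checkpoints out of `ckpts` would have the same effect.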

Hi @Zruto @LvShuaiChao

Did the above suggestion resolve the issue? If so, could you please accept the solution and close the issue? If not, could you please share your entire training log file, if possible?

Thanks,
Aashish