abhiskk/fast-neural-style

Time for training a new style

nitish116 opened this issue · 3 comments

Hi,
I am using a p2.xlarge instance on AWS (NVIDIA K80 GPU, 61GB RAM, 128GB disk space). To try it out I was using COCO's val2017 (5K images). It downloaded the VGG model file, and since then the output has been blank for a very long while.
0. How much time does it take to train on 5000 images?
1. Also, when I try to kill the Python process with "kill -9 PID", it does not get killed. Is that because of the multithreading? How do I stop the training process partway through?

--> I am able to run the 'eval' option using cuda 1, and it takes about 1-2 seconds to generate output images. So I am assuming all the package installations are fine on my side.

Please help me resolve the issues.

It should be able to train quite quickly on 5000 images. To check whether training has actually started, set the --log-interval parameter to 1 and see whether output messages are printed. Also run the training command with Python's unbuffered flag (python -u train.py...), so that messages show up as soon as they are printed. You can also test training with a batch size of 1 on your local laptop/desktop to check that everything works correctly before running on AWS.
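To illustrate why a run can look blank even though it is working, here is a minimal sketch of interval-based logging. It is not the code from train.py: the helper names `training_log_lines` and `run` are made up for this example, and the interval of 500 is only an illustrative value. With a small number of batches and a large log interval no message is ever due, while an interval of 1 reports every batch.

```python
def training_log_lines(num_batches, log_interval):
    """Yield a progress message every `log_interval` batches.

    Hypothetical helper for illustration; a real training loop would
    run the forward and backward pass where the comment below sits.
    """
    for batch_id in range(num_batches):
        # ... training step for this batch would go here ...
        if (batch_id + 1) % log_interval == 0:
            yield f"batch {batch_id + 1}/{num_batches}"


def run(num_batches, log_interval):
    for line in training_log_lines(num_batches, log_interval):
        # flush=True writes the message out immediately instead of
        # leaving it in the output buffer, similar in effect to `python -u`.
        print(line, flush=True)


if __name__ == "__main__":
    run(num_batches=5, log_interval=500)  # prints nothing: interval never reached
    run(num_batches=5, log_interval=1)    # prints one line per batch
```

If the interval of 1 still prints nothing for a long time, the process is likely stuck before the first batch (for example while loading data) rather than just logging rarely.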

@abhiskk How many images should we use when training a new model?
Is it worth it to go over 10K, 20K, 80K images?

@glesperance you can get good results with around 10K images. I would also suggest monitoring the loss during training to make sure you get good-quality results.