Number of Epochs
mehussein opened this issue · 4 comments
Hi,
I am trying to run the cifar10 example in the README file. The command line arguments there specify 15000 as the number of epochs. How important is it to train the model for that many epochs? In other words, what is the minimum number of epochs to train for and still get reasonable results? Based on the speed I am seeing so far, it would take my system (with a single GPU) at least 7 weeks to finish 15000 epochs.
Thanks!
I trained for 15000 epochs to guarantee full convergence. As far as I remember, 5000 epochs can already yield competitive performance.
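As a rough back-of-the-envelope check on the numbers in this thread, the sketch below scales the reported "at least 7 weeks for 15000 epochs" on a single GPU linearly to other epoch counts. It assumes the time per epoch stays constant, which may not hold in practice; `estimate_weeks` is a hypothetical helper, not part of the repository.

```python
def estimate_weeks(target_epochs, ref_epochs=15000, ref_weeks=7.0):
    """Linearly scale the wall-clock time observed for ref_epochs to target_epochs.

    Assumption: constant time per epoch on the same hardware (single GPU here).
    The reference figure is a lower bound ("at least 7 weeks"), so the result is too.
    """
    return ref_weeks * target_epochs / ref_epochs


if __name__ == "__main__":
    for epochs in (5000, 15000):
        print(f"{epochs} epochs: at least {estimate_weeks(epochs):.1f} weeks")
```

Under that assumption, 5000 epochs would take roughly a third of the full run, i.e. a bit over two weeks on the same single-GPU setup.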
Thanks! And, how many GPUs do you recommend?
Also, what do you mean by convergence here? Qualitatively, I can see that the reconstruction quality is good even at the very beginning of training. However, the sample realism is not good even after hundreds of iterations. They start as smooth images with no structure, then they start to have cifar10-like structures, but when you zoom in, they do not look like real objects. Is that expected to improve after thousands of iterations?
Final question: would it be possible to release the training configurations (epochs, batch size, etc.) for the other datasets, please?
Thanks!
Hi,
Sorry for the late response. When I trained the CIFAR-10 flow, I recall using 2 GPUs (not strong ones, probably TITANs). I think 4 GPUs are enough. By convergence, I mean that the BPD (bits per dimension) score on the validation set stops decreasing. It usually takes at least 5000 epochs to get a reasonable BPD score. The quality of the generated images will improve with further training, but I should say that flow-based models are not as good as GANs at generating realistic images.
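The convergence criterion described here (validation BPD no longer decreasing) can be sketched as follows. This is only an illustration, not the check used in this repository: `bits_per_dim`, `has_converged`, `patience`, and `min_delta` are hypothetical names and thresholds, and the conversion assumes the negative log-likelihood is given in nats per image with 3 × 32 × 32 dimensions, as for CIFAR-10.

```python
import math


def bits_per_dim(nll_nats, num_dims=3 * 32 * 32):
    """Convert a per-image negative log-likelihood (in nats) to bits per dimension."""
    return nll_nats / (num_dims * math.log(2))


def has_converged(val_bpd_history, patience=3, min_delta=1e-3):
    """Return True when validation BPD has stopped decreasing.

    The run counts as converged if the best BPD among the last `patience`
    evaluations is not at least `min_delta` lower than the best BPD seen
    before them. Both thresholds are illustrative assumptions.
    """
    if len(val_bpd_history) <= patience:
        return False  # not enough evaluations to judge yet
    best_before = min(val_bpd_history[:-patience])
    best_recent = min(val_bpd_history[-patience:])
    return best_recent > best_before - min_delta
```

In a training loop, one would append the validation BPD after each evaluation and stop (or lower the learning rate) once `has_converged` returns `True`. Sample quality is a separate matter, so a plateau in BPD does not by itself say how realistic the samples look.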
As for the configurations for the other datasets, I will try to find them and share them with you. But since this work was done two years ago, when I was at CMU, it might take some time to find them.