CompVis/taming-transformers

VQGAN training details

Andrew-Brown1 opened this issue · 3 comments

Hi,

Thanks for the great repo! Could I ask some questions about training VQGAN?

What batch size did you train it with, and for how long?
Also, I see here that you wait before adding the discriminator loss: https://www.youtube.com/watch?v=fy153-yXSQk

Do you wait until the model has converged without it before adding it?

Thanks!

rromb commented

Hi, great question :) Most of our published VQGAN models are trained on a single 40GB VRAM GPU with a batch size of ~12 (bs=14 for the f16 model), depending on the hyperparameters. Regarding your second question, yes, it makes sense to monitor the perceptual loss and then add the discriminator to the training loop.
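For anyone wondering how the delayed discriminator looks in practice: a common pattern (and, if I read the repo correctly, roughly what the `disc_start` option in the loss config controls) is to multiply the adversarial term by a factor that stays at zero until a chosen global step. A minimal sketch, with the threshold value as an assumption:

```python
def adopt_weight(weight, global_step, threshold=0, value=0.0):
    """Return `value` instead of `weight` until `global_step` reaches `threshold`.

    Keeps the adversarial (discriminator) loss switched off early in training,
    so the autoencoder first converges on the reconstruction/perceptual loss,
    then enables the GAN term once reconstructions have stabilized.
    """
    if global_step < threshold:
        weight = value
    return weight


# Hypothetical usage: discriminator weight is zero before step 10_000.
disc_factor_early = adopt_weight(weight=1.0, global_step=5_000, threshold=10_000)
disc_factor_late = adopt_weight(weight=1.0, global_step=20_000, threshold=10_000)
```

The total loss would then be something like `rec_loss + disc_factor * g_loss`, so the discriminator only starts influencing the generator after the threshold step.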

Hey - thanks!

Hi! Could you share any details about the training cost, i.e. how many epochs and GPU-days it took to train the OpenImages model on a single 40GB VRAM GPU?

Thanks!