royorel/Lifespan_Age_Transformation_Synthesis

training model collapse

wtliao opened this issue · 3 comments

Hi, thanks for sharing your nice work and the dataset. I am playing around with your code to learn more about your idea. However, after training for one epoch, the synthesized images of all classes are blank.

After about 1000 iterations:
[screenshot: synthesized outputs]

After 1 epoch:
[screenshot: synthesized outputs]

After 5 epochs:
[screenshot: synthesized outputs]

Training loss over time:
[screenshot: loss plot]

I have not changed anything in your code. Do you have any idea what could cause this? Thanks a lot.

Hi @wtliao,

This is known and happens from time to time because of the relatively high initial learning rate needed to train the equalized-learning-rate StyleGAN convolution blocks. Sometimes the issue resolves itself at later stages of training (it should be gone by epoch 150). Alternatively, you can try lowering the learning rate. That will stabilize training, but the aging effect might not match the paper.
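For reference, a minimal sketch of what lowering the learning rate looks like in a generic PyTorch setup. The option name, default value, and optimizer betas here are illustrative assumptions, not the repo's actual training flags; check the repo's option files for the real parameter.

```python
import torch
import torch.nn as nn

# Stand-in module for the generator (placeholder, not the repo's network).
model = nn.Conv2d(3, 64, kernel_size=3, padding=1)

default_lr = 2e-3               # assumed high initial LR (placeholder value)
stabilized_lr = default_lr / 4  # lower LR trades aging strength for stability

# StyleGAN-style training commonly uses Adam with low beta1;
# the exact betas in this repo may differ.
optimizer = torch.optim.Adam(model.parameters(), lr=stabilized_lr, betas=(0.0, 0.99))
```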

Hi @royorel, thanks a lot for your swift reply :). I will let it run until epoch 200 and have a look. I have one more question about the batch_size setting: I noticed that the default is bs=6 on 4 GPUs. I am a little confused about how 6 samples are assigned to 4 GPUs at each iteration.

@wtliao each sample is actually a pair of images, so overall it's 12 images over 4 GPUs (3 per GPU).
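To illustrate the arithmetic with a hedged sketch (the tensor shapes are assumed for illustration, not taken from the repo's pipeline): `torch.nn.DataParallel` scatters a batch along dimension 0 across the available GPUs, so 6 image pairs flattened to 12 images give each of the 4 GPUs 3 images.

```python
import torch

# Assumed shapes for illustration only.
batch_size, channels, height, width = 6, 3, 256, 256

# Each of the 6 samples is a *pair* of images -> shape (6, 2, C, H, W).
pairs = torch.randn(batch_size, 2, channels, height, width)

# Flatten the pairs into individual images: (12, C, H, W).
images = pairs.view(-1, channels, height, width)

# DataParallel splits along dim 0, so 4 GPUs receive 3 images each.
chunks = torch.chunk(images, chunks=4, dim=0)
print([c.shape[0] for c in chunks])  # [3, 3, 3, 3]
```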