facebookresearch/ic_gan

Suggestions on training 512x512 resolution images

Qianli-ion opened this issue · 5 comments

In the main paper, experiments are conducted at 128x128 and 256x256 resolutions. Have higher-resolution experiments been attempted (e.g., 512x512)? Are there any hyperparameters found to be sensitive to the output resolution? Any suggestions/comments are appreciated.

We have not yet experimented with higher resolutions, and this remains to be explored.
I would start with the hyperparameters chosen for 256x256 and see what comes out. One can further tune the hyperparameters using the same strategy we employed in Supplementary Material B.2 of our paper, although other exploration strategies might also be useful.

If you manage to train a model at 512x512 in the future and you want to contribute to the repository, you are more than welcome to open a pull request with the successful configuration file for 512x512.

Hi Arantxa, thanks for the comment! I'm currently trying to train at 512 resolution on our custom dataset (I'll try a 256 version soon). I found that training is fairly unstable: the FID metric fluctuates a lot (in the range of 70-140), and the lowest I can get with the IC-GAN StyleGAN2 backbone is 51. In your experience training IC-GAN with the StyleGAN2 backbone, is such a volatile training process common? With plain StyleGAN2, I can reach FID=8.

Hi,

In my experiments the FID wasn't fluctuating much for IC-GAN. It was pretty stable, the same as in our StyleGAN2 baseline experiments.
Given that you are using a custom dataset, I would suggest double-checking that the instance features are correctly normalized, as we do in this line.
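As a minimal sketch of that check, assuming your instance features live in a `(num_instances, feature_dim)` NumPy array (the array name and helper below are hypothetical, not from the repo), you can L2-normalize each embedding and verify the result has unit norm:

```python
import numpy as np

def l2_normalize(features, eps=1e-8):
    """Normalize each row (instance embedding) to unit L2 norm.

    `eps` guards against division by zero for degenerate rows.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, eps)

# Hypothetical stand-in for your custom instance embeddings.
features = np.random.randn(4, 2048).astype(np.float32)
normalized = l2_normalize(features)

# After normalization, every row should have norm ~1.0.
assert np.allclose(np.linalg.norm(normalized, axis=1), 1.0, atol=1e-4)
```

If this assertion fails on the features you feed the dataloader, the conditioning the generator sees will have a different scale than the one IC-GAN was tuned for.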

Another alternative would be to run our StyleGAN2 baseline models with these configuration files and see if you still experience this instability. That would rule out other possibilities.

Thanks for the comments, Arantxa! Your suggestion to check the instance features was indeed very useful, and I found an error there: the instance features are, for some reason, all zero. I'll have to do a deep dive into why this happens (I supply our own embeddings during dataset/dataloader definition). I'll update here once I find out more.
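For anyone hitting the same issue, a quick sanity check like the sketch below (function name is my own, not from the repo) can flag all-zero embeddings before training starts:

```python
import numpy as np

def find_zero_embeddings(features, atol=1e-12):
    """Return indices of rows whose instance embedding is (near) all-zero."""
    return np.flatnonzero(np.all(np.abs(features) <= atol, axis=1))

# Hypothetical example: 5 embeddings, one of them broken (all zeros).
feats = np.ones((5, 8), dtype=np.float32)
feats[2] = 0.0
bad = find_zero_embeddings(feats)
print(bad)  # -> [2]
```

Running this on the array handed to the dataset/dataloader makes it obvious whether the embeddings were zeroed before or after they reach the training loop.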

Also, I'll keep this thread open if you don't mind, so that everyone interested in adapting IC-GAN to higher resolutions has a starting point.

Just a quick update here. I could achieve an FID of around 9 with IC-GAN on my custom dataset at 512 resolution with the default setup. I haven't optimized the hyperparameters yet, so this might be further improved. The 512-resolution training is not particularly stable: while the FID was in the 30-50 and 10-20 ranges, I saw quite a bit of up and down. FID also decreases more slowly than with regular StyleGAN2 in my experiments.