poolio/unrolled_gan

Couldn't reproduce mode collapse without unrolling operation

superliuwanjia opened this issue · 2 comments

I'm unable to reproduce the mode collapse without the unrolling operation, the experimental result described in Appendices A and B of the paper. What network and training configurations are required to reproduce the mode collapse problem?

Thanks,

Robin

Hello,
First thank you for taking the time to try to reproduce our results!

Sadly, this is not all that straightforward a question to answer. To my knowledge all the required information is in the appendix, but I realize just telling you that isn't very helpful, so here are some other thoughts. GANs are still incredibly finicky. They are sensitive to a huge number of factors (many of which are not fully understood). This task was a test for us to see if the method had any merit, and given its simplicity I am not surprised you have found a working configuration. In more complex setups, like the RNN tests, we had an impossible time getting separation to occur, though that setting is harder to analyze.

In our experience it all comes down to the initial loss surface set up by the discriminator. If the D loss surface places a bowl-shaped loss centered on the generator's output mass, then there will be separation in G. One way to control this is to modify how initializations are done and the initial state of the discriminator. In this setting we wanted to place the generator in a suboptimal position right off the bat, so we used a small initialization in the hidden layers (0.8-scaled orthogonals) and tested whether unrolling could recover from this. In my experience this stability-vs-collapse behavior is very binary. We used this task for its ease of visualisation and inspection; if we had instead been careful with how the initial D moved and had placed the initial G so it was spread out over the entire surface, things would behave very differently.
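For concreteness, here is a minimal NumPy sketch of what a 0.8-scaled orthogonal initializer for the generator's hidden layers could look like (the layer size and helper name are illustrative, not the notebook's actual code):

```python
import numpy as np

def scaled_orthogonal(shape, scale=0.8, rng=np.random):
    # Draw a random Gaussian matrix and take an orthogonal factor of its SVD,
    # then shrink it by `scale` so the generator starts in a "small" regime.
    a = rng.normal(0.0, 1.0, shape)
    u, _, vt = np.linalg.svd(a, full_matrices=False)
    q = u if u.shape == shape else vt  # pick whichever factor matches the requested shape
    return scale * q

W_hidden = scaled_orthogonal((128, 128))  # e.g. one hidden-layer weight matrix of the toy G
```

The 0.8 scale is the value mentioned above; the layer width and the SVD-based construction are just assumptions for the sketch.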

Another thing to look at to get GANs into bad positions is the relative learning rates of G and D. Consider a setup where D initially learns some approximate loss surface in a couple of steps. The G optimization step is now to exploit this surface as much as possible, and the optimal thing for G to do is to move all of its mass up the gradient set up by D. Depending on how far G moves / how much it can exploit the current D, it can get stuck in a local optimum, oscillate, or never recover. When introducing unrolling, however, much of this is relaxed: the only requirement is that, given the number of unrolling steps, D must be able to center itself on wherever the current G has mode-collapsed to, which then causes separation. A rough sketch of such an unrolled update follows below.
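To illustrate the mechanics, here is a rough, first-order PyTorch-style sketch of one unrolled generator step. All names here (`G`, `D`, `sample_real`, `sample_noise`, etc.) are my own placeholders, not the repo's code, and the paper's full method additionally backpropagates through the unrolled D steps rather than treating the copy as fixed:

```python
import copy
import torch
import torch.nn.functional as F

def unrolled_g_step(G, D, g_opt, sample_real, sample_noise,
                    unroll_steps=5, d_lr=1e-3, batch_size=64):
    # 1) Make a throwaway copy of D and train it a few steps against the current G,
    #    so the copy gets a chance to "center itself" on G's current output mass.
    D_unrolled = copy.deepcopy(D)
    d_opt = torch.optim.SGD(D_unrolled.parameters(), lr=d_lr)
    ones, zeros = torch.ones(batch_size, 1), torch.zeros(batch_size, 1)  # assumes D outputs one logit per sample
    for _ in range(unroll_steps):
        real = sample_real(batch_size)
        fake = G(sample_noise(batch_size)).detach()
        d_loss = (F.binary_cross_entropy_with_logits(D_unrolled(real), ones) +
                  F.binary_cross_entropy_with_logits(D_unrolled(fake), zeros))
        d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2) Update G against the unrolled copy, so G reacts to where D is heading
    #    rather than exploiting D's current surface as far as it can.
    fake = G(sample_noise(batch_size))
    g_loss = F.binary_cross_entropy_with_logits(D_unrolled(fake), ones)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    # The real D is left untouched here and takes its usual (non-unrolled) step separately.
```

With `unroll_steps=0` this degenerates to a standard GAN generator step, which is the comparison the question above is about.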

If you are interested in this problem, or a related problem as a test bed, we would point you to a modification that Ben made in the ipython notebook: making one of the Gaussians have a higher probability density than the others. This makes the task significantly harder and also introduces a more stable failure mode: collapsing entirely onto the dominant mode rather than rotating between modes. A small sketch of such a data distribution is below.
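In case it helps, here is a tiny NumPy sketch of that kind of non-uniform mixture; the number of modes, radius, noise scale, and dominant-mode weight are placeholder values, not the ones from Ben's notebook:

```python
import numpy as np

def sample_ring_mixture(n, n_modes=8, radius=2.0, std=0.02,
                        dominant_weight=0.5, rng=np.random):
    # 2D Gaussians arranged on a ring, with one mode carrying most of the probability mass.
    weights = np.full(n_modes, (1.0 - dominant_weight) / (n_modes - 1))
    weights[0] = dominant_weight
    angles = 2 * np.pi * np.arange(n_modes) / n_modes
    centers = radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    which = rng.choice(n_modes, size=n, p=weights)
    return centers[which] + std * rng.normal(size=(n, 2))

batch = sample_ring_mixture(512)  # e.g. one training batch for the toy GAN
```

A collapsed G then tends to sit on the heavy mode, which is an easier failure to spot than the rotating behavior.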

Hope this helps!

Hi Luke,
I also ran into a similar situation when implementing Unrolled GAN in PyTorch. At the same time, I tried using a non-uniform mixture of Gaussians as the data distribution and found that unrolled GAN does help with mode collapse in most cases.
As you said, GAN theory is still being developed; I'm glad to see that something does work.
See my implementation if you're interested: https://github.com/andrewliao11/unrolled-gans