Generative-Model-Survey

My exploration of generative models, mainly focused on the GAN architecture

Generative Model

This paper list is a bit different from others: I'll add some opinions and a summary for each paper. However, to understand a paper fully, you still have to read it yourself!
Of course, any pull request or discussion is welcome!

What's the advantage of GAN?

In the past, generative models often used MSE as the training criterion, which makes the output blurry and unlike natural images. With the GAN architecture, blurry images are easy for the discriminator to reject, which pushes the generator to produce more natural-looking images.
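A toy numeric illustration of why MSE causes blur (my own example, not from any of the papers below): when two sharp outcomes are equally likely, the single prediction that minimizes expected MSE is their pixelwise mean, which is blurry and matches neither outcome.

```python
# Toy illustration: with two equally likely sharp targets, the
# MSE-optimal single prediction is their pixelwise mean, a blurry
# in-between image that matches neither target.

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

# Two sharp 1-D "images": an edge at two slightly different positions.
target_a = [0, 0, 1, 1, 0, 0]
target_b = [0, 0, 0, 1, 1, 0]

# Expected MSE of a prediction under the 50/50 mixture of targets.
def expected_mse(pred):
    return 0.5 * mse(pred, target_a) + 0.5 * mse(pred, target_b)

mean_pred = [(a + b) / 2 for a, b in zip(target_a, target_b)]  # blurry edge

# The blurry mean beats either sharp guess under expected MSE.
assert expected_mse(mean_pred) < expected_mse(target_a)
assert expected_mse(mean_pred) < expected_mse(target_b)
print(mean_pred)  # the soft, blurred edge: [0.0, 0.0, 0.5, 1.0, 0.5, 0.0]
```

A GAN's discriminator, by contrast, can reject `mean_pred` outright because it looks like neither plausible sharp image.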

You may have heard of the auto-encoder; what's the difference between them?

An autoencoder is a neural network trained by an unsupervised learning algorithm that applies backpropagation with the target values set equal to the inputs. For a generative model, we often aim to interpret the feature space instead, which is why a generative model starts from a random noise vector.
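A minimal sketch of the target-equals-input idea (a toy 1-D linear autoencoder of my own, not from any particular paper): a scalar "encoder" weight `w` and "decoder" weight `v` are trained by gradient descent so that the reconstruction `v*(w*x)` matches the input `x`.

```python
import random

# Toy autoencoder: the target is the input itself, so the loss is the
# squared reconstruction error (v*(w*x) - x)**2, and training drives
# the decoder to invert the encoder, i.e. v*w -> 1.
random.seed(0)
w, v = 0.1, 0.1          # encoder / decoder weights
lr = 0.01

def recon_loss(w, v, xs):
    return sum((v * (w * x) - x) ** 2 for x in xs) / len(xs)

xs = [random.uniform(-1, 1) for _ in range(100)]
start = recon_loss(w, v, xs)
for _ in range(2000):
    for x in xs:
        h = w * x            # encode
        r = v * h            # decode (reconstruction)
        g = 2 * (r - x)      # d(loss)/d(reconstruction)
        w, v = w - lr * g * v * x, v - lr * g * h
end = recon_loss(w, v, xs)

assert end < start           # reconstruction error shrinks
assert abs(v * w - 1) < 0.01 # decoder inverts encoder: v*w ~ 1
```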

Paper

  • InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets [NIPS 2016] [ongoing]
    • Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel
    • See the paragraph 2 in Sec.4 for their motivation (with example)
    • Decomposes the input noise into an incompressible part and a structured latent code c that carries salient semantic features
    • However, the generator tends to ignore c, the latent code, so the authors add a restriction (a mutual-information term) to the loss
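As a toy illustration of that mutual-information restriction (all names and numbers here are my own, not from the paper): for a categorical code, InfoGAN's variational lower bound reduces to a cross-entropy that asks an auxiliary head Q to recover the code c that was fed to the generator.

```python
import math

# Hypothetical sketch of InfoGAN's auxiliary term: the penalty
# -log Q(c|x) is small when the auxiliary head Q can recover the code
# c from the generated sample, and large when the generator ignores c.

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def info_loss(q_logits, c_index):
    """-log Q(c|x) for the categorical code that was sampled."""
    return -math.log(softmax(q_logits)[c_index])

# If Q confidently recovers the code, the penalty is small ...
assert info_loss([5.0, 0.0, 0.0], c_index=0) < 0.1
# ... but if the generator ignored c, Q's output stays uniform and
# the penalty is large.
assert info_loss([0.0, 0.0, 0.0], c_index=0) > 1.0
```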
  • Learning Image Matching by Simply Watching Video [ECCV 2016]
    • Gucan Long, Laurent Kneip, Jose M. Alvarez, Hongdong Li
    • What's image matching? Given two images, we have to match each object in one image to the same object in the other image
    • Start from the insight that the problem of frame-interpolation implicitly solves for inter-frame correspondences
    • Note that this back-tracking does not mean reconstructing the input images from the output one. Instead, we only need to find the pixels in each input image that have the maximum influence on each pixel of the output image. 👉 backpropagation
    • The correspondence is calculated by combining the same coordinate of the two images (I1,I3)
    • Deep neural network for frame interpolation: use highway connections to preserve location information and, at the same time, make the network easy to deepen
    • Future work: we believe that the present unsupervised learning approach holds brilliant potential for the more natural solutions to similar low-level vision problems, such as optical flow, tracking and motion segmentation.
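The back-tracking idea can be sketched as follows (a toy stand-in interpolator of my own; the paper uses the trained network's gradients, here approximated by finite differences):

```python
# Toy sketch: for one output pixel of a frame interpolator, find the
# input pixel with the maximum influence on it.  That arg-max is the
# "correspondence" recovered from the interpolation network.

def interpolate(frame1, frame3):
    """Stand-in interpolator: each output pixel mixes the two inputs."""
    return [0.7 * a + 0.3 * b for a, b in zip(frame1, frame3)]

def influence(frame1, frame3, out_idx, eps=1e-6):
    """|d out[out_idx] / d frame1[i]| for every input pixel i."""
    base = interpolate(frame1, frame3)[out_idx]
    grads = []
    for i in range(len(frame1)):
        bumped = list(frame1)
        bumped[i] += eps
        grads.append(abs(interpolate(bumped, frame3)[out_idx] - base) / eps)
    return grads

f1, f3 = [0.2, 0.8, 0.5], [0.9, 0.1, 0.4]
g = influence(f1, f3, out_idx=1)
# In this toy interpolator only input pixel 1 influences output pixel 1,
# so the recovered "correspondence" lands on index 1.
assert g.index(max(g)) == 1
```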
  • SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient [arXiv]
    • Lantao Yu, Weinan Zhang, Jun Wang, Yong Yu
    • Tackles two main problems:
      • If the generated data is based on discrete tokens, the “slight change” guidance from the discriminative net makes little sense because there is probably no corresponding token for such slight change in the limited dictionary space
      • GAN can only give the score/loss for an entire sequence when it has been generated
    • Use a rollout policy to complete the whole sentence and use the discriminator to give the reward
    • Source code
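A hedged sketch of that rollout step (the toy vocabulary, rollout policy, and discriminator here are my own inventions): the reward for a partial sequence is the discriminator's average score over N Monte-Carlo completions.

```python
import random

# Toy rollout: complete each partial token sequence many times with a
# rollout policy, score each completion with the discriminator, and
# average the scores to get a reward for the partial sequence.
random.seed(0)
VOCAB = [0, 1]
SEQ_LEN = 5

def rollout_policy(prefix):
    """Complete the sequence with uniformly random tokens."""
    completion = list(prefix)
    while len(completion) < SEQ_LEN:
        completion.append(random.choice(VOCAB))
    return completion

def discriminator(seq):
    """Toy D: 'real' sequences contain many 1s."""
    return sum(seq) / len(seq)

def rollout_reward(prefix, n_rollouts=500):
    scores = [discriminator(rollout_policy(prefix)) for _ in range(n_rollouts)]
    return sum(scores) / len(scores)

# Prefixes this D prefers earn a higher rollout reward, giving the
# generator a learning signal for tokens in mid-sequence.
assert rollout_reward([1, 1]) > rollout_reward([0, 0])
```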
  • Neural Photo Editing with Introspective Adversarial Networks [arXiv 2016]
    • Andrew Brock, Theodore Lim, J.M. Ritchie, Nick Weston
    • Produce specific semantic changes in the output image by use of a contextual paintbrush 🎨 that indirectly modifies the latent vector
    • Hybridization of the Generative Adversarial Network and the Variational Autoencoder designed for use in the editor, aka IAN
      • Why combine these two? Training VAE is more stable (I guess)
    • Combine the encoder part of the auto-encoder with the discriminator 👉 the discriminator learns a hierarchy of features that are useful for multiple tasks, including inferring latents (the encoder in the auto-encoder) and comparing samples (the D in a GAN)
    • Introspective Adversarial Networks:
      • generator: generates images that fool the discriminator
      • auto-encoder: reconstructs the image (image -> feature -> image)
      • discriminator: instead of binary labels, the model has to discriminate among original, reconstructed, and generated images
    • IANs maintain the balance of power between the generator and the discriminator. In particular, we found that if we made the discriminator too expressive it would quickly out-learn the generator and achieve near-perfect accuracy, resulting in a significant slow-down in training. We thus maintain an “improvement ratio” rule of thumb, where every layer we add to the discriminator was accompanied by an addition of three layers in the generator.
  • WaveNet: A Generative Model for Raw Audio [arXiv 2016]
    • Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu
    • Not a GAN architecture
    • DeepMind Post
    • Uses a generative model to generate raw audio directly
    • Fully dilated causal convolution
      • causal: each output only depends on previous samples
      • dilated: use dilated convolution to increase the receptive field
    • Conditional wavenet
      • Global condition: use a V matrix as projection matrix (See eq.2)
      • Local condition: here the additional input can be a sequence. Use a transposed convolutional network to project (upsample) it to the same length as the input audio signal.
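The dilated causal convolution can be sketched in a few lines (a minimal 1-D version of my own, with kernel size 2 as in WaveNet): each output depends only on the current and past samples, `dilation` steps apart, and stacking layers with dilations 1, 2, 4, ... grows the receptive field exponentially.

```python
# Minimal dilated causal 1-D convolution, kernel size 2:
# y[t] = w[0]*x[t - dilation] + w[1]*x[t], with the past zero-padded,
# so no output ever looks at a future sample.

def dilated_causal_conv(x, w, dilation):
    out = []
    for t in range(len(x)):
        past = x[t - dilation] if t - dilation >= 0 else 0.0
        out.append(w[0] * past + w[1] * x[t])
    return out

x = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # impulse at t=0
w = [1.0, 1.0]

# Stacking dilations 1, 2, 4 spreads the impulse over 1+2+4 future
# steps: the receptive field roughly doubles per layer.
y = x
for d in (1, 2, 4):
    y = dilated_causal_conv(y, w, d)

assert y[7] != 0.0   # t=7 is inside the stacked receptive field
# A single layer with dilation 1 spreads the impulse only to t <= 1.
assert all(v == 0.0 for v in dilated_causal_conv(x, w, 1)[2:])
```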
  • Improved Techniques for Training GANs [NIPS 2016]
    • Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, Xi Chen
    • Code for the paper
    • Feature matching: instead of maximizing the output of the discriminator, the generator is trained to match the features on an intermediate layer of the discriminator
    • Minibatch discrimination:
      • Motivation: because the discriminator processes each example independently, there is no coordination between its gradients, and thus no mechanism to tell the outputs of the generator to become more dissimilar to each other (prevents the generator from collapsing to a single point)
      • Allow the discriminator to look at multiple data examples in combination, and perform what the authors call minibatch discrimination
      • Calculate the L1 distance between each sample's features and finally concatenate the result with the sample's features
      • The hope is that generated images become diverse 👉 less probability of collapse
    • Historical averaging to stabilize the training process
    • Automatic evaluation metric, which is based on the Inception model (see Section 4)
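A simplified sketch of minibatch discrimination (my own reduction: one raw feature row per sample; the paper first projects features through a learned tensor before taking the L1 distances):

```python
import math

# Each sample gets an extra "closeness" statistic, built from its L1
# distance to every other sample in the minibatch, concatenated onto
# its feature vector.  A collapsed batch (near-identical samples)
# yields a large statistic the discriminator can learn to flag.

def minibatch_feature(features):
    out = []
    for i, fi in enumerate(features):
        closeness = sum(
            math.exp(-sum(abs(a - b) for a, b in zip(fi, fj)))
            for j, fj in enumerate(features) if j != i
        )
        out.append(fi + [closeness])   # concatenate the minibatch statistic
    return out

diverse   = [[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]]
collapsed = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]

div_stat = minibatch_feature(diverse)[0][-1]
col_stat = minibatch_feature(collapsed)[0][-1]

# Identical samples produce a much larger closeness statistic,
# exposing mode collapse to the discriminator.
assert col_stat > div_stat
```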
  • Semantic Image Inpainting with Perceptual and Contextual Losses [arXiv 2016]
    • Raymond Yeh, Chen Chen, Teck Yian Lim, Mark Hasegawa-Johnson, Minh N. Do
    • Semantic inpainting can be viewed as constrained image generation
  • Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks [ICLR 2016]
    • Alec Radford, Luke Metz, Soumith Chintala
    • Explores architectural extensions for deeper generative models
      • all-convolutional layers: to learn upsampling itself
      • eliminate the fully connected layers: increases model stability but hurts convergence speed
      • use batchnorm: get deep generator to begin learning, preventing from collapsing all sample to single point
      • ReLU activation: for generator, it helps to converge faster and cover the color space. for discriminator, use leaky ReLU
    • Fractionally-strided convolution instead of deconvolution. To see what a fractionally-strided convolution looks like, here's the link
    • Want the model to generalize instead of memorize
    • Use the discriminator as a feature extractor (learned unsupervised) and apply it to supervised learning tasks. This produces comparable results
    • Official source code: Torch version, Theano version
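A toy sketch of a fractionally-strided (transposed) convolution in 1-D (my own minimal version): each input value is spread through the kernel into an output roughly twice as long, i.e. the upsampling is learned rather than a fixed interpolation.

```python
# Transposed 1-D convolution with stride 2: every input value scatters
# a scaled copy of the kernel into the (longer) output, so the layer
# learns its own upsampling weights.

def transposed_conv1d(x, kernel, stride=2):
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, v in enumerate(x):
        for k, w in enumerate(kernel):
            out[i * stride + k] += v * w
    return out

x = [1.0, 2.0, 3.0]
y = transposed_conv1d(x, kernel=[1.0, 0.5], stride=2)

assert len(y) > len(x)   # output is upsampled
assert y == [1.0, 0.5, 2.0, 1.0, 3.0, 1.5]
```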
  • Generative Adversarial Networks [NIPS 2014]
    • Scenario: The generative model can be thought of as analogous to a team of counterfeiters, trying to produce fake currency and use it without detection, while the discriminative model is analogous to the police, trying to detect the counterfeit currency.
    • In other words, D and G play the following two-player minimax game with value function
    • Find the Nash equilibrium by alternating gradient descent on D and G
    • Nice post from Eric Jang, Generative Adversarial Nets in TensorFlow
    • Another post about GAN: Generating Faces with Torch
    • Official source code: Theano version
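For reference, the value function of the two-player minimax game mentioned above (Eq. 1 in the original paper) is:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\bigl[\log D(x)\bigr]
+ \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

D is trained to assign high scores to real data x and low scores to generated samples G(z), while G is trained to make D(G(z)) large.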
  • Deep multi-scale video prediction beyond mean square error [ICLR 2016]
    • Previous work only used the MSE criterion to minimize the L2 (or L1) distance, which induces blurry output. This work proposes the GDL (gradient difference loss), which aims to keep the sharp parts of the image.
    • Adversarial training: create two networks (a discriminative model D and a generative model G). The goal of D is to discriminate whether an image is fake or not. The goal of G is to generate images that D cannot tell apart from real ones. => Adversarial
    • D model outputs a scalar, while G model outputs an image
    • Use a multi-scale architecture to work around the limited kernel size of convolutions (e.g. 3×3)
    • Still immature. Evaluated on the UCF101 dataset, due to its fixed backgrounds
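A 1-D sketch of the gradient difference loss (my own simplification, with the paper's exponent alpha set to 1): GDL compares image gradients, so a blurry prediction with a washed-out edge is penalized even when its plain pixel distance looks small.

```python
# GDL in 1-D: take absolute finite differences (the "gradients") of the
# prediction and the target, and penalize the gap between them.  A
# blurred edge has weaker gradients than a sharp one, so GDL flags it.

def gdl(pred, target):
    g_pred = [abs(pred[i + 1] - pred[i]) for i in range(len(pred) - 1)]
    g_tgt  = [abs(target[i + 1] - target[i]) for i in range(len(target) - 1)]
    return sum(abs(p - t) for p, t in zip(g_pred, g_tgt))

target = [0.0, 0.0, 1.0, 1.0]       # a sharp edge
sharp  = [0.0, 0.0, 1.0, 1.0]
blurry = [0.25, 0.25, 0.75, 0.75]   # same edge with halved contrast

assert gdl(sharp, target) == 0.0
assert gdl(blurry, target) > 0.0    # GDL flags the softened edge
```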

Suggest papers

  • Adversarial examples in the physical world [arXiv 2016]
    • Alexey Kurakin, Ian Goodfellow, Samy Bengio
  • Generative Visual Manipulation on the Natural Image Manifold [ECCV 2016]
    • Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, Alexei A. Efros
    • Demo video
  • Attend, Infer, Repeat: Fast Scene Understanding with Generative Models [NIPS 2016]
    • S. M. Ali Eslami, Nicolas Heess, Theophane Weber, Yuval Tassa, David Szepesvari, Koray Kavukcuoglu, Geoffrey E. Hinton

Recommended Post