TFG---GANs

Primary language: Python. License: Creative Commons Zero v1.0 Universal (CC0-1.0).

Generation of Content Through Generative Adversarial Networks

The research paper can be found here.

There have been major advances in artificial intelligence, particularly in machine learning, over the past decade. These breakthroughs have come mostly from the increase in processing capacity over time, anticipated by Moore's law in 1965, which projected that the number of transistors that fit in a given area would double every year, as illustrated in the next figure.

Figure: Moore's law, transistor count over time.

The rise in computational power has enabled the execution of many previously proposed approaches that could not be evaluated to their full potential due to a lack of computing capacity. For example, although multilayer perceptrons were proven capable of approximating any continuous function, by the 1990s they had been outclassed by support vector machines due to the inefficiency of their training process. Thanks to advances in training efficiency and computational power, multilayer perceptrons have since evolved into the far more complex architectures presented in this work.

In recent years, this increase in computational power has mostly been applied to classification and regression problems, using a variety of algorithms and models in which the model must learn either to assign a sample to one of the available classes or to relate the features of a sample to a target value in order to predict it. However, the aim of this work is to solve a much more complex problem. The models investigated here must understand not only the distinctions between the classes a sample can belong to, but also the exact attributes that characterize each possible class. For example, while telling a cat apart from a dog is relatively simple, defining the specific characteristics of a dog and of a cat is a considerably harder task. These models learn those specific properties in order to generate synthetic samples that embody what the model has learned.

There has been a lot of progress in recent years in generating videos, photos, and even music. Deepfakes are a great example of these advances: images or videos of individuals that have been artificially created by transferring facial expressions and gestures from one video to another, resulting in footage of people saying things they never actually said. Other studies worth examining are NVIDIA's GauGAN (https://arxiv.org/abs/1903.07291), which can transform a segmentation map into a realistic image, and OpenAI's DALL·E (https://doi.org/10.48550/arxiv.2102.12092), which can generate faithful images from a textual description provided by the user. The next figures illustrate examples of what these two models can do.

This is why image processing specialists are paying increasing attention to the potential of generative adversarial networks (GANs). Image upscaling, image-to-image translation between domains (e.g. turning daylight scenes into night scenes), and many other applications benefit greatly from the use of GANs in image generation. To achieve these results, many modified GAN architectures have been developed, each with its own distinct properties for solving a particular image processing challenge, although the baseline always stays the same.

In a GAN, two agents compete against each other: a Generator and a Discriminator, as shown in the figure below.

Figure: basic GAN architecture, with the Generator producing images that are judged by the Discriminator.
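Formally, this adversarial game is usually written as the minimax objective from the original GAN formulation (Goodfellow et al., 2014), where G is the Generator, D is the Discriminator, p_data is the distribution of real samples, and p_z is the noise prior that G draws its inputs from. The equation is included here as a reference; it is not taken from the repository:

```math
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```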

The Generator produces an image that tries to mimic a real one; that image is then fed to the Discriminator, which determines whether it is authentic or not. Initially, the Generator will produce low-quality images that the Discriminator will immediately identify as fake. Thanks to the Discriminator's feedback, the Generator gradually learns to fool it, while the Discriminator learns what a real image looks like by processing many real images. As a consequence, the generative model ends up producing highly realistic results.
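To make that loop concrete, here is a minimal, hypothetical sketch of adversarial training in PyTorch. It is not the repository's actual Training.py: the toy data, network sizes, learning rates, and step counts are all illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical toy setup: the "real" data is 2-D points from a shifted
# Gaussian, so the script runs without any external dataset.
LATENT_DIM, DATA_DIM, BATCH = 8, 2, 64

generator = nn.Sequential(       # maps noise z -> fake sample
    nn.Linear(LATENT_DIM, 32), nn.ReLU(),
    nn.Linear(32, DATA_DIM),
)
discriminator = nn.Sequential(   # maps sample -> probability of being real
    nn.Linear(DATA_DIM, 32), nn.LeakyReLU(0.2),
    nn.Linear(32, 1), nn.Sigmoid(),
)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

for step in range(2000):
    # Discriminator step: learn to tell real samples from generated ones.
    real = torch.randn(BATCH, DATA_DIM) * 0.5 + 2.0
    fake = generator(torch.randn(BATCH, LATENT_DIM)).detach()
    loss_d = (bce(discriminator(real), torch.ones(BATCH, 1))
              + bce(discriminator(fake), torch.zeros(BATCH, 1)))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator step: learn to make the Discriminator output "real".
    fake = generator(torch.randn(BATCH, LATENT_DIM))
    loss_g = bce(discriminator(fake), torch.ones(BATCH, 1))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

    if step % 500 == 0:
        print(f"step {step}: loss_d={loss_d.item():.3f}, loss_g={loss_g.item():.3f}")
```

Note that the fake batch is detached in the Discriminator step so that the Discriminator's loss does not update the Generator's weights.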

GitHub Structure

The repository is structured in the following way:

Models

There are four GANs, each implemented in the folder bearing its name and each with its own README file where its code structure and a basic description are explained.

Files

This main folder additionally contains:

  • Constants.py: holds the constant values that the other files can use.
  • CustomLayers.py: implements custom layers for both the Generators and Discriminators of each model (see the sketch after this list).
  • Blocks.py: declares most of the block types that the models' Generators and Discriminators are built from.
  • ImageFunctions.py: implements all the logic used for handling images and datasets.
  • Training.py: implements all the functions needed to train each model.
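As a rough illustration of the kind of code that lives in CustomLayers.py, here is a minimal, hypothetical custom layer in PyTorch. The layer name (PixelNormalization, from ProGAN-style generators) and the block it is used in are assumptions for illustration, not the repository's actual implementation.

```python
import torch
import torch.nn as nn

class PixelNormalization(nn.Module):
    """Hypothetical custom layer: scales each pixel's feature vector to
    unit length across channels, as done in ProGAN-style generators."""

    def __init__(self, eps: float = 1e-8):
        super().__init__()
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, channels, height, width); average over channels.
        return x / torch.sqrt(torch.mean(x ** 2, dim=1, keepdim=True) + self.eps)

# Example of how Blocks.py might combine layers into a generator block:
block = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.LeakyReLU(0.2),
    PixelNormalization(),
)
out = block(torch.randn(1, 64, 16, 16))  # -> shape (1, 64, 16, 16)
```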