Thesis

Abstract

After having observed an information signal, humans are capable of inferring a re- lated but unknown signal with remarkable accuracy. While machines are capable of easily inferring information when the relationship between the observed and unknown signals is known, many signals are not easily modeled and follow more complex distri- butions. We investigate conditional mean estimation within the domain of computer vision and image processing. Knowing the recent success of convolutional neural networks (CNNs) at solving visual tasks, we utilize various CNN architectures to perform image estimation on the MNIST digit dataset. We design an autoencoder model and show that it outperforms the affine estimator, a suboptimal predictor for not jointly Gaussian data like MNIST, at estimating image occlusions when various sized squares and numbers of columns are removed from an image. Notably, both the affine estimator and autoencoder generate blurry and not well-defined output. We then utilize a generative adversarial network (GAN) and show that the visual output is, qualitatively, much more stylistically similar to the desired output, with crisp and well-defined boundaries to each digit. However, quantitatively the GAN performs worse under the same criterion of mean squared error per hidden pixel, even though the visual output was comparable, if not better, in many situations. Finally, we explore using the GAN on the same task with a color dataset of faces, CelebA.

dxyang/Thesis

Thesis

Abstract