Experiments using sequentially stacked convolutional autoencoders. Not based on any particular paper.
- The CelebA dataset
- A Python notebook environment
- Python 3.7+
- TensorFlow 2.0 or greater
- Pandas
- NumPy
- OpenCV 3
- Matplotlib
- Images cropped and resized to a standard 128x128
- 64x64 and 32x32 variants created by downscaling with linear interpolation
- Variants rescaled back up to 128x128 with linear interpolation
- Images normalized to the range [-1, 1] before training, and mapped back to integers in [0, 255] after inference
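The preprocessing steps above can be sketched as follows. This is a minimal sketch, not the repository's code: with OpenCV installed one would normally call `cv2.resize(img, (w, h), interpolation=cv2.INTER_LINEAR)`, but `resize_linear` below is a NumPy-only stand-in using the same half-pixel-center convention (edge handling may differ slightly from OpenCV). It also assumes, since the README does not say, that both the 64x64 and 32x32 variants are downscaled directly from the 128x128 image. `resize_linear`, `make_training_triplet`, `normalize` and `denormalize` are hypothetical helper names.

```python
import numpy as np


def resize_linear(img, out_h, out_w):
    """Bilinear resize of an (H, W, C) array (NumPy stand-in for cv2.INTER_LINEAR)."""
    in_h, in_w = img.shape[:2]
    # Map output pixel centres to input coordinates (half-pixel convention).
    ys = (np.arange(out_h) + 0.5) * (in_h / out_h) - 0.5
    xs = (np.arange(out_w) + 0.5) * (in_w / out_w) - 0.5
    # Clamp to the image so border pixels are replicated.
    ys = np.clip(ys, 0, in_h - 1)
    xs = np.clip(xs, 0, in_w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, in_h - 1), np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None, None]  # vertical weights, broadcast over width and channels
    wx = (xs - x0)[None, :, None]  # horizontal weights, broadcast over height and channels
    src = img.astype(np.float32)
    top = src[y0][:, x0] * (1 - wx) + src[y0][:, x1] * wx
    bot = src[y1][:, x0] * (1 - wx) + src[y1][:, x1] * wx
    return (top * (1 - wy) + bot * wy).astype(np.float32)


def normalize(img):
    """Map pixel values in [0, 255] to [-1, 1]."""
    return img.astype(np.float32) / 127.5 - 1.0


def denormalize(x):
    """Map model outputs in [-1, 1] back to uint8 pixels in [0, 255]."""
    return np.clip(np.rint((x + 1.0) * 127.5), 0, 255).astype(np.uint8)


def make_training_triplet(img128):
    """Return (32->128 input, 64->128 input, 128 target), all normalized to [-1, 1]."""
    x64 = resize_linear(img128, 64, 64)  # assumption: both variants come from the 128x128 image
    x32 = resize_linear(img128, 32, 32)
    return (normalize(resize_linear(x32, 128, 128)),
            normalize(resize_linear(x64, 128, 128)),
            normalize(img128))
```

Because all three arrays end up at 128x128, each autoencoder can keep the same input and output shape; only the amount of detail lost differs between the 32x32 and 64x64 variants.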
The model consists of two architecturally identical convolutional autoencoders stacked back to back. They differ only in their training pairs: the first autoencoder (AE1) takes the rescaled 32x32 images as input and computes MSE loss against the rescaled 64x64 images, while the second autoencoder (AE2) takes the rescaled 64x64 images as input and computes MSE loss against the original 128x128 images.
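A possible TensorFlow/Keras sketch of this arrangement is shown below. The README does not list the layers, so the filter counts, kernel sizes, strides and the `tanh` output (chosen to match the [-1, 1] normalization) are all assumptions, and `build_autoencoder` is a hypothetical helper. Chaining AE1 into AE2 for inference is one reading of "stacked back to back".

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_autoencoder(name, size=128, channels=3):
    """Hypothetical conv autoencoder; layer widths are assumptions, not the repo's."""
    inp = tf.keras.Input(shape=(size, size, channels))
    # Encoder: two stride-2 convolutions, 128 -> 64 -> 32.
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inp)
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
    # Decoder: two stride-2 transposed convolutions, 32 -> 64 -> 128.
    x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
    # tanh keeps outputs in [-1, 1], the same range as the normalized targets.
    out = layers.Conv2D(channels, 3, padding="same", activation="tanh")(x)
    return tf.keras.Model(inp, out, name=name)


# Same architecture, separate weights.
ae1 = build_autoencoder("ae1")  # trained: rescaled 32x32 input -> rescaled 64x64 target
ae2 = build_autoencoder("ae2")  # trained: rescaled 64x64 input -> 128x128 target

# At inference, a rescaled 32x32 image passes through AE1 and then AE2.
x = tf.keras.Input(shape=(128, 128, 3))
stacked = tf.keras.Model(x, ae2(ae1(x)), name="stacked")
```

Keeping every stage at 128x128 means the two autoencoders can share one builder function, which mirrors the "two identical autoencoders" description.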
- Adam optimizer, learning rate 5e-4, default beta1 and beta2 (0.9 and 0.999 in TensorFlow)
- 50 epochs
- 4000 training and 1000 testing samples, each available at 32x32, 64x64 and 128x128
- MSE loss
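The training settings above can be sketched as a custom loop. This is a minimal sketch under assumptions: the README gives only the optimizer, learning rate, epoch count and loss, so the batch size of 32, the shuffling, and the single joint step that sums both losses are assumptions, and `tiny_ae` is a hypothetical placeholder model so the block runs on its own.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers


def tiny_ae(name, size=128, channels=3):
    """Hypothetical placeholder autoencoder (architecture not specified in the README)."""
    inp = tf.keras.Input(shape=(size, size, channels))
    x = layers.Conv2D(16, 3, strides=2, padding="same", activation="relu")(inp)
    x = layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu")(x)
    out = layers.Conv2D(channels, 3, padding="same", activation="tanh")(x)
    return tf.keras.Model(inp, out, name=name)


def train(ae1, ae2, x32_up, x64_up, x128, epochs=50, batch_size=32):
    """Train both autoencoders; returns the last batch's (loss1, loss2) per epoch."""
    opt = tf.keras.optimizers.Adam(learning_rate=5e-4)  # default beta_1=0.9, beta_2=0.999
    mse = tf.keras.losses.MeanSquaredError()
    ds = (tf.data.Dataset.from_tensor_slices((x32_up, x64_up, x128))
          .shuffle(len(x128))
          .batch(batch_size))
    variables = ae1.trainable_variables + ae2.trainable_variables
    history = []
    for _ in range(epochs):
        for a, b, c in ds:
            with tf.GradientTape() as tape:
                loss1 = mse(b, ae1(a, training=True))  # AE1: 32->128 input vs 64->128 target
                loss2 = mse(c, ae2(b, training=True))  # AE2: 64->128 input vs 128 target
                total = loss1 + loss2
            grads = tape.gradient(total, variables)
            opt.apply_gradients(zip(grads, variables))
        history.append((float(loss1), float(loss2)))
    return history
```

Since AE2 is fed the rescaled 64x64 images rather than AE1's output, the gradient of the summed loss with respect to AE1's weights is just the gradient of AE1's own loss (and likewise for AE2). Adam keeps per-parameter statistics, so this joint step behaves like training the two autoencoders separately on the same batches.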
Here is a simple diagram explaining the architecture and where losses are calculated:
- Use a better-suited loss, such as perceptual loss
- Scale to 256x256
- Include more training/testing data