A basic implementation of Convolutional Autoencoder for image compression on MNIST dataset using Keras framework.
JPEG compression is currently the industry standard for image compression, however, there are many ways that Auto-encoders are being expanded in research that could push auto-encoder data compression over JPEG. This mini-project tries to achieve good results on mnist dataset.
Python 3.7.3
Keras with tensorflow as Backend
The autoencoder model consist of
Three convolutional layers followed by max pooling, reducing the kernel size by half after evry convolution. Three such units were used
x = Conv2D(256, (3, 3), activation='relu', padding='same')(input_img)
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(8, (1, 1), activation='relu', padding='same')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(2, (1, 1), activation='relu', padding='same')(x)
x = Conv2D(1, (1, 1), activation='relu', padding='same')(x)
Similar to Encoder in opposite direction only instead of max pooling upscaling was used.(Transpose Convolution was not used due to Hardware Bottleneck and minimal difference)
x = Conv2D(2, (1, 1), activation='relu', padding='same')(x)
x = Conv2D(4, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(256, (3, 3), activation='relu', padding='same')(x)
x = Conv2D(1, (3, 3), activation='tanh', padding='same')(x)
The original MNIST image size is 28 x 28 (Grayscale) but using an encoder the feature space was reduced to 7 x 7.
Thus 28 x 28 = 784 was reduced to mere 7 x 7 = 49 pixels.
The Left Column is of original images and right is of autoencoder based images
This project is licensed under the MIT License - see the LICENSE.md file for details