A project that uses deep learning to classify images from the CIFAR-100 dataset.
Given a blurry image, the task is to classify it into one of the 100 CIFAR-100 classes.
The CIFAR-100 dataset consists of 60,000 32x32 colour images in 100 classes, with 600 images per class: 50,000 training images and 10,000 test images.
Link: CIFAR_100_Dataset
- 50k images as a training set
- 10k images as a test set
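The dataset's Python version ships as pickle files whose dicts hold a flat `(N, 3072)` uint8 array plus fine and coarse label lists. A minimal loader sketch (the file path is an assumption; pass wherever the extracted `train`/`test` files live):

```python
import pickle

import numpy as np


def load_cifar100_split(path):
    """Load one CIFAR-100 pickle file (e.g. 'train' or 'test') into an
    (N, 32, 32, 3) uint8 image array plus fine/coarse label lists."""
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")
    # Each row is 3072 bytes: 1024 red, then 1024 green, then 1024 blue.
    images = batch[b"data"].reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
    return images, batch[b"fine_labels"], batch[b"coarse_labels"]
```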
- Used an unsharp-masking kernel to sharpen the blurry images
- Normalized pixel values to the range [-1, 1] (done during training)
- Applied data augmentation to the training set (random horizontal flips and edge crops, sampled per training batch)
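The three preprocessing steps can be sketched in plain NumPy. The exact kernel and crop padding the project used aren't recorded here, so a common 3x3 sharpening kernel and a 4-pixel pad are assumed:

```python
import numpy as np

# A common sharpening kernel (assumed; the project's exact kernel may differ).
SHARPEN_3x3 = np.array([[ 0, -1,  0],
                        [-1,  5, -1],
                        [ 0, -1,  0]], dtype=np.float32)


def sharpen(img, kernel=SHARPEN_3x3):
    """Convolve each channel with a sharpening kernel ('same' padding)."""
    h, w, _ = img.shape
    k = kernel.shape[0] // 2
    padded = np.pad(img.astype(np.float32), ((k, k), (k, k), (0, 0)), mode="edge")
    out = np.zeros(img.shape, dtype=np.float32)
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return np.clip(out, 0, 255)


def normalize(img):
    """Map uint8 pixel values [0, 255] to [-1, 1]."""
    return img.astype(np.float32) / 127.5 - 1.0


def augment(img, rng, pad=4):
    """Random horizontal flip plus a random crop from a padded image."""
    if rng.random() < 0.5:
        img = img[:, ::-1, :]
    h, w, _ = img.shape
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    y, x = rng.integers(0, 2 * pad + 1, size=2)
    return padded[y:y + h, x:x + w]
```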
We tried several models and techniques, including plain CNNs, depth-wise separable convolutions, and ResNets; ResNets gave the best results.
Used the following architecture:
Residual block 1: 128 filters of size 3x3.
DropBlock: block size 5x5.
Max-pool layer of size 3x3.
Residual block 2: 256 filters of size 3x3.
DropBlock: block size 5x5.
Max-pool layer of size 3x3.
Residual block 3: 512 filters of size 3x3.
DropBlock: block size 5x5.
Max-pool layer of size 3x3.
Residual block 4: 1024 filters of size 3x3.
DropBlock: block size 5x5.
Max-pool layer of size 3x3.
Fully connected layer 1: 4096 hidden neurons.
Dropout with keep probability 0.5.
Fully connected layer 2: 4096 hidden neurons.
Dropout with keep probability 0.5.
Fully connected layer 3: 4096 hidden neurons.
Dropout with keep probability 0.5.
Each residual block consists of:
Batch norm.
Convolution.
Batch norm.
Convolution.
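The pre-activation pattern above (BN, conv, BN, conv, plus the skip connection) can be sketched in plain NumPy. This is a simplified sketch: no learned BN scale/shift, no stride, activations omitted, and it assumes the input and output channel counts match so the identity skip applies; blocks that change the channel count would need a projection on the skip path:

```python
import numpy as np


def batch_norm(x, eps=1e-5):
    """Normalize per channel over batch and spatial dims (no learned params)."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def conv2d(x, weights):
    """'Same'-padded convolution; weights has shape (kh, kw, c_in, c_out)."""
    n, h, w, _ = x.shape
    kh, kw, _, c_out = weights.shape
    pad = kh // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad), (0, 0)))
    out = np.zeros((n, h, w, c_out), dtype=x.dtype)
    for dy in range(kh):
        for dx in range(kw):
            out += xp[:, dy:dy + h, dx:dx + w, :] @ weights[dy, dx]
    return out


def residual_block(x, w1, w2):
    """BN -> conv -> BN -> conv, then add the skip connection."""
    out = conv2d(batch_norm(x), w1)
    out = conv2d(batch_norm(out), w2)
    return x + out
```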
Additional details:
Used the Adam optimizer
Used learning-rate decay
Used mini-batches of size 250
Used Xavier weight initialization
Used ReLU activations in the hidden layers
Used softmax in the output layer
Used cross-entropy as the loss function
Used early stopping
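A few of these pieces are easy to show in isolation. The sketch below covers Xavier initialization, the softmax cross-entropy loss, and exponential learning-rate decay; the actual base learning rate and decay schedule aren't recorded in this README, so the parameters are placeholders:

```python
import numpy as np


def xavier_init(rng, fan_in, fan_out):
    """Glorot/Xavier uniform initialization for a dense weight matrix."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between softmax(logits) and integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def decayed_lr(base_lr, step, decay_steps, decay_rate):
    """Exponential learning-rate decay, tf.train.exponential_decay-style."""
    return base_lr * decay_rate ** (step / decay_steps)
```

For example, a model that is completely uncertain (uniform logits over 100 classes) has a cross-entropy of ln(100) ≈ 4.605, which puts the reported test loss of 1.2815 in context.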
- Overall training accuracy: 99.5%
- Overall training loss: 0.0300
- Test accuracy: 70.7%
- Test loss: 1.2815
Some predictions from the test set:
Here we will discuss how to run the project and what each file is responsible for:
This script will download the CIFAR-100 dataset.
This script will extract the CIFAR-100 dataset.
This script will load the data, normalize it, shuffle it, take 2k images from the test set as a dev set, and save everything in a pickle file.
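The shuffle-and-split step can be sketched as follows (function name and fixed seed are illustrative, not the script's actual API):

```python
import numpy as np


def make_dev_split(test_images, test_labels, dev_size=2000, seed=0):
    """Shuffle the test set and carve off dev_size examples as a dev set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(test_images))
    images = np.asarray(test_images)[order]
    labels = np.asarray(test_labels)[order]
    return (images[:dev_size], labels[:dev_size],   # dev set
            images[dev_size:], labels[dev_size:])   # remaining test set
```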
This script will begin training the vanilla CNN model on the training data, output the results, save the accuracy and loss graphs in the output_images folder, save the graph info for TensorBoard in the graph_info folder, and save the model itself in saved_model. You can expect around 65% accuracy on the test set.
This script will begin training the faster depth-wise CNN model on the training data, output the results, save the accuracy and loss graphs in the output_images folder, save the graph info for TensorBoard in the graph_info folder, and save the model itself in saved_model. You can expect around 60% accuracy on the test set.
This script will begin training the ResNet model described above on the training data, output the results, save the accuracy and loss graphs in the output_images folder, save the graph info for TensorBoard in the graph_info folder, and save the model itself in saved_model. You can expect around 70% accuracy on the test set.
This script will load the model saved in the best_model folder (which gave the best overall accuracy), run it on the test set, and output the results.
This script will open a GUI for you to load an image and classify it using the model in the best_model folder.
- Try different architectures
- Try hierarchical softmax, since CIFAR-100 labels come at two levels (20 coarse superclasses, each with 5 fine classes)
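The hierarchical-softmax idea factors the class probability as p(fine) = p(coarse) x p(fine | coarse), exploiting CIFAR-100's 20 superclasses of 5 fine classes each. A sketch of the factorization (shapes and function names are illustrative):

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def hierarchical_softmax(coarse_logits, fine_logits):
    """Factor p(fine class) = p(coarse) * p(fine | coarse).

    coarse_logits: (batch, 20); fine_logits: (batch, 20, 5) -- one group of
    5 fine-class logits per superclass. Returns (batch, 100) probabilities."""
    p_coarse = softmax(coarse_logits)                # (batch, 20)
    p_fine_given = softmax(fine_logits, axis=-1)     # (batch, 20, 5)
    joint = p_coarse[:, :, None] * p_fine_given      # (batch, 20, 5)
    return joint.reshape(joint.shape[0], -1)         # (batch, 100)
```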
- Python 3.6.1
- TensorFlow 1.10
- imgaug 0.2.8
- opencv-python 4.0.0