The human eye can recognise features, localise them, and classify images according to the variations present in an image at a very fast pace. Man-made machines and systems are far less capable of classifying images by such variations. Using a CNN in these systems improves classification accuracy: the network builds a feature map of the features present in an image and predicts the image's label from both minute, detailed features and more general ones.
Dataset
Data Preprocessing
The images in the dataset have different resolutions, so to simplify computation and to feed the model same-size inputs in which no features are left out, all images are brought to a common size and normalised: every pixel value is divided by 255, scaling it from the 0-255 range into the 0-1 range. The dataset images are about 180 x 180 pixels, which puts a heavy load on the system, so they are first reduced to a power of 2, 64 x 64. Since the system still crashes under the load, the size is further reduced to 32 x 32, which fits the model well; this size is used from here on. A minimal sketch of this step is shown below.
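A minimal sketch of the resizing and normalisation step, assuming TensorFlow and a `tf.data` pipeline of (image, label) pairs; the function name `preprocess` and the constant `IMG_SIZE` are our own, not taken from the repository:

```python
import tensorflow as tf

IMG_SIZE = 32  # reduced from 64 x 64 to 32 x 32 to lighten the system load

def preprocess(image, label):
    # Resize every image to the same power-of-2 resolution.
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    # Divide by 255 to scale pixel values from [0, 255] into [0, 1].
    image = tf.cast(image, tf.float32) / 255.0
    return image, label

# Usage (dataset is assumed to yield (image, label) pairs):
# dataset = dataset.map(preprocess)
```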
Data Augmentation
For a single image, augmentation takes the form of:
- Flipping the image left to right
- Adjusting the image brightness by 0.4
- Adjusting the image brightness by 0.2
- Cropping the image to 0.5 of its original size
- Rotating the image several times by 90 degrees
With these augmentations, about 7 new images are generated for every original image. In total, around 50,000 new images are generated and the dataset is expanded accordingly. A sketch of these augmentations is shown below.
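A minimal sketch of the augmentations listed above, using TensorFlow image ops; the helper name `augment` and the exact op choices (e.g. a central crop for the 0.5 cropping) are our assumptions:

```python
import tensorflow as tf

def augment(image):
    """Return a list of new images derived from a single input image."""
    augmented = [
        tf.image.flip_left_right(image),         # flip left to right
        tf.image.adjust_brightness(image, 0.4),  # brightness adjusted by 0.4
        tf.image.adjust_brightness(image, 0.2),  # brightness adjusted by 0.2
        tf.image.central_crop(image, 0.5),       # crop to 0.5 of the image
        tf.image.rot90(image, k=1),              # rotate 90 degrees
        tf.image.rot90(image, k=2),              # rotate 180 degrees
        tf.image.rot90(image, k=3),              # rotate 270 degrees
    ]
    # Cropped images would be resized back to 32 x 32 before training.
    return augmented  # about 7 new images per original image
```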
Model Architecture
Figures: architecture diagram and model summary.
- Kernel sizes are taken in the order 3, 5, and 3, so as to fetch local features first, then global features, and then the remaining features, leaving no features untouched (see the sketch after this list).
- ReLU is used because, compared to the tanh and sigmoid functions, it provides better generalisation accuracy.
- The number of kernels is taken in the order 64, 128, 64; some research describes generally increasing the number of filters first to maintain the spatial dimensions and then decreasing it to capture the important minute features.
- Batch Normalisation is used to accelerate the training process, so the model converges in fewer epochs.
- Dropout in many CNN layers increases the chances of slowing the learning process down, as more neurons are frozen. So dropout is used only in the final layer, with a minimal value, so that it neither slows learning nor allows overfitting.
- The Flatten layer converts the 2D feature maps into a 1D vector.
- The dense layer after the Flatten layer connects the flattened vector to the neurons used for final classification.
- For final classification, the softmax activation function is used, as it gives the probability that the input image belongs to each class.
- The final dense layer has 200 units, so that it can predict the label of the input image from the 200 categories available in the dataset.
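A minimal Keras sketch consistent with the bullets above (kernel sizes 3, 5, 3 with 64, 128, 64 filters, ReLU, Batch Normalisation, a single small dropout, Flatten, and a 200-way softmax output); the pooling layers and exact layer ordering are our assumptions, not the repository's actual code:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    # 3x3 kernels, 64 filters: local features first
    layers.Conv2D(64, 3, activation='relu', input_shape=(32, 32, 3)),
    layers.BatchNormalization(),
    layers.MaxPooling2D(),
    # 5x5 kernels, 128 filters: more global features
    layers.Conv2D(128, 5, activation='relu'),
    layers.BatchNormalization(),
    layers.MaxPooling2D(),
    # 3x3 kernels, 64 filters: remaining minute features
    layers.Conv2D(64, 3, activation='relu'),
    layers.BatchNormalization(),
    layers.Flatten(),            # 2D feature maps -> 1D vector
    layers.Dropout(0.2),         # minimal dropout, final layer only
    layers.Dense(200, activation='softmax'),  # one probability per class
])
```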
Accuracy
The accuracy on the training dataset is about 99%, while the accuracy on the testing dataset is about 55%. This gap is largely due to the large number of classes (200) available for classification, which raises the chances of misclassification and in turn lowers the test accuracy.
Entropy Loss Graph
Graphs
Figures: learning rate and momentum graphs.
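Loss and accuracy curves like these can be reproduced from the Keras training history. A minimal sketch, assuming the model defined above and placeholder dataset names `train_ds` and `test_ds` (integer labels are also an assumption, hence the sparse cross-entropy loss):

```python
import matplotlib.pyplot as plt

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# train_ds / test_ds are placeholder names for the prepared datasets.
history = model.fit(train_ds, validation_data=test_ds, epochs=20)

# Plot the cross-entropy loss over training.
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('cross-entropy loss')
plt.legend()
plt.show()
```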
Google Colab
- To clone this repository: `git clone https://github.com/DipikaPawar12/CV_Assignment4-5_Aanshi_Dipika.git`
- To install the requirements: `pip install -r requirements.txt`
- To install the dataset library and its dependencies: `pip install -q tfds-nightly tensorflow matplotlib`
References
[1] Dataset
[2] Data Augmentation
[3] The Complete Beginner's Guide to Deep Learning: Convolutional Neural Networks
[4] F. Sultana, A. Sufian, and P. Dutta, "Advancements in Image Classification using Convolutional Neural Network," 2018 Fourth International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN).
[5] Dan C. Cireşan, Ueli Meier, Jonathan Masci, Luca M. Gambardella, and Jürgen Schmidhuber, "Flexible, High Performance Convolutional Neural Networks for Image Classification," International Joint Conference on Artificial Intelligence.
[6] C.-C. Jay Kuo, "Understanding Convolutional Neural Networks with A Mathematical Model."
Contributors: Dipika Pawar and Aanshi Patwari