Classifying-CIFAR10-Dataset-Using-Neural-Networks

In this repository, I am working on classifying the CIFAR10 dataset using neural networks. To achieve this, I initially perform classification using an MLP network. Then, to enhance the network's performance and reduce learning time, I employ convolutional layers, pooling, batch normalization, and dropout layers.

First, we load the data using the following instructions:

from keras.datasets import cifar10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

Then, the training data is divided into two parts: train and validation. Note that, since we will use batch sizes of 32, 64, and 256, the number of training samples should be a multiple of all three. For this purpose, out of the 50,000 training samples, approximately 90%, i.e., 45,056 samples, are allocated to the train set, and the rest are assigned to validation.

Of course, we could instead allocate exactly 90% of the data to the train set, but we use the split described above for two reasons:

1- Training is performed on all data points.

2- If the number of training data points is not a multiple of the batch size, some data points are not used for training, which might affect the accuracy of the model.

Additionally, we use the to_categorical() function from the Keras library to convert labels into one-hot encoded format.
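A minimal sketch of this preprocessing step, reusing the variable names from the snippet above (the exact indexing follows the 45,056/4,944 split described here; other details are assumptions):

from keras.utils import to_categorical

# Keep the first 45,056 samples (a multiple of 32, 64, and 256) for training,
# and use the remaining 4,944 samples for validation.
n_train = 45056
x_val, y_val = x_train[n_train:], y_train[n_train:]
x_train, y_train = x_train[:n_train], y_train[:n_train]

# Convert the integer labels (0-9) into one-hot vectors of length 10.
y_train = to_categorical(y_train, num_classes=10)
y_val = to_categorical(y_val, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)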

Finally, as shown in the image below, we display the first 10 images of the train dataset.

[image: the first 10 images of the train dataset]
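For reference, such a figure could be produced with matplotlib along these lines (the class-name list is the standard CIFAR10 ordering; the exact layout in the notebook may differ):

import matplotlib.pyplot as plt

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']
plt.figure(figsize=(15, 2))
for i in range(10):
    plt.subplot(1, 10, i + 1)
    plt.imshow(x_train[i])
    plt.title(class_names[int(y_train[i].argmax())])  # labels are one-hot at this point
    plt.axis('off')
plt.show()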

Now we want to design a network using the Keras library to classify data. In the first part, we design a network with two hidden layers, and in the second part, we add convolutional layers to the designed network to enhance its performance.

Part A: Using the MLP Network

In this section, we first design a network with two hidden layers. Through trial and error, we conclude that setting the sizes of the hidden layers to 200 and 100 yields favorable results (the parameters influencing the choice of these sizes are accuracy and learning speed). To implement the network, a function called MLP() has been written, to which we provide the variable parameters of the problem as inputs.
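As a rough sketch, the MLP() helper might look as follows; the hidden-layer sizes of 200 and 100 follow the description above, while the exact signature and defaults are assumptions:

from keras.models import Sequential
from keras.layers import Flatten, Dense

def MLP(activations=('relu', 'relu'), optimizer='sgd',
        loss='categorical_crossentropy'):
    # Two hidden layers of 200 and 100 units, followed by a 10-way softmax output.
    model = Sequential([
        Flatten(input_shape=(32, 32, 3)),
        Dense(200, activation=activations[0]),
        Dense(100, activation=activations[1]),
        Dense(10, activation='softmax'),
    ])
    model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])
    return model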

Problem 1. Choosing the Most Suitable Batch Size

In this question, we consider the other parameters as follows:

  • Batch-Size: {32, 64, 256}
  • Activation Functions: {Layer #1: 'ReLU', Layer #2: 'ReLU'}
  • Optimizer: SGD (learning-rate = 0.01, momentum = 0.9)
  • Loss Function: Categorical Cross Entropy

(We consider 20 epochs in this section.)
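A sketch of how this sweep might be run with the MLP() helper sketched above (in older Keras versions the SGD argument is lr rather than learning_rate):

from keras.optimizers import SGD

for batch_size in (32, 64, 256):
    model = MLP(activations=('relu', 'relu'),
                optimizer=SGD(learning_rate=0.01, momentum=0.9),
                loss='categorical_crossentropy')
    model.fit(x_train, y_train, epochs=20, batch_size=batch_size,
              validation_data=(x_val, y_val))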

The images below depict the desired outputs for this question. As observed, the network's accuracy and learning speed are best for a batch size of 256.

batch-size = 32 :

batch-size = 64 :

batch-size = 256 :

Problem 2. Choosing the most suitable activation functions for the hidden layers

In this question, we consider the following parameters:

  • Batch-Size = 256
  • Activation Functions: {Layer #1: 'ReLU', Layer #2: 'ReLU'}, {Layer #1: 'tanh', Layer #2: 'tanh'}, {Layer #1: 'ReLU', Layer #2: 'sigmoid'}
  • Optimizer: SGD (learning-rate = 0.01, momentum = 0.9)
  • Loss Function: Categorical Cross Entropy
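Using the same MLP() helper, this comparison might be run along these lines (a sketch):

from keras.optimizers import SGD

for acts in (('relu', 'relu'), ('tanh', 'tanh'), ('relu', 'sigmoid')):
    model = MLP(activations=acts,
                optimizer=SGD(learning_rate=0.01, momentum=0.9),
                loss='categorical_crossentropy')
    model.fit(x_train, y_train, epochs=20, batch_size=256,
              validation_data=(x_val, y_val))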

The figures below depict the desired output results for this question. Note that the first configuration (ReLU, ReLU) was already examined in question 1. As observed, the network performs best with the first configuration (ReLU, ReLU).

activation functions = {tanh, tanh}

activation functions = {ReLU, sigmoid}

Problem 3. Choosing the Most Appropriate Error Function

In this question, we consider the following parameters:

  • Batch Size: 256
  • Activation Functions: {Layer #1: 'ReLU', Layer #2: 'ReLU'}
  • Optimizer: SGD (Learning Rate = 0.01, Momentum = 0.9)
  • Loss Function: {Categorical Cross-Entropy, Poisson}

The images below illustrate the desired outputs for this question. Note that the first function, Categorical Cross-Entropy, was already examined in question 1. As observed, the network performs better with the Categorical Cross-Entropy loss function.

Problem 4. Choosing the most suitable optimizer

In this question, we consider the following parameters:

  • Batch-Size = 256
  • Activation Functions: {Layer #1: 'ReLU', Layer #2: 'ReLU'}
  • Optimizer: SGD (learning-rate = 0.01, momentum = 0.9) , Adam (learning-rate = 0.01)
  • Loss Function: {Categorical Cross Entropy, Poisson}
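The two optimizers can be constructed and compared along these lines (a sketch; shown with categorical cross-entropy, and the Poisson loss can be passed the same way as 'poisson'):

from keras.optimizers import SGD, Adam

for opt in (SGD(learning_rate=0.01, momentum=0.9), Adam(learning_rate=0.01)):
    model = MLP(activations=('relu', 'relu'), optimizer=opt,
                loss='categorical_crossentropy')
    model.fit(x_train, y_train, epochs=20, batch_size=256,
              validation_data=(x_val, y_val))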

The figures below depict the desired outputs for this question. Note that the first optimizer, SGD, was already examined in question 1. As observed, the network performs better with the SGD optimizer.

Selecting the most suitable parameters for the network

Based on the obtained results, the optimal parameters that lead to better accuracy and speed in the network's learning are as follows:

  • Batch-Size = 256
  • Activation Functions: {Layer #1: 'ReLU', Layer #2: 'ReLU'}
  • Optimizer: SGD (learning-rate = 0.01, momentum = 0.9)
  • Loss Function: Categorical Cross Entropy

The following image summarizes the network's layers with the aforementioned parameters.

[image: summary of the network layers]

Part B: Using the MLP+CNN Network

Problem 1. The Impact of Adding Convolutional Layers

Now, we want to add two convolutional layers to the best network designed in the previous section (the configuration selected at the end of Part A). We will examine the effect of adding these layers on the network's accuracy. Considering that the network becomes very slow in this section, we set the number of epochs to 10. (The function written for this part is denoted as CNN().)
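A possible sketch of the CNN() function described here, placing two convolutional layers in front of the MLP from Part A; the filter counts, kernel sizes, and padding are assumptions:

from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense
from keras.optimizers import SGD

def CNN():
    model = Sequential([
        Conv2D(32, (3, 3), activation='relu', padding='same',
               input_shape=(32, 32, 3)),
        Conv2D(64, (3, 3), activation='relu', padding='same'),
        Flatten(),
        Dense(200, activation='relu'),
        Dense(100, activation='relu'),
        Dense(10, activation='softmax'),
    ])
    model.compile(optimizer=SGD(learning_rate=0.01, momentum=0.9),
                  loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# 10 epochs, since the convolutional network is much slower to train.
model = CNN()
model.fit(x_train, y_train, epochs=10, batch_size=256,
          validation_data=(x_val, y_val))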

The image below depicts the error and accuracy graphs along with the compared parameter values and a summary of the network's layers. As observed, the accuracy has significantly increased, and the error has decreased. However, the noteworthy point here is that the network operates very slowly. In the following questions, we aim to apply techniques that increase the network's speed while maintaining its accuracy.

Problem 2. The Impact of Adding Pooling and Batch Normalization Layers

In this question, we first provide an explanation about these two layers and then examine their impact on the network.

Pooling Layer: In neural networks, as convolutional layers are added and the dimensions of the feature maps passed to the hidden layers grow, a pooling layer is employed to reduce the size of the convolutional layer's output. The figure below illustrates how this layer works. Incorporating this layer, while it may not significantly affect accuracy, greatly reduces complexity and accelerates the network.

Batch Normalization Layer: The Batch Normalization layer is used in neural networks to accelerate the learning process. It allows us to use a higher learning rate during optimization. This layer employs a normalization technique where, instead of normalizing the entire dataset, we normalize the data within each mini-batch.
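One way these layers could be inserted after each convolutional layer (a sketch; the placement and the 2x2 pool size are assumptions):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', padding='same',
           input_shape=(32, 32, 3)),
    BatchNormalization(),                # normalize activations within each mini-batch
    MaxPooling2D(pool_size=(2, 2)),      # halves the spatial dimensions: 32x32 -> 16x16
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling2D(pool_size=(2, 2)),      # 16x16 -> 8x8
    Flatten(),
    Dense(200, activation='relu'),
    Dense(100, activation='relu'),
    Dense(10, activation='softmax'),
])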

The following image illustrates the error and accuracy graphs with the compared parameter values, together with a summary of the network layers after adding the Pooling and Batch Normalization layers. As observed, the learning speed of the network has significantly increased. Furthermore, both accuracy and error have not only remained stable but have also improved. A notable point in this question, as in the previous one, is that after a certain epoch the model starts to overfit, leading to a decrease in its accuracy on the validation data.

Problem 3. The Impact of Adding Dropout Layers

Dropout Layer: In neural networks, relying too heavily on particular neurons can harm generalization, so during the feed-forward pass of training some neurons are temporarily deactivated. The dropout layer achieves this by making a percentage of the previous layer's output neurons ineffective in the following layer. This not only increases the network's speed but also has an effect on its accuracy. The image below illustrates an example of how this layer operates. One of the most important characteristics of this layer is its ability to prevent overfitting.

[image: example of how the dropout layer operates]
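A sketch of where Dropout layers might be added to the previous model; the dropout rates (0.25 after each convolutional block, 0.5 before the last hidden layer) are assumptions:

from keras.models import Sequential
from keras.layers import (Conv2D, MaxPooling2D, BatchNormalization,
                          Dropout, Flatten, Dense)

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', padding='same',
           input_shape=(32, 32, 3)),
    BatchNormalization(),
    MaxPooling2D((2, 2)),
    Dropout(0.25),                       # zero 25% of activations during training
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    BatchNormalization(),
    MaxPooling2D((2, 2)),
    Dropout(0.25),
    Flatten(),
    Dense(200, activation='relu'),
    Dropout(0.5),
    Dense(100, activation='relu'),
    Dense(10, activation='softmax'),
])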

The following image depicts the error and accuracy graphs with the compared parameters, together with a summary of the network layers after the addition of Dropout layers. As observed, the network's learning speed has significantly increased, and the accuracy has not decreased. Additionally, across repeated test runs of the network, its accuracy remains stable, making it more resistant to overfitting.

Problem 4. Early Stop

In neural networks, after a few epochs it is often observed that the validation (or test) error curve deviates from the training error curve and starts to increase. This indicates that the network's performance on unseen data is deteriorating, and therefore it is necessary to stop the learning process. The solution to this problem is early stopping. The following illustration indicates the point at which we need to apply early stopping during training.

Criteria used in early stopping: We employ two criteria to detect when to halt learning. One involves monitoring the gap between the accuracy curves of the validation and training data, while the other involves monitoring the gap between the error curves of these two sets. Furthermore, it is important to determine how sensitive the stopping rule is to these gaps. Using the EarlyStopping() callback from the Keras library, these criteria can be established.
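A minimal sketch of such a criterion with Keras, monitoring the validation loss (the patience value and the upper bound on epochs are assumptions):

from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss',           # watch the validation error
                           patience=3,                    # tolerate this many epochs without improvement
                           restore_best_weights=True)     # roll back to the best epoch
model.fit(x_train, y_train, epochs=50, batch_size=256,
          validation_data=(x_val, y_val), callbacks=[early_stop])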

The image below illustrates the output parameters after applying early stopping to the neural network. As observed, training stops at epoch 19, once the validation error has kept rising for the number of epochs set by the patience parameter. In general, even if a larger number of epochs were specified, the network's training would halt as soon as this increase persisted.