Please visit this repository from Udacity.
- The shape of the images is (32, 32, 3).
- The training set has 34799 images.
- The validation set has 4410 images.
- The test set has 12630 images.
- The number of classes is 43.
The training data can be found on this website. [link]
Some examples of the training data are shown below:
Here is an exploratory visualization of the data set: a bar chart showing the class distribution. The x-axis shows the labels, and the y-axis shows the number of images per label.
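The per-class counts behind such a bar chart can be computed with `numpy.bincount`; a minimal sketch (the variable names and the tiny stand-in label array are illustrative, not the project's actual code):

```python
import numpy as np

# y_train would be the label array loaded from the pickled data set;
# a small stand-in array is used here for illustration.
y_train = np.array([0, 2, 2, 1, 2, 0, 1, 1, 1])

# Count how many images each of the 43 classes has.
counts = np.bincount(y_train, minlength=43)

# counts[i] is the bar height for class i; the chart can then be
# drawn with e.g. matplotlib's plt.bar(range(43), counts).
```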
- First, the images were cropped from 32x32 to 28x28. This reduces the influence of the background, so that the neural network can focus on the pattern in the middle of the image.
- Second, the images were contrast-stretched according to their lightness. I converted the images from RGB to HLS, stretched the L channel to enhance the contrast, and then converted back from HLS to RGB so that the color information was preserved.
- Finally, I normalized the images so that the pixel range was changed from [0, 255] to [-1, 1].

Below is an example of the preprocessing.
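The three steps can be sketched roughly as follows (pure NumPy; a simple per-image min-max stretch stands in for the HLS-based lightness stretch, which would use `cv2.cvtColor` in the actual pipeline):

```python
import numpy as np

def preprocess(img):
    """Crop 32x32 -> 28x28, stretch contrast, normalize to [-1, 1]."""
    # 1. Center-crop to 28x28 to reduce the influence of the background.
    crop = img[2:30, 2:30].astype(np.float32)
    # 2. Contrast stretch. The write-up stretches the L channel in HLS
    #    space (via cv2.cvtColor); a per-image min-max stretch is used
    #    here as a simplified stand-in.
    lo, hi = crop.min(), crop.max()
    stretched = (crop - lo) / max(hi - lo, 1e-6)   # now in [0, 1]
    # 3. Normalize from [0, 1] to [-1, 1].
    return stretched * 2.0 - 1.0
```

The same function would be applied to every image in the training, validation, and test sets before feeding the network.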
I randomly changed the lightness and rotation of the images and applied some shear, so that the training data would generalize better. Below is a test of the data augmentation. This greatly increases the amount of training data.
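One way to sketch such an augmentation step in pure NumPy (nearest-neighbor warping; the parameter ranges are assumptions for illustration, not the values used in the project):

```python
import numpy as np

def augment(img, rng):
    """Randomly change lightness, rotate, and shear one image.
    Parameter ranges are illustrative assumptions."""
    h, w = img.shape[:2]
    angle = np.deg2rad(rng.uniform(-15, 15))   # random rotation
    shear = rng.uniform(-0.1, 0.1)             # random shear
    gain  = rng.uniform(0.7, 1.3)              # random lightness
    c, s = np.cos(angle), np.sin(angle)
    # Rotation combined with a shear, applied about the image center.
    M = np.array([[c, -s + shear],
                  [s,  c]])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    dst = np.stack([ys - cy, xs - cx]).reshape(2, -1)
    src = np.linalg.inv(M) @ dst               # inverse mapping
    sy = np.clip(np.rint(src[0] + cy), 0, h - 1).astype(int)
    sx = np.clip(np.rint(src[1] + cx), 0, w - 1).astype(int)
    warped = img[sy, sx].reshape(img.shape).astype(np.float32)
    # Lightness change, clipped back to the valid pixel range.
    return np.clip(warped * gain, 0, 255).astype(np.uint8)
```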
My final model consisted of the following layers:
| Layer | Description |
|---|---|
| Input | 28x28x3 RGB image |
| Convolution 5x5 | 1x1 stride, VALID padding, outputs 24x24x16 |
| RELU | |
| Max pooling | 2x2 stride, outputs 12x12x16 |
| Convolution 5x5 | 1x1 stride, VALID padding, outputs 8x8x32 |
| RELU | |
| Max pooling | 2x2 stride, outputs 4x4x32 |
| Convolution 5x5 | 1x1 stride, SAME padding, outputs 4x4x64 |
| RELU | |
| Max pooling | 2x2 stride, outputs 2x2x64 |
| Flatten | outputs 256 |
| Dropout | |
| Fully connected | outputs 128 |
| RELU | |
| Dropout | |
| Fully connected | outputs 84 |
| RELU | |
| Dropout | |
| Logits | outputs 43 |
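The output sizes in the table follow from the standard convolution and pooling arithmetic; a quick sanity check (pure Python, the helper name is my own):

```python
def conv_out(n, k, stride=1, padding="VALID"):
    """Output side length of a square convolution layer."""
    if padding == "SAME":
        return -(-n // stride)               # ceil(n / stride)
    return (n - k) // stride + 1             # VALID padding

n = 28                                       # input after cropping
n = conv_out(n, 5)                           # conv 5x5, VALID -> 24
n = n // 2                                   # 2x2 max pool    -> 12
n = conv_out(n, 5)                           # conv 5x5, VALID -> 8
n = n // 2                                   # 2x2 max pool    -> 4
n = conv_out(n, 5, padding="SAME")           # conv 5x5, SAME  -> 4
n = n // 2                                   # 2x2 max pool    -> 2
flat = n * n * 64                            # flatten         -> 256
```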
I used AdamOptimizer with a learning rate of 0.001, a batch size of 128, and a keep probability of 50% for the dropout layers. I trained and retrained the model for 20 epochs altogether.
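A 50% keep probability means each activation is dropped with probability 0.5 at training time. In the common "inverted dropout" formulation, the survivors are scaled by `1/keep_prob` so the expected activation is unchanged; a NumPy sketch of what `tf.nn.dropout` does internally:

```python
import numpy as np

def dropout(x, keep_prob, rng):
    """Inverted dropout: zero units with probability 1 - keep_prob,
    scale the survivors so the expected activation is unchanged."""
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
out = dropout(np.ones(10000), 0.5, rng)      # roughly half become 0, rest 2.0
```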
The accuracy of training and validation is shown below.
My final model results were:
- training set accuracy of 99.3%
- validation set accuracy of 98.5%
- test set accuracy of 96.5%
The approaches I tried:

- Overfit the training data using a more complex neural network.
- Used techniques to reduce the overfitting, including:
  - Augmenting the training data.
  - Using max pooling and dropout layers.
  - Shuffling the training data more.
Below is some visual exploration of the test data. The total test accuracy was about 96.5%. I chose the first 20 test images for prediction; the result is shown below.
Here are some images that I found on the web:
The first three signs were easy to recognize because the neural network had been trained on those patterns. The last five were misrecognized because the network had never been trained on them.
Interestingly, the network classified the fifth picture as a 'STOP' sign with about 98% confidence, because of the red color of the pattern, the letters in it, and the preprocessing (scaling) of the image, which made them look very similar to the computer.
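That 98% figure is a softmax probability over the network's 43 logits. Extracting the top-k predictions can be sketched in NumPy (this mirrors what `tf.nn.top_k` applied to `tf.nn.softmax(logits)` computes; the toy logits are illustrative):

```python
import numpy as np

def top_k_predictions(logits, k=5):
    """Softmax probabilities of the k most likely classes."""
    z = logits - logits.max()                # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()          # softmax
    top = np.argsort(p)[::-1][:k]            # indices of the k largest probs
    return [(int(i), float(p[i])) for i in top]

# Toy logits for 5 classes instead of the real 43.
preds = top_k_predictions(np.array([0.5, 4.0, 1.0, 0.1, 2.0]), k=3)
```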
The result is shown below.
- Udacity Self-Driving Car Engineer Nanodegree