In this project, you will use what you've learned about deep neural networks and convolutional neural networks to classify traffic signs. You will train and validate a model so it can classify traffic sign images using the German Traffic Sign Dataset. After the model is trained, you will then try out your model on images of German traffic signs that you find on the web.
We have included an IPython notebook that contains further instructions and starter code. Be sure to download the IPython notebook.
We also want you to create a detailed writeup of the project. Check out the writeup template for this project and use it as a starting point for creating your own writeup. The writeup can be either a markdown file or a pdf document.
To meet specifications, the project will require submitting three files:
- the IPython notebook with the code
- the code exported as an html file
- a writeup report either as a markdown or pdf file
A great writeup should include the rubric points as well as your description of how you addressed each point. You should include a detailed description of the code used in each step (with line-number references and code snippets where necessary), and links to other supporting documents or external references. You should include images in your writeup to demonstrate how your code works with examples.
All that said, please be concise! We're not looking for you to write a book here, just a brief description of how you passed each rubric point, and references to the relevant code :).
You're not required to use markdown for your writeup. If you use another method please just submit a pdf of your writeup.
The goals / steps of this project are the following:
- Load the data set
- Explore, summarize and visualize the data set
- Design, train and test a model architecture
- Use the model to make predictions on new images
- Analyze the softmax probabilities of the new images
- Summarize the results with a written report
This lab requires the CarND Term1 Starter Kit; the lab environment can be created with it. See the starter kit repository for details.
- Download the data set. The classroom has a link to the data set in the "Project Instructions" content. This is a pickled dataset in which we've already resized the images to 32x32. It contains a training, validation and test set.
- Clone the project, which contains the IPython notebook and the writeup template.
```sh
git clone https://github.com/udacity/CarND-Traffic-Sign-Classifier-Project
cd CarND-Traffic-Sign-Classifier-Project
jupyter notebook Traffic_Sign_Classifier.ipynb
```
Follow the instructions in the Traffic_Sign_Classifier.ipynb notebook and write the project report using the writeup template, writeup_template.md, as a guide. Submit the project code and writeup document.
Dependencies:
- pickle
- numpy
- sklearn
- tensorflow
- cv2
- glob
First, load the three datasets: the training, validation, and test sets.
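A minimal sketch of this loading step, assuming the pickled files are named train.p, valid.p, and test.p and use the 'features'/'labels' keys of the provided dataset (file names may differ in your download):

```python
import pickle

# Assumed file names for the pickled German Traffic Sign data
training_file = 'train.p'
validation_file = 'valid.p'
testing_file = 'test.p'

with open(training_file, mode='rb') as f:
    train = pickle.load(f)
with open(validation_file, mode='rb') as f:
    valid = pickle.load(f)
with open(testing_file, mode='rb') as f:
    test = pickle.load(f)

# Each pickle holds the images under 'features' and the class ids under 'labels'
X_train, y_train = train['features'], train['labels']
X_valid, y_valid = valid['features'], valid['labels']
X_test, y_test = test['features'], test['labels']

print("X_train shape :", X_train.shape)
print("y_train shape :", y_train.shape)
```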
```
X_train shape : (34799, 32, 32, 3)
y_train shape : (34799,)
X_valid shape : (4410, 32, 32, 3)
y_valid shape : (4410,)
X_test shape : (12630, 32, 32, 3)
y_test shape : (12630,)
```
Then print the summary.
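A minimal numpy sketch of how these summary figures can be computed (the exact print formatting is an approximation of the notebook output):

```python
import numpy as np

n_train, n_valid, n_test = len(X_train), len(X_valid), len(X_test)

print("Number of total examples =", n_train + n_valid + n_test)
print("Number of training examples =", n_train)
print("Mean of training examples = {:.6f}, Standard Deviation = {:.6f}".format(
    np.mean(X_train), np.std(X_train)))
print("Image data shape =", X_train.shape[1:3])
print("Number of classes =", len(np.unique(y_train)))
```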
```
Number of total examples = 51839
Number of training examples = 34799
Mean of training examples = 82.677589, Standard Deviation = 67.850888
Number of validation examples = 4410
Mean of validation examples = 83.556427, Standard Deviation = 69.887713
Number of testing examples = 12630
Mean of testing examples = 82.148460, Standard Deviation = 68.744089
Image data shape = (32, 32)
Number of classes = 43
```
Then plot the distribution of labels for each set, and randomly display some training images with their labels.
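A minimal matplotlib sketch of both visualizations (the plotting details are assumptions, not the exact notebook code):

```python
import random
import matplotlib.pyplot as plt

# Histogram of class labels in the training set
plt.hist(y_train, bins=43)
plt.xlabel('Class label')
plt.ylabel('Number of training examples')
plt.show()

# Show a few randomly chosen training images with their labels
fig, axes = plt.subplots(1, 5, figsize=(12, 3))
for ax in axes:
    idx = random.randint(0, len(X_train) - 1)
    ax.imshow(X_train[idx])
    ax.set_title("label: {}".format(y_train[idx]))
    ax.axis('off')
plt.show()
```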
Before feeding the training data into the pipeline, it needs to be pre-processed. The first step is grayscaling, because the input to LeNet has to be 32x32x1.
Grayscaling can be done by averaging 3 color channel values:
```python
X_train_gray = np.sum(X_train/3, axis=3, keepdims=True)
X_valid_gray = np.sum(X_valid/3, axis=3, keepdims=True)
X_test_gray = np.sum(X_test/3, axis=3, keepdims=True)
```
Then normalize the datasets to (-1, 1). The grayscaled images have values in (0, 255); subtracting 128 and then dividing by 128 maps them to approximately (-1, 1), giving all images roughly zero mean and similar variance, which is better for training:
```python
X_train_normalized = (X_train_gray - 128)/128
X_valid_normalized = (X_valid_gray - 128)/128
X_test_normalized = (X_test_gray - 128)/128
```
The pipeline for training a LeNet model:
- Data preprocessing: grayscale and normalize the loaded data
- Model design: follow the LeNet architecture
- Optimizer selection: AdamOptimizer is chosen, as it works similarly to stochastic gradient descent
- Hyperparameter tuning: fine-tune the learning rate, number of epochs, batch size, and early-stop threshold
- Model training: design the training process as described in the sections below
- Model assessment metric/benchmark: monitor the validation accuracy and check the test set accuracy at the end
- Handling overfitting and/or underfitting: I tried increasing the number of epochs and decreasing the learning rate when I ran into an underfitting issue
LeNet architecture is defined as follows:
```python
import tensorflow as tf
from tensorflow.contrib.layers import flatten  # flattens the last conv layer (TF 1.x)

def LeNet(x, n_classes):
    # Arguments used for tf.truncated_normal: randomly initializes the weights of each layer
    mu = 0       # mean
    sigma = 0.1  # standard deviation

    # Layer 1: Convolutional. Input = 32x32x1. Output = 28x28x6.
    stride1 = 1
    conv1_W = tf.Variable(tf.truncated_normal(shape=(5, 5, 1, 6), mean=mu, stddev=sigma))
    conv1_b = tf.Variable(tf.zeros(6))
    conv1 = tf.nn.conv2d(x, conv1_W, strides=[1, stride1, stride1, 1], padding='VALID') + conv1_b
    print("conv1 shape : ", conv1.get_shape())

    # ReLU activation.
    conv1 = tf.nn.relu(conv1)

    # Max pooling. Input = 28x28x6. Output = 14x14x6.
    conv1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
    print("conv1 shape after max pooling : ", conv1.get_shape())

    # Layer 2: Convolutional. Input = 14x14x6. Output = 10x10x16.
    stride2 = 1
    conv2_W = tf.Variable(tf.truncated_normal(shape=(5, 5, 6, 16), mean=mu, stddev=sigma))
    conv2_b = tf.Variable(tf.zeros(16))
    conv2 = tf.nn.conv2d(conv1, conv2_W, strides=[1, stride2, stride2, 1], padding='VALID') + conv2_b
    print("conv2 shape : ", conv2.get_shape())

    # ReLU activation.
    conv2 = tf.nn.relu(conv2)

    # Max pooling. Input = 10x10x16. Output = 5x5x16.
    conv2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
    print("conv2 shape after max pooling : ", conv2.get_shape())

    # Flatten. Input = 5x5x16. Output = 400.
    fc0 = flatten(conv2)
    print("Fully connected layer fc0 shape : ", fc0.get_shape())

    # Layer 3: Fully connected. Input = 400. Output = 120.
    fc1_W = tf.Variable(tf.truncated_normal(shape=(400, 120), mean=mu, stddev=sigma))
    fc1_b = tf.Variable(tf.zeros(120))
    fc1 = tf.matmul(fc0, fc1_W) + fc1_b
    print("Fully connected layer fc1 shape : ", fc1.get_shape())

    # ReLU activation.
    fc1 = tf.nn.relu(fc1)

    # Layer 4: Fully connected. Input = 120. Output = 84.
    fc2_W = tf.Variable(tf.truncated_normal(shape=(120, 84), mean=mu, stddev=sigma))
    fc2_b = tf.Variable(tf.zeros(84))
    fc2 = tf.matmul(fc1, fc2_W) + fc2_b
    print("Fully connected layer fc2 shape : ", fc2.get_shape())

    # ReLU activation.
    fc2 = tf.nn.relu(fc2)

    # Layer 5: Fully connected. Input = 84. Output = n_classes (43).
    fc3_W = tf.Variable(tf.truncated_normal(shape=(84, n_classes), mean=mu, stddev=sigma))
    fc3_b = tf.Variable(tf.zeros(n_classes))
    logits = tf.matmul(fc2, fc3_W) + fc3_b
    print("Output logits shape : ", logits.get_shape())

    return logits
```
The shape of each layer of LeNet:
```
conv1 shape : (?, 28, 28, 6)
conv1 shape after max pooling : (?, 14, 14, 6)
conv2 shape : (?, 10, 10, 16)
conv2 shape after max pooling : (?, 5, 5, 16)
Fully connected layer fc0 shape : (?, 400)
Fully connected layer fc1 shape : (?, 120)
Fully connected layer fc2 shape : (?, 84)
Output logits shape : (?, 43)
```
Training pipeline is designed as follows:
```python
logits = LeNet(x, n_classes)  # Feedforward
cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(labels=one_hot_y, logits=logits)  # Per-image error
loss_operation = tf.reduce_mean(cross_entropy)  # Average error over all images in the batch
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)  # Choose the optimizer
training_operation = optimizer.minimize(loss_operation)  # Minimize the error
```
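The placeholders x and y, the one-hot labels one_hot_y, and the constants used above are assumed to be defined roughly like this (a sketch, not the exact notebook code):

```python
import tensorflow as tf

n_classes = 43
learning_rate = 0.0005

# Placeholders for grayscaled 32x32x1 images and their integer class labels
x = tf.placeholder(tf.float32, (None, 32, 32, 1))
y = tf.placeholder(tf.int32, (None,))
one_hot_y = tf.one_hot(y, n_classes)
```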
Lastly, the evaluation function computes the average accuracy between the predicted classes and the labels over all batches:
```python
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(one_hot_y, 1))
accuracy_operation = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

def evaluate(X_data, y_data):
    num_examples = len(X_data)
    total_accuracy = 0
    sess = tf.get_default_session()
    for offset in range(0, num_examples, BATCH_SIZE):
        batch_x, batch_y = X_data[offset:offset+BATCH_SIZE], y_data[offset:offset+BATCH_SIZE]
        accuracy = sess.run(accuracy_operation, feed_dict={x: batch_x, y: batch_y})
        total_accuracy += (accuracy * len(batch_x))
    return total_accuracy / num_examples
```
The hyperparameters for training:
- learning_rate = 0.0005
- EPOCH = 100; training stops early if the validation accuracy reaches 0.95
- BATCH_SIZE = 128
- AdamOptimizer is used as the optimizer
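A sketch of the training loop that ties these hyperparameters together with the early-stopping check (EPOCH, BATCH_SIZE, and the './LeNet_Model' save path come from this writeup; EARLY_STOP_THRESHOLD and the loop itself are a minimal reconstruction, not the exact notebook code):

```python
from sklearn.utils import shuffle

EPOCH = 100
BATCH_SIZE = 128
EARLY_STOP_THRESHOLD = 0.95  # stop once the validation accuracy reaches this value

saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    num_examples = len(X_train_normalized)

    for i in range(EPOCH):
        # Shuffle the training data every epoch
        X_shuffled, y_shuffled = shuffle(X_train_normalized, y_train)
        for offset in range(0, num_examples, BATCH_SIZE):
            batch_x = X_shuffled[offset:offset + BATCH_SIZE]
            batch_y = y_shuffled[offset:offset + BATCH_SIZE]
            sess.run(training_operation, feed_dict={x: batch_x, y: batch_y})

        validation_accuracy = evaluate(X_valid_normalized, y_valid)
        print("EPOCH {} : Validation Accuracy = {:.3f}".format(i + 1, validation_accuracy))

        if validation_accuracy >= EARLY_STOP_THRESHOLD:
            break

    saver.save(sess, './LeNet_Model')
```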
Training result (validation accuracy over the epochs):
Validation Accuracy at epoch 100 = 0.931
I happen to live in Munich, so I took some pictures of traffic signs in my neighborhood and manually cropped and resized them to 32x32.
Then grayscale and normalize the new images, using the same preprocessing as for the training data:
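A sketch of how the new images might be loaded and preprocessed with glob and cv2 (the folder, the file extension, and the label values are placeholders, not the actual ones used):

```python
import glob
import cv2
import numpy as np

# Assumed location of the manually cropped photos
image_files = sorted(glob.glob('./new_images/*.png'))

images = []
for f in image_files:
    img = cv2.imread(f)                         # loads as BGR
    img = cv2.resize(img, (32, 32))             # ensure the network input size
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # convert to RGB like the training data
    images.append(img)
images = np.array(images)

# Same preprocessing as the training data: grayscale, then normalize to (-1, 1)
images_gray = np.sum(images / 3, axis=3, keepdims=True)
image_set = (images_gray - 128) / 128

# True class ids of the new images, in the same order as image_files
image_label = np.array([0] * len(images))  # placeholder values; replace with the real labels
```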
Load the saved model and run the prediction on the new images:
```python
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver_ = tf.train.import_meta_graph('./LeNet_Model.meta')
    saver_.restore(sess, "./LeNet_Model")

    new_img_accuracy = evaluate(image_set, image_label)
    print("New Images Accuracy = {:.3f}".format(new_img_accuracy))
```
Prediction result:
New Images Accuracy = 1.000
Output Top 5 Softmax Probabilities For Each Image:
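These probabilities can be computed with tf.nn.softmax and tf.nn.top_k; a minimal sketch reusing logits, x, and image_set from above:

```python
# Softmax over the logits, then keep the 5 most likely classes per image
softmax_logits = tf.nn.softmax(logits)
top_k = tf.nn.top_k(softmax_logits, k=5)

with tf.Session() as sess:
    saver = tf.train.Saver()
    saver.restore(sess, "./LeNet_Model")
    values, indices = sess.run([top_k.values, top_k.indices], feed_dict={x: image_set})
    for i in range(len(image_set)):
        print("Image {} : classes {} with probabilities {}".format(i, indices[i], values[i]))
```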
Discussion
New images might be misclassified if they belong to classes that are relatively rare in the training data. In the histogram of labels above, some classes such as 1, 19, or 42 have far fewer occurrences than others, so the model may find it hard to learn the features of those classes. One solution is to get more data for those classes.
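One way to generate more examples for those rare classes without collecting new photos is simple data augmentation of the existing images; this was not done in this project, but a minimal cv2 sketch could look like the following (the rotation/shift ranges and the example class id are arbitrary):

```python
import cv2
import numpy as np

def augment(img):
    """Apply a small random rotation and shift to a 32x32 traffic sign image."""
    rows, cols = img.shape[:2]
    angle = np.random.uniform(-10, 10)          # degrees
    dx, dy = np.random.uniform(-2, 2, size=2)   # pixels
    M = cv2.getRotationMatrix2D((cols / 2, rows / 2), angle, 1.0)
    M[:, 2] += (dx, dy)
    return cv2.warpAffine(img, M, (cols, rows))

# Example: generate one extra augmented copy of every image of class 19
rare_idx = np.where(y_train == 19)[0]
extra_images = np.array([augment(X_train[i]) for i in rare_idx])
X_train_aug = np.concatenate([X_train, extra_images])
y_train_aug = np.concatenate([y_train, np.full(len(extra_images), 19)])
```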