- Please visit the original project repository from Udacity.
I used the KITTI road dataset to train the model, and the training data were augmented. The training images are matrices of shape (?, height, width, 3), where the last dimension 3 corresponds to the three RGB channels. The label images are matrices of shape (?, height, width, 2), where the last dimension 2 corresponds to the two classes: road and non-road surface.
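To illustrate this label layout, the sketch below (a hypothetical helper, not code from the project) converts a binary road mask into the two-channel one-hot format described above:

```python
import numpy as np

def mask_to_onehot(road_mask):
    """Convert a boolean road mask of shape (height, width) into a
    (height, width, 2) one-hot label: channel 0 = road, channel 1 = non-road."""
    road = road_mask.astype(np.float32)
    return np.stack([road, 1.0 - road], axis=-1)

# A tiny 2x2 example: the top row is road, the bottom row is not.
mask = np.array([[True, True], [False, False]])
labels = mask_to_onehot(mask)
print(labels.shape)  # (2, 2, 2)
```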
The images above show:

- The first row is the original image and the corresponding label image. Note that the white pixels in the label image mark the actual road surface; the black area is non-road.
- The second row is the original image flipped horizontally.
- The third row is the original image with the brightness increased by 30.
- The fourth row is the flipped image with the brightness increased by 30.
- The fifth row is the original image with the brightness reduced by 30.
- The sixth row is the flipped image with the brightness reduced by 30.
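The augmentation rows above can be sketched as follows. This is a minimal NumPy version, not the project's actual augmentation code; function and parameter names are my own:

```python
import numpy as np

def augment(image, label, delta=30):
    """Generate the six variants shown above.
    image: (H, W, 3) uint8, label: (H, W, 2) one-hot."""
    def brightness(img, d):
        # Shift brightness and clip back into the valid uint8 range.
        return np.clip(img.astype(np.int16) + d, 0, 255).astype(np.uint8)

    flipped = image[:, ::-1, :]       # horizontal flip
    flipped_label = label[:, ::-1, :]  # the label must be flipped with the image
    return [
        (image, label),                               # row 1: original
        (flipped, flipped_label),                     # row 2: flipped
        (brightness(image, delta), label),            # row 3: brighter
        (brightness(flipped, delta), flipped_label),  # row 4: flipped, brighter
        (brightness(image, -delta), label),           # row 5: darker
        (brightness(flipped, -delta), flipped_label), # row 6: flipped, darker
    ]

pairs = augment(np.zeros((8, 8, 3), np.uint8), np.zeros((8, 8, 2), np.float32))
print(len(pairs))  # 6
```

Clipping matters here: adding 30 to a pixel near 255, or subtracting 30 near 0, would otherwise wrap around in uint8 arithmetic.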
As the diagram above shows:

- The first row in this picture is part of the VGG16 architecture.
- The output of layer 7 is connected to a 1x1 convolutional layer with a depth (the last dimension of a layer) of 2, which indicates that there are two classes to be predicted: road and non-road.
- The outputs of layers 3 and 4 are each also connected to a 1x1 convolutional layer whose depth (last dimension) is 2.
- The three layers mentioned above are upsampled and added together as skip connections. Finally, the last layer is upsampled to form a matrix whose width and height match the original image and whose depth is 2.
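The shape bookkeeping of this decoder can be traced with a small NumPy sketch. Here a 1x1 convolution is a per-pixel matrix multiply, and upsampling is nearest-neighbour repetition; the real model uses learned transposed convolutions, and the input size 160x576 and channel depths are assumptions for illustration:

```python
import numpy as np

def conv_1x1(x, w):
    """1x1 convolution = per-pixel matmul: (H, W, C) @ (C, 2) -> (H, W, 2)."""
    return x @ w

def upsample(x, factor):
    """Nearest-neighbour upsampling along height and width."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

rng = np.random.default_rng(0)
# Fake encoder outputs with VGG16-like strides: layer 3 /8, layer 4 /16, layer 7 /32.
H, W = 160, 576
l3 = rng.normal(size=(H // 8, W // 8, 256))
l4 = rng.normal(size=(H // 16, W // 16, 512))
l7 = rng.normal(size=(H // 32, W // 32, 4096))

# 1x1 convolutions reduce each to depth 2 (road / non-road).
s3 = conv_1x1(l3, rng.normal(size=(256, 2)))
s4 = conv_1x1(l4, rng.normal(size=(512, 2)))
s7 = conv_1x1(l7, rng.normal(size=(4096, 2)))

# Skip connections: upsample and add, then a final 8x upsample
# back to the input resolution.
out = upsample(upsample(s7, 2) + s4, 2) + s3
out = upsample(out, 8)
print(out.shape)  # (160, 576, 2)
```

The additions only work because each 1x1 convolution first brings every branch to the same depth of 2, and each 2x upsample brings it to the spatial size of the next skip layer.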
Some important parameters for training were set as follows:
batch_size = 1
learning_rate = 1e-5
epoches = 60
keep_prob = 0.8
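Training minimizes the per-pixel softmax cross-entropy between the 2-channel logits and the one-hot labels. A minimal NumPy sketch of this loss (the project itself would use TensorFlow's built-in cross-entropy op; this is only to show the computation):

```python
import numpy as np

def pixelwise_cross_entropy(logits, labels):
    """Mean softmax cross-entropy over all pixels.
    logits, labels: (H, W, 2); labels are one-hot (road / non-road)."""
    # Numerically stable log-softmax along the class dimension.
    z = logits - logits.max(axis=-1, keepdims=True)
    log_softmax = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -(labels * log_softmax).sum(axis=-1).mean()

# Perfectly confident, correct logits give a loss near zero.
labels = np.stack([np.ones((4, 4)), np.zeros((4, 4))], axis=-1)
logits = labels * 100.0
print(round(pixelwise_cross_entropy(logits, labels), 4))  # 0.0
```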
The training losses are shown below:
From the diagram above we can see that the loss converges well once the epoch reaches around 45.
Some images for testing the performance of the model are shown below.
From the images above we can see that the results are satisfactory on the test data.
- Udacity Self-Driving Car Engineer Nanodegree
- Other possible training data: Cityscapes
- Jonathan Long, Evan Shelhamer, Trevor Darrell. "Fully Convolutional Networks for Semantic Segmentation" (FCN), UC Berkeley