In this project, a fully convolutional neural network for semantic image segmentation is built on the well-known VGG-16 architecture, roughly following the ideas and concepts outlined here. The final architecture is trained on the Kitti Road dataset, which enables the network to identify the pixels in an image that correspond to a road.
The segmentation network harnesses a pre-trained VGG-16 network as its encoder. As a first step, the final fully connected layer of VGG-16 is replaced with a 1x1 convolutional layer. In addition, several transposed convolutional (de-convolutional) layers are introduced as the decoder part of the network. Skip connections are used to improve the performance of the model.
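The shape bookkeeping behind such a decoder can be sketched in plain Python. This is only an illustration under assumptions that are not stated above: an FCN-8 style layout in which the encoder output sits at 1/32 of the input resolution, skip connections come from the 1/16 and 1/8 resolution pooling stages, and the transposed convolutions use the kernel, stride and padding values shown. The function names and the 160x576 input size are hypothetical.

```python
def transposed_conv_size(size, kernel, stride, padding):
    """Output length of a transposed convolution along one axis."""
    return (size - 1) * stride - 2 * padding + kernel


def decoder_shapes(height, width):
    """Trace spatial sizes through an assumed FCN-8 style decoder."""
    if height % 32 or width % 32:
        raise ValueError("height and width must be divisible by 32")

    # Encoder feature maps (assumed): VGG-16 halves the resolution at each pool.
    pool3 = (height // 8, width // 8)
    pool4 = (height // 16, width // 16)
    encoder_out = (height // 32, width // 32)  # after the 1x1 convolution

    # Upsample x2 so the result can be added to the pool4 skip connection.
    up1 = tuple(transposed_conv_size(s, kernel=4, stride=2, padding=1) for s in encoder_out)
    if up1 != pool4:
        raise AssertionError("first upsampling does not match pool4")

    # Upsample x2 again so the result can be added to the pool3 skip connection.
    up2 = tuple(transposed_conv_size(s, kernel=4, stride=2, padding=1) for s in up1)
    if up2 != pool3:
        raise AssertionError("second upsampling does not match pool3")

    # Final x8 upsampling restores the input resolution.
    output = tuple(transposed_conv_size(s, kernel=16, stride=8, padding=4) for s in up2)
    return {"encoder_out": encoder_out, "up1": up1, "up2": up2, "output": output}


shapes = decoder_shapes(160, 576)
print(shapes)
```

The skip connections only work because each upsampled map has exactly the same spatial size as the encoder feature map it is added to, which is what the two checks verify.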
An Adam optimizer with an initial learning rate of 0.001 is used to minimize the cross-entropy loss of the network. The batch size is set to 10, and the network is trained for 50 epochs.
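To make the two ingredients concrete, here is a minimal, framework-free sketch of a per-pixel softmax cross-entropy and a single Adam update for one scalar parameter. It is not the project's training code. Only the learning rate of 0.001 comes from the text above; the function names are hypothetical, and beta1, beta2 and eps are the commonly used Adam defaults, assumed here.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one pixel given its class logits and true class index."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Two classes (road / not road) with equal logits: the loss equals log(2).
loss = softmax_cross_entropy([0.0, 0.0], label=1)

# The first Adam step moves the parameter by roughly lr, whatever the gradient scale.
new_param, m, v = adam_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(loss, new_param)
```

In the actual network the loss is averaged over all pixels of all images in a batch, and the same update is applied to every trainable weight.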
After training for 50 epochs, the network is fairly accurate in identifying the drivable portion of the road in an image. Some sample images showcasing the performance of the network are shown below.
To run the code, make sure the following are installed:
Download the Kitti Road dataset from here. Extract the dataset in the data folder. This will create the folder data_road with all the training and test images.
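Before starting a long training run, it can help to confirm that the extraction produced the expected folder. The following is a small optional sketch, not part of the project. It assumes the script is run from the project root so that the relative path data/data_road resolves, and the helper names are hypothetical.

```python
from pathlib import Path


def locate_data_road(data_dir="data"):
    """Return the path to data_road inside data_dir, or raise if it is missing."""
    road_dir = Path(data_dir) / "data_road"
    if not road_dir.is_dir():
        raise FileNotFoundError(
            f"{road_dir} not found; extract the Kitti Road dataset into {data_dir}"
        )
    return road_dir


def count_files(road_dir):
    """Count all regular files below road_dir, including subfolders."""
    return sum(1 for path in Path(road_dir).rglob("*") if path.is_file())


# Example usage from the project root:
#   road_dir = locate_data_road("data")
#   print(road_dir, count_files(road_dir))
```

A missing folder then surfaces as a clear error message before any training starts.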
Run the following command to run the project:
python main.py