In this project, I used the FastAI framework to perform semantic image segmentation on the CamVid dataset.
The Cambridge-driving Labeled Video Database (CamVid) is the first collection of videos with object class semantic labels, complete with metadata. The database provides ground truth labels that associate each pixel with one of 32 semantic classes. More on this dataset can be found on its official website here.
I have used fastai datasets for importing the CamVid dataset to my notebook.
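As a rough sketch of that import step, the snippet below wraps the download in a small helper. It assumes fastai v1, where `URLs` and `untar_data` live in `fastai.datasets`; the helper name `load_camvid` and the returned dictionary are illustrative only and are not taken from the original notebook.

```python
def load_camvid():
    """Download (or reuse a cached copy of) CamVid and return its key paths."""
    # Imported lazily so the helper can be defined without fastai installed.
    # Assumption: fastai v1, where these names live in fastai.datasets.
    from fastai.datasets import URLs, untar_data

    path = untar_data(URLs.CAMVID)  # root folder of the extracted dataset
    return {
        "root": path,
        "images": path / "images",  # input frames
        "labels": path / "labels",  # per-pixel label masks
    }
```

Calling `load_camvid()` downloads the archive on first use, so it needs network access and a working fastai installation.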
Required installation for this project:
pip install http://download.pytorch.org/whl/cpu/torch-1.0.0-cp36-cp36m-linux_x86_64.whl
pip install fastai
More info on installation procedures can be found here.
A sample image from the dataset:
The dataset also comes with labels. The labelled counterpart of the above image is:
After we prepare our data with the images and their labels, a sample batch of data looks something like this:
FastAI conveniently combines each image with its label mask, which makes it easy to check that the pixels and labels line up before training. The sample batch above already has all the transformations, normalisations and other specifications applied to the data.
I have used a U-Net model, which is one of the most common architectures for segmentation tasks. A U-Net architecture looks something like this:
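One detail worth illustrating is that, in segmentation, any geometric transformation has to be applied to the image and its label mask together, otherwise the labels no longer match the pixels. The following is a minimal pure-Python sketch of that idea using a horizontal flip. The function name `random_hflip_pair` and the nested-list image format are assumptions made for illustration; this is not FastAI's own implementation.

```python
import random


def random_hflip_pair(image, mask, p=0.5, rng=random):
    """Flip an image and its mask together with probability p.

    Both inputs are lists of rows; the same decision is used for both
    so that every pixel keeps its own label.
    """
    if rng.random() < p:
        image = [row[::-1] for row in image]  # mirror each pixel row
        mask = [row[::-1] for row in mask]    # mirror the labels the same way
    return image, mask


# Tiny 2x3 example: pixel values and their class ids.
image = [[10, 20, 30], [40, 50, 60]]
mask = [[0, 0, 1], [2, 2, 1]]
flipped_image, flipped_mask = random_hflip_pair(image, mask, p=1.0)
```

With `p=1.0` the flip always happens, so the label `1` that sat under pixel `30` in the first row still sits under it after flipping.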
More on this can be found here.
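To make the U-shape concrete, here is a small sketch that traces feature-map shapes through a generic U-Net: the encoder halves the resolution and doubles the channels at each step, and the decoder upsamples and concatenates the matching encoder output (the skip connection). The channel counts follow the classic U-Net layout and are an assumption; the model trained in this project may use a different encoder, so treat this as an illustration of the idea rather than the exact network.

```python
def unet_shapes(height, width, base_channels=64, depth=4):
    """Trace (channels, height, width) through a U-Net-style network."""
    if height % (2 ** depth) or width % (2 ** depth):
        raise ValueError("height and width must be divisible by 2**depth")

    encoder = []
    c, h, w = base_channels, height, width
    for _ in range(depth):
        encoder.append((c, h, w))          # kept for the skip connection
        c, h, w = c * 2, h // 2, w // 2    # downsample, double the channels
    bottleneck = (c, h, w)

    decoder = []
    for skip_c, skip_h, skip_w in reversed(encoder):
        up_c = c // 2                      # upsampling halves the channels
        concat = (up_c + skip_c, skip_h, skip_w)  # join with encoder features
        c = skip_c                         # convolutions reduce channels again
        decoder.append({"concat": concat, "out": (c, skip_h, skip_w)})
    return encoder, bottleneck, decoder


encoder, bottleneck, decoder = unet_shapes(256, 256)
```

The decoder ends at the same resolution as the input, which is what lets the network predict a class for every pixel.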
The final accuracy I got was 91.6%. The following graph shows the training and validation loss:
Some sample predictions are:
The predictions are pretty close to the ground truth!
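For readers wondering what an accuracy figure like the one above means for segmentation, a common choice is per-pixel accuracy: the fraction of pixels whose predicted class matches the ground truth. The sketch below assumes that definition, and the optional `ignore_label` argument (for skipping an unlabelled or "void" class) is a hypothetical addition; the write-up does not state exactly how the 91.6% was computed.

```python
def pixel_accuracy(pred, target, ignore_label=None):
    """Fraction of pixels where pred equals target.

    pred and target are lists of rows of class ids. Pixels whose target
    equals ignore_label are left out of the count.
    """
    correct = 0
    total = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if ignore_label is not None and t == ignore_label:
                continue  # skip pixels that should not be scored
            total += 1
            correct += int(p == t)
    return correct / total if total else 0.0


pred = [[0, 1], [2, 2]]
target = [[0, 1], [2, 3]]
acc_all = pixel_accuracy(pred, target)
acc_ignoring_3 = pixel_accuracy(pred, target, ignore_label=3)
```

Here three of the four pixels match, and the only mismatch falls on a pixel labelled `3`, so ignoring that label removes it from the score.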