Semantic segmentation is the task of assigning a class label to every pixel in an image; it can be viewed as per-pixel image classification. Its primary applications include autonomous vehicles, human-computer interaction, robotics, and photo-editing tools. It is particularly important for self-driving cars, which need contextual information about their surroundings at every step along the route.
In this project, we perform semantic segmentation on the CamVid dataset and evaluate the trained deep learning model with standard segmentation metrics.
For this project, I use the UNet architecture, which was developed by Olaf Ronneberger et al. for biomedical image segmentation. The model resembles an autoencoder: the encoding path, also known as the down-sampling path, consists of a series of convolutions, while the decoding (up-sampling) path consists of up-convolutions (transposed convolutions in this case). UNet is fully convolutional, and the network is trained with a per-pixel cross-entropy loss.
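The sketch below illustrates this encoder-decoder structure in PyTorch. It is a minimal version, not the project's exact implementation: the number of down/up-sampling stages is reduced, and the `double_conv` helper and channel sizes are assumptions chosen for brevity.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU: the basic UNet building block
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )

class UNet(nn.Module):
    def __init__(self, n_classes=32):
        super().__init__()
        # Down-sampling (encoding) path
        self.enc1 = double_conv(3, 64)
        self.enc2 = double_conv(64, 128)
        self.enc3 = double_conv(128, 256)
        self.pool = nn.MaxPool2d(2)
        # Up-sampling (decoding) path with transposed convolutions
        self.up2 = nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2)
        self.dec2 = double_conv(256, 128)
        self.up1 = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
        self.dec1 = double_conv(128, 64)
        # 1x1 convolution maps features to per-pixel class scores
        self.out = nn.Conv2d(64, n_classes, kernel_size=1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        # Skip connections concatenate encoder features with decoder features
        d2 = self.dec2(torch.cat([self.up2(e3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.out(d1)  # logits of shape (N, n_classes, H, W)
```

The skip connections are what distinguish UNet from a plain autoencoder: they carry high-resolution encoder features directly into the decoder, which helps recover sharp object boundaries. This sketch assumes input height and width are divisible by 4.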
The Cambridge-driving Labeled Video Database (CamVid) is the first collection of videos with object class semantic labels, complete with metadata. The database provides ground truth labels that associate each pixel with one of 32 semantic classes.
We use PyTorch as the deep learning framework for this task. The workflow consists of the following steps:
- Data preprocessing
- Data loading (using a custom dataloader; a sketch follows this list)
- Developing the deep learning architecture
- Training the model (training-loop sketch below)
- Evaluating the model (metric sketches below):
  - Pixel accuracy
  - Intersection over Union (IoU)
- Testing on a particular sample image (inference sketch below)
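For the data-loading step, a custom `torch.utils.data.Dataset` pairs each CamVid frame with its label map. The sketch below is one possible version under stated assumptions: the directory names (`CamVid/train`, `CamVid/train_labels`), the matching of image and mask filenames, and the use of class-index masks (i.e. CamVid's color-coded labels already converted to integer class IDs during preprocessing) are all hypothetical.

```python
import os
import numpy as np
from PIL import Image
import torch
from torch.utils.data import Dataset, DataLoader

class CamVidDataset(Dataset):
    """Pairs each RGB frame with its per-pixel class-index label map."""

    def __init__(self, image_dir, mask_dir, transform=None):
        self.image_dir = image_dir
        self.mask_dir = mask_dir
        self.images = sorted(os.listdir(image_dir))
        self.transform = transform  # optional joint transform on (image, mask)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        name = self.images[idx]
        image = Image.open(os.path.join(self.image_dir, name)).convert("RGB")
        mask = Image.open(os.path.join(self.mask_dir, name))  # class-index PNG assumed
        image = torch.from_numpy(np.array(image)).permute(2, 0, 1).float() / 255.0
        mask = torch.from_numpy(np.array(mask)).long()
        if self.transform is not None:
            image, mask = self.transform(image, mask)
        return image, mask

# Hypothetical directory layout
train_set = CamVidDataset("CamVid/train", "CamVid/train_labels")
train_loader = DataLoader(train_set, batch_size=4, shuffle=True, num_workers=2)
```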
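Training minimizes the per-pixel cross-entropy loss mentioned above. The loop below is a minimal sketch using the `UNet` model and `train_loader` from the earlier sketches; the optimizer (Adam), learning rate, and epoch count are assumptions, not the project's actual hyperparameters.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = UNet(n_classes=32).to(device)
criterion = nn.CrossEntropyLoss()        # per-pixel cross-entropy, as described above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(20):                  # epoch count is an assumption
    model.train()
    running_loss = 0.0
    for images, masks in train_loader:
        images, masks = images.to(device), masks.to(device)
        optimizer.zero_grad()
        logits = model(images)           # (N, 32, H, W)
        loss = criterion(logits, masks)  # masks: (N, H, W) with class indices
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"epoch {epoch + 1}: loss = {running_loss / len(train_loader):.4f}")
```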
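The two evaluation metrics can be computed directly from the model's logits and the ground-truth masks. Pixel accuracy is the fraction of correctly labeled pixels; IoU for a class is the intersection of the predicted and true masks divided by their union. The sketch below is one reasonable implementation; skipping classes that appear in neither the prediction nor the ground truth is a design choice, not necessarily the project's.

```python
import torch

def pixel_accuracy(pred, target):
    # pred: (N, C, H, W) logits, target: (N, H, W) class indices
    pred_labels = pred.argmax(dim=1)
    correct = (pred_labels == target).sum().item()
    return correct / target.numel()

def mean_iou(pred, target, n_classes=32):
    # Average IoU over classes present in the prediction or the ground truth
    pred_labels = pred.argmax(dim=1)
    ious = []
    for cls in range(n_classes):
        pred_mask = pred_labels == cls
        target_mask = target == cls
        union = (pred_mask | target_mask).sum().item()
        if union == 0:
            continue  # class absent from both; skip rather than count as 1.0
        intersection = (pred_mask & target_mask).sum().item()
        ious.append(intersection / union)
    return sum(ious) / len(ious)
```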
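Finally, testing on a single sample image amounts to running the trained model in evaluation mode and taking the arg-max over the class dimension to obtain a predicted label map. The snippet below reuses `model`, `device`, and `train_set` from the earlier sketches; the sample index is chosen arbitrarily for illustration.

```python
import torch

model.eval()
image, mask = train_set[0]                      # any sample; index 0 used here
with torch.no_grad():
    logits = model(image.unsqueeze(0).to(device))
pred = logits.argmax(dim=1).squeeze(0).cpu()    # (H, W) predicted class map
print("pixel accuracy on this sample:",
      (pred == mask).float().mean().item())
```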