aerial-view-segmentation

Binary and Multi-class Image Segmentation of High Resolution Aerial Drone Images


Drone Images Semantic Segmentation


Friendly Reminder

Your support is truly appreciated. Feel free to contact me through the links below or just send me an email:

Repository Structure

The repository is structured as follows:

  • code folder: contains the notebooks for image preprocessing, binary segmentation, and multi-class segmentation
  • plots folder: contains two subfolders, binary and multiclass, with the respective plots

Dataset

The dataset used is the Semantic Segmentation Drone Dataset; it can be downloaded, already processed, at the following link.

Starting from the original dataset, the images were downscaled and the labels were remapped to support both binary and multi-class segmentation. In the multi-class case, instead of using the original 24 classes, they were grouped into 5 macro-classes as follows:

```python
binary_classes = {
    0: {0, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20},  # obstacles
    1: {1, 2, 3, 4, 9},  # landing zones
}

grouped_classes = {
    0: {0, 6, 10, 11, 12, 13, 14, 21, 22, 23},  # obstacles
    1: {5, 7},  # water
    2: {2, 3, 8, 19, 20},  # soft-surfaces
    3: {15, 16, 17, 18},  # moving objects
    4: {1, 4, 9},  # landable
}
```
*(Sample image with its binary and 5-class masks.)*
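For reference, here is a minimal sketch of how such a grouping could be applied to a label mask with NumPy. The lookup-table approach and the function name are illustrative assumptions, not the repository's actual preprocessing code:

```python
import numpy as np

def remap_mask(mask: np.ndarray, class_map: dict) -> np.ndarray:
    """Remap an (H, W) mask of original class ids to macro-class ids.

    Builds a 24-entry lookup table from the class grouping above and
    remaps every pixel in a single vectorized pass.
    """
    lut = np.zeros(24, dtype=np.uint8)
    for new_id, old_ids in class_map.items():
        for old_id in old_ids:
            lut[old_id] = new_id
    return lut[mask]

# Usage: five_class_mask = remap_mask(original_mask, grouped_classes)
```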

Models and Training

Three state-of-the-art image segmentation models, each with its own distinguishing characteristic, were compared in order to show how their underlying design concepts differ.

All models used an EfficientNet-B0 backbone pretrained on ImageNet, while the decoders were trained for 25 epochs on the augmented training set. Given the limited number of images (just 400), augmentation was crucial to train the models effectively.
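The model-construction code lives in the notebooks, but with the segmentation_models_pytorch library (which provides all three architectures with an efficientnet-b0 encoder) the setup would look roughly like this sketch; the exact arguments are assumptions:

```python
import segmentation_models_pytorch as smp

# Three decoders sharing the same ImageNet-pretrained EfficientNet-B0 encoder.
common = dict(
    encoder_name="efficientnet-b0",
    encoder_weights="imagenet",
    in_channels=3,
    classes=5,  # 1 for the binary task, 5 for the multi-class task
)

models = {
    "U-Net": smp.Unet(**common),
    "DeepLabV3": smp.DeepLabV3(**common),
    "MAnet": smp.MAnet(**common),
}
```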

The training criterion was the Dice loss (in both its binary and multi-class variants), and the models were evaluated with recall, false positive rate (FPR), and image-wise IoU (in the multi-class case, the metrics were computed per class).
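For reference, a minimal sketch of a soft Dice loss for the binary case; the smoothing constant and the reduction over the batch are illustrative choices, not necessarily those used in the notebooks:

```python
import torch

def dice_loss(logits: torch.Tensor, targets: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Soft Dice loss for binary segmentation: 1 - 2*|P & T| / (|P| + |T|).

    `logits` are raw model outputs and `targets` a float mask of 0s and 1s,
    both shaped (N, 1, H, W).
    """
    probs = torch.sigmoid(logits)
    intersection = (probs * targets).sum(dim=(-2, -1))
    union = probs.sum(dim=(-2, -1)) + targets.sum(dim=(-2, -1))
    dice = (2 * intersection + eps) / (union + eps)
    return 1 - dice.mean()
```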

| Model | Characteristic | Paper |
|---|---|---|
| U-Net | Fully convolutional | paper |
| DeepLabV3 | Dilated convolutions | paper |
| MAnet | Attention mechanism | paper |

Binary Segmentation

Below are some mask predictions and results from the binary segmentation task.

*(Predicted masks: input image, ground truth, U-Net, DeepLabV3, MAnet.)*
| Model | Recall | FPR | IoU |
|---|---|---|---|
| U-Net | 0.971 | 0.222 | 0.923 |
| DeepLabV3 | 0.971 | 0.251 | 0.919 |
| MAnet | 0.973 | 0.249 | 0.918 |

Multi-class Segmentation

These are the results for the 5-class segmentation:

*(Predicted masks: input image, ground truth, U-Net, DeepLabV3, MAnet.)*
U-Net

| Metric | Obstacles | Water | Nature | Moving | Landing |
|---|---|---|---|---|---|
| Recall | 0.67 | 0.96 | 0.882 | 0.657 | 0.955 |
| FPR | 0.022 | 0.001 | 0.029 | 0.002 | 0.123 |
| IoU | 0.518 | 0.903 | 0.843 | 0.581 | 0.842 |

DeepLabV3

| Metric | Obstacles | Water | Nature | Moving | Landing |
|---|---|---|---|---|---|
| Recall | 0.633 | 0.955 | 0.905 | 0.672 | 0.94 |
| FPR | 0.022 | 0.001 | 0.062 | 0.004 | 0.107 |
| IoU | 0.503 | 0.883 | 0.814 | 0.563 | 0.896 |

MAnet

| Metric | Obstacles | Water | Nature | Moving | Landing |
|---|---|---|---|---|---|
| Recall | 0.492 | 0.921 | 0.891 | 0.682 | 0.95 |
| FPR | 0.012 | 0.001 | 0.048 | 0.004 | 0.162 |
| IoU | 0.431 | 0.83 | 0.82 | 0.566 | 0.825 |