aerial-view-segmentation

Binary and Multi-class Image Segmentation of High Resolution Aerial Drone Images


Drone Images Semantic Segmentation


Friendly Reminder

Your support is truly appreciated. Feel free to contact me through the links below or just send me an email:

Repository Structure

The repository is structured as follows:

  • code folder: contains the notebooks for image preprocessing, binary segmentation, and multi-class segmentation
  • plots folder: contains two subfolders, binary and multiclass, with the respective plots

Dataset

The dataset used is the Semantic Segmentation Drone Dataset; it can be downloaded, already processed, at the following link.

Starting from the original dataset, the images were downscaled and the labels were remapped to support both binary and multi-class segmentation. In the multi-class case, instead of using the original 24 classes, they were grouped into 5 macro-classes as follows:

```python
binary_classes = {
    0: {0, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20},  # obstacles
    1: {1, 2, 3, 4, 9},  # landing zones
}

grouped_classes = {
    0: {0, 6, 10, 11, 12, 13, 14, 21, 22, 23},  # obstacles
    1: {5, 7},  # water
    2: {2, 3, 8, 19, 20},  # soft-surfaces
    3: {15, 16, 17, 18},  # moving objects
    4: {1, 4, 9},  # landable
}
```
*(Sample image with its binary and 5-class masks.)*
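For reference, here is a minimal sketch of how such a grouping could be applied to a label mask with NumPy. The lookup-table approach and the function name are illustrative assumptions, not the repository's actual preprocessing code:

```python
import numpy as np

def remap_mask(mask: np.ndarray, class_map: dict) -> np.ndarray:
    """Remap an (H, W) mask of original class ids to macro-class ids.

    Builds a 24-entry lookup table from the class grouping above and
    remaps every pixel in a single vectorized pass.
    """
    lut = np.zeros(24, dtype=np.uint8)
    for new_id, old_ids in class_map.items():
        for old_id in old_ids:
            lut[old_id] = new_id
    return lut[mask]

# Usage: five_class_mask = remap_mask(original_mask, grouped_classes)
```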

Models and Training

Three state-of-the-art image segmentation models, each with its own distinguishing characteristic, were compared in order to show how their underlying design concepts differ.

All models used an EfficientNet-B0 backbone pretrained on ImageNet, while the decoders were trained for 25 epochs on the augmented training set. Given the limited number of images (just 400), augmentation was crucial to train the models effectively.
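The model-construction code lives in the notebooks, but with the segmentation_models_pytorch library (which provides all three architectures with an efficientnet-b0 encoder) the setup would look roughly like this sketch; the exact arguments are assumptions:

```python
import segmentation_models_pytorch as smp

# Three decoders sharing the same ImageNet-pretrained EfficientNet-B0 encoder.
common = dict(
    encoder_name="efficientnet-b0",
    encoder_weights="imagenet",
    in_channels=3,
    classes=5,  # 1 for the binary task, 5 for the multi-class task
)

models = {
    "U-Net": smp.Unet(**common),
    "DeepLabV3": smp.DeepLabV3(**common),
    "MAnet": smp.MAnet(**common),
}
```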

The training criterion was the Dice loss (in both its binary and multi-class variants), and the models were evaluated with recall, false positive rate (FPR), and image-wise IoU (in the multi-class case, the metrics were computed per class).
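For reference, a minimal sketch of a soft Dice loss for the binary case; the smoothing constant and the reduction over the batch are illustrative choices, not necessarily those used in the notebooks:

```python
import torch

def dice_loss(logits: torch.Tensor, targets: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Soft Dice loss for binary segmentation: 1 - 2*|P & T| / (|P| + |T|).

    `logits` are raw model outputs and `targets` a float mask of 0s and 1s,
    both shaped (N, 1, H, W).
    """
    probs = torch.sigmoid(logits)
    intersection = (probs * targets).sum(dim=(-2, -1))
    union = probs.sum(dim=(-2, -1)) + targets.sum(dim=(-2, -1))
    dice = (2 * intersection + eps) / (union + eps)
    return 1 - dice.mean()
```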

| Model | Characteristic | Paper |
|---|---|---|
| U-Net | Fully convolutional | paper |
| DeepLabV3 | Dilated convolutions | paper |
| MAnet | Attention mechanism | paper |

Binary Segmentation

Below are some mask predictions and results from the binary segmentation task.

*(Predicted masks: input image, ground truth, U-Net, DeepLabV3, MAnet.)*
| Model | Recall | FPR | IoU |
|---|---|---|---|
| U-Net | 0.971 | 0.222 | 0.923 |
| DeepLabV3 | 0.971 | 0.251 | 0.919 |
| MAnet | 0.973 | 0.249 | 0.918 |

Multi-class Segmentation

These are the results for the 5-class segmentation:

*(Predicted masks: input image, ground truth, U-Net, DeepLabV3, MAnet.)*
U-Net

| Metric | Obstacles | Water | Nature | Moving | Landing |
|---|---|---|---|---|---|
| Recall | 0.67 | 0.96 | 0.882 | 0.657 | 0.955 |
| FPR | 0.022 | 0.001 | 0.029 | 0.002 | 0.123 |
| IoU | 0.518 | 0.903 | 0.843 | 0.581 | 0.842 |

DeepLabV3

| Metric | Obstacles | Water | Nature | Moving | Landing |
|---|---|---|---|---|---|
| Recall | 0.633 | 0.955 | 0.905 | 0.672 | 0.94 |
| FPR | 0.022 | 0.001 | 0.062 | 0.004 | 0.107 |
| IoU | 0.503 | 0.883 | 0.814 | 0.563 | 0.896 |

MAnet

| Metric | Obstacles | Water | Nature | Moving | Landing |
|---|---|---|---|---|---|
| Recall | 0.492 | 0.921 | 0.891 | 0.682 | 0.95 |
| FPR | 0.012 | 0.001 | 0.048 | 0.004 | 0.162 |
| IoU | 0.431 | 0.83 | 0.82 | 0.566 | 0.825 |