Drone Images Semantic Segmentation
Friendly Reminder
- If you use my dataset, please cite it in your work/repository with the following link: Semantic Segmentation Drone Dataset
- If you use or take inspiration from this repository, please cite it with this link: santurini/Drone-Images-Semantic-Segmentation
Your support will be truly appreciated; feel free to contact me via the links below or just send me an email:
Repository Structure
The repository is structured as follows:
- code folder: contains the notebooks for image preprocessing, binary segmentation, and multi-class segmentation
- plots folder: contains two subfolders, binary and multiclass, with the respective plots
Dataset
The dataset used is the Semantic Segmentation Drone Dataset; an already processed version can be downloaded at the following link.
Starting from the original dataset, the images were downscaled and the labels renamed so that both binary and multi-class segmentation could be performed. In the multi-class case, instead of using the original 24 classes, they were grouped into 5 macro-classes as follows:
```python
binary_classes = {
    0: {0, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20},  # obstacles
    1: {1, 2, 3, 4, 9}                                               # landing zones
}

grouped_classes = {
    0: {0, 6, 10, 11, 12, 13, 14, 21, 22, 23},  # obstacles
    1: {5, 7},                                  # water
    2: {2, 3, 8, 19, 20},                       # soft-surfaces
    3: {15, 16, 17, 18},                        # moving objects
    4: {1, 4, 9}                                # landable
}
```
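A grouping like the one above can be applied to a label mask in one vectorized step with a lookup table. This is a minimal NumPy sketch (the variable names are illustrative, not taken from the repository's notebooks):

```python
import numpy as np

# Macro-class grouping for the 5-class case (as listed above)
grouped_classes = {
    0: {0, 6, 10, 11, 12, 13, 14, 21, 22, 23},  # obstacles
    1: {5, 7},                                  # water
    2: {2, 3, 8, 19, 20},                       # soft-surfaces
    3: {15, 16, 17, 18},                        # moving objects
    4: {1, 4, 9},                               # landable
}

# Build a lookup table from each of the 24 original ids to its macro-class
lut = np.zeros(24, dtype=np.uint8)
for macro_id, original_ids in grouped_classes.items():
    for oid in original_ids:
        lut[oid] = macro_id

# Remap an example label mask by fancy indexing
mask = np.array([[0, 5, 2], [15, 1, 9]], dtype=np.uint8)
remapped = lut[mask]  # → [[0, 1, 2], [3, 4, 4]]
```

The same idea works for the binary mapping by building the lookup table from `binary_classes` instead.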
Image | Binary | 5-classes |
---|---|---|
Models and Training
The performance of three state-of-the-art image segmentation models, each with its own distinguishing characteristic, was compared to show how their different underlying design choices behave on this task.
All models used an EfficientNet-B0 backbone pretrained on ImageNet, while the decoders were trained for 25 epochs on the augmented training set. Given the limited number of images (just 400), augmentation was crucial to train the models properly.
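Augmentation for segmentation must transform the image and its label mask identically. This is a minimal NumPy sketch of that idea with random flips (illustrative only; the actual augmentation pipeline used in the notebooks is not specified here):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, mask):
    """Apply the same random flips to an image (H, W, C) and its mask (H, W)."""
    if rng.random() < 0.5:                       # horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:                       # vertical flip
        image, mask = image[::-1, :], mask[::-1, :]
    return image, mask

img = rng.random((64, 64, 3))
msk = rng.integers(0, 5, (64, 64))
aug_img, aug_msk = augment(img, msk)
```

The key point is that any geometric transform applied to the image is applied to the mask as well, so pixel labels stay aligned.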
The training criterion was the Dice Loss (in both the binary and multi-class settings), and the models were evaluated with Recall, False Positive Rate (FPR), and image-wise IoU (in the multi-class case, all metrics besides IoU were computed per class).
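For reference, the loss and metrics above can be computed on binary masks as follows (a minimal NumPy sketch under standard definitions; not the repository's exact evaluation code):

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2|P∩T| / (|P| + |T|), with eps for stability."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def binary_metrics(pred, target):
    """Recall, FPR and IoU for hard binary masks of 0s and 1s."""
    tp = np.logical_and(pred == 1, target == 1).sum()
    fp = np.logical_and(pred == 1, target == 0).sum()
    fn = np.logical_and(pred == 0, target == 1).sum()
    tn = np.logical_and(pred == 0, target == 0).sum()
    recall = tp / (tp + fn)   # fraction of true pixels recovered
    fpr = fp / (fp + tn)      # fraction of background wrongly flagged
    iou = tp / (tp + fp + fn) # intersection over union
    return recall, fpr, iou

pred   = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
recall, fpr, iou = binary_metrics(pred, target)  # → 2/3, 1/3, 0.5
```

In the multi-class case the same computation is repeated per class by treating each class id as the positive label in turn.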
Model | Characteristic | Paper |
---|---|---|
U-Net | Fully Convolutional | paper |
DeepLabV3 | Dilated Convolutions | paper |
MAnet | Attention Mechanism | paper |
Binary Segmentation
Here are some predicted masks and results from the binary segmentation task.
Images | Groundtruth | U-Net | DeepLabV3 | MAnet |
---|---|---|---|---|
Models | Recall | FPR | IoU |
---|---|---|---|
U-Net | 0.971 | 0.222 | 0.923 |
DeepLabV3 | 0.971 | 0.251 | 0.919 |
MAnet | 0.973 | 0.249 | 0.918 |
Multi-class Segmentation
These are the results for the 5-class segmentation:
Images | Groundtruth | U-Net | DeepLabV3 | MAnet |
---|---|---|---|---|
U-Net | Obstacles | Water | Nature | Moving | Landing |
---|---|---|---|---|---|
Recall | 0.67 | 0.96 | 0.882 | 0.657 | 0.955 |
FPR | 0.022 | 0.001 | 0.029 | 0.002 | 0.123 |
IoU | 0.518 | 0.903 | 0.843 | 0.581 | 0.842 |
DeepLabV3 | Obstacles | Water | Nature | Moving | Landing |
---|---|---|---|---|---|
Recall | 0.633 | 0.955 | 0.905 | 0.672 | 0.94 |
FPR | 0.022 | 0.001 | 0.062 | 0.004 | 0.107 |
IoU | 0.503 | 0.883 | 0.814 | 0.563 | 0.896 |
MAnet | Obstacles | Water | Nature | Moving | Landing |
---|---|---|---|---|---|
Recall | 0.492 | 0.921 | 0.891 | 0.682 | 0.95 |
FPR | 0.012 | 0.001 | 0.048 | 0.004 | 0.162 |
IoU | 0.431 | 0.83 | 0.82 | 0.566 | 0.825 |