This project was developed as a Computer Vision assignment during our final year of college. Our professors provided us with a dataset of 50 OCT images and their respective segmentation masks for detecting pathological fluid in images of the retina. The assignment description suggested using the U-Net architecture as a baseline and reviewing the state of the art (SoTA) in semantic segmentation to improve its results. We were expected to familiarize ourselves with the most widely used Deep Learning models and to experiment with diverse alternatives and techniques in order to assess how modern CV proposals perform on this problem. Finally, we were asked to submit a paper with a detailed description of the tested approaches and the obtained results (check it here).
- Baseline: U-Net architecture (implementation was adapted from pytorch-unet).
- Fully convolutional pretrained models: U-Net, LinkNet and PSPNet with a ResNet-50 encoder pretrained on ImageNet (implementation adapted from segmentation_models_pytorch); see the instantiation sketch after this list.
- Attention-based models: PAN with a pretrained ResNet-50 encoder (implementation adapted from segmentation_models_pytorch) and Attention U-Net (implementation adapted from CBIM-Medical-Image-Segmentation).
- Deformable convolutions: U-Net with deformable convolutions from PyTorch-Deformable-Convolution-v2.
- Adversarial learning: we took the fully convolutional pretrained models above and trained them in an adversarial benchmark (see details in the paper).
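For illustration, this is roughly how such a pretrained model is instantiated with the public segmentation_models_pytorch API (a minimal sketch; the single grayscale input channel and binary output are assumptions, not necessarily the exact configuration used in `segmenter/`):

```python
import segmentation_models_pytorch as smp

# U-Net decoder over an ImageNet-pretrained ResNet-50 encoder.
# LinkNet, PSPNet and PAN are built analogously with
# smp.Linknet, smp.PSPNet and smp.PAN.
model = smp.Unet(
    encoder_name="resnet50",     # encoder backbone
    encoder_weights="imagenet",  # load ImageNet-pretrained weights
    in_channels=1,               # assumption: grayscale OCT scans
    classes=1,                   # assumption: one binary fluid mask
)
```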
F-Score results (mean ± std):

| Model | Default benchmark, no aug. | Default benchmark, data aug. | Adversarial benchmark, no aug. | Adversarial benchmark, data aug. |
|---|---|---|---|---|
| Baseline U-Net | 0.6 ± 0.15 | 0.65 ± 0.16 | | |
| U-Net ® | 0.7 ± 0.05 | 0.6 ± 0.09 | 0.87 ± 0.03 | 0.86 ± 0.06 |
| LinkNet ® | 0.6 ± 0.1 | 0.57 ± 0.09 | 0.77 ± 0.25 | 0.83 ± 0.12 |
| PSPNet ® | 0.67 ± 0.04 | 0.67 ± 0.05 | 0.82 ± 0.03 | 0.84 ± 0.04 |
| PAN ® | 0.7 ± 0.05 | 0.66 ± 0.09 | | |
| Attention U-Net | 0.74 ± 0.06 | 0.81 ± 0.08 | | |
| Deform U-Net | 0.71 ± 0.06 | 0.73 ± 0.09 | | |
Intersection over Union (IoU) results (mean ± std):

| Model | Default benchmark, no aug. | Default benchmark, data aug. | Adversarial benchmark, no aug. | Adversarial benchmark, data aug. |
|---|---|---|---|---|
| Baseline U-Net | 0.44 ± 0.13 | 0.56 ± 0.11 | | |
| U-Net ® | 0.54 ± 0.06 | 0.48 ± 0.1 | 0.78 ± 0.04 | 0.76 ± 0.08 |
| LinkNet ® | 0.43 ± 0.11 | 0.4 ± 0.09 | 0.67 ± 0.22 | 0.72 ± 0.14 |
| PSPNet ® | 0.5 ± 0.04 | 0.51 ± 0.06 | 0.69 ± 0.04 | 0.73 ± 0.05 |
| PAN ® | 0.53 ± 0.05 | 0.5 ± 0.03 | | |
| Attention U-Net | 0.59 ± 0.07 | 0.72 ± 0.08 | | |
| Deform U-Net | 0.55 ± 0.06 | 0.58 ± 0.07 | | |
Pretrained models are marked with ®.
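For reference, both metrics measure pixel-wise overlap between the predicted and ground-truth masks. A minimal sketch of how they can be computed (illustrative only, not the exact code in `segmenter/`):

```python
import numpy as np

def f_score_and_iou(pred, target):
    """Pixel-wise F-score (Dice) and IoU between two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.logical_and(pred, target).sum()   # true positives
    fp = np.logical_and(pred, ~target).sum()  # false positives
    fn = np.logical_and(~pred, target).sum()  # false negatives
    if tp + fp + fn == 0:
        return 1.0, 1.0  # both masks empty: define perfect overlap
    f_score = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return float(f_score), float(iou)
```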
The number of parameters of each model can be checked with:

```shell
python3 counter.py
```
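The script itself is in the repository; in plain PyTorch, counting trainable parameters reduces to a one-liner along these lines (a sketch, not necessarily what `counter.py` does internally):

```python
import torch

def count_parameters(model: torch.nn.Module) -> int:
    """Total number of trainable parameters of a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```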
All code needed to reproduce our results is available in the `segmenter/` folder (if you require access to the original dataset, please contact me). The script `test.py` accepts the following arguments to select the segmentation model:

```shell
python3 test.py <model> <mode> -v -aug -adv
```
- `model`: Specifies the model to train or test (choices are `base`, `unet`, `linknet`, `pspnet`, `pan`, `attnunet`, `deformunet`).
- `mode`: Specifies the execution mode (`kfold` or `train`). `kfold` accepts the argument `--k` to run k-fold cross-validation; the final metrics are stored in the folder given by `--model_path`. `train` executes a single train-validation split, training the selected model and validating on 10% of the dataset.
- `-v`: Flag to show the training trace.
- `-aug`: Flag to use data augmentation.
- `-adv`: Flag to train the model in the adversarial benchmark.
Examples:
Execute the baseline model without data augmentation, with a batch size of 10 images, and save the results in `../results/base/`:

```shell
python3 test.py base kfold -v --batch_size=10 --model_path=../results/base/ --route=../OCT-dataset/
```
Execute pretrained U-Net with data augmentation:
```shell
python3 test.py unet kfold -v --batch_size=10 -aug
```
Execute LinkNet with data augmentation and adversarial learning:
```shell
python3 test.py linknet train -v --batch_size=10 -aug -adv
```
Note that the argument `--route` specifies the path where the OCT dataset is stored. This folder is expected to contain two subfolders, `images/` and `masks/`, which hold respectively the retinal tomography images used as model input and the pathological fluid masks used as targets, as shown below.
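That is, the expected layout is (folder name illustrative):

```
OCT-dataset/
├── images/   # retinal tomography scans (model input)
└── masks/    # pathological fluid masks (ground truth)
```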
- Ana Ezquerro (ana.ezquerro@udc.es, GitHub).