- Mikel Menta Grade - mikel.menta@e-campus.uab.cat
- Alex Vallès Fernández - alex.valles@e-campus.uab.cat
- Sebastian Maya Hernández - sebastiancamilo.maya@e-campus.uab.cat
- Pedro Luis Trigueros Mondéjar - pedroluis.trigueros@e-campus.uab.cat
In this project we focus on scene understanding for autonomous vehicles. Understanding the context around the vehicle is key for autonomous driving. The project consists of three stages: image classification, object detection and semantic segmentation.
Furthermore, we aim to learn the basic concepts, techniques, tricks and libraries to develop and evaluate deep neural networks.
- Very Deep Convolutional Networks for Large-Scale Image Recognition (VGG)
- Squeeze and Excitation Blocks
- You Only Look Once (YOLO)
- Focal Loss for Dense Object Detection
- Fully Convolutional Networks for Semantic Segmentation
- Wider or Deeper: Revisiting the ResNet Model for Visual Recognition
The weights, experiment information and TensorBoard logs are available here.
To run an experiment: `python train.py -c config/configFile.py -e expName`
Overview: We have implemented the winning architecture of the ILSVRC 2017 classification competition, the Squeeze-and-Excitation ResNet (SE-ResNet). The squeeze-and-excitation block consists of a squeeze step, which applies Global Average Pooling to the output of the residual block, followed by an excitation step, which computes a weight for each output channel of the residual block and multiplies each channel by its weight. These weights are obtained with two fully-connected (FC) layers. The first one reduces the number of channels from C to C/r, using a reduction ratio of r=16, and uses ReLU activation. The second FC layer restores the original dimensionality C and uses sigmoid activation so that each weight lies in the range [0,1].
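The squeeze, excitation and scale steps described above can be sketched in NumPy as a single forward pass. This is a minimal illustration, not the Keras code in `code/models/se_resnet50.py`: the function name `se_block`, the random weights and the (H, W, C) channels-last layout are assumptions. In an SE-ResNet the rescaled output is then added to the identity shortcut, which this sketch leaves out.

```python
import numpy as np

def se_block(features, w1, b1, w2, b2):
    """Squeeze-and-excitation recalibration of a (H, W, C) feature map.

    w1: (C, C // r) weights of the reduction FC layer (ReLU).
    w2: (C // r, C) weights of the expansion FC layer (sigmoid).
    """
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    z = features.mean(axis=(0, 1))
    # Excitation: FC (C -> C/r) with ReLU, then FC (C/r -> C) with sigmoid
    s = np.maximum(0.0, z @ w1 + b1)
    weights = 1.0 / (1.0 + np.exp(-(s @ w2 + b2)))  # one weight per channel
    # Scale: multiply every channel of the residual output by its weight
    return features * weights, weights

# Hypothetical example: a random 8x8 feature map with C=64 and r=16
rng = np.random.default_rng(0)
C, r = 64, 16
x = rng.standard_normal((8, 8, C))
w1 = rng.standard_normal((C, C // r)) * 0.1
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C // r, C)) * 0.1
b2 = np.zeros(C)
y, weights = se_block(x, w1, b1, w2, b2)
print(y.shape, weights.shape)  # (8, 8, 64) (64,)
```

With r=16 the bottleneck has only C/16 units, so the extra parameters per block stay small (2C²/r weights) compared with the convolutions in the residual branch.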
Datasets:
Contributions to the code:
- `code/models/se_resnet50.py`: Squeeze and Excitation ResNet implementation.
- `code/scripts/dataset_analysis.py`: Script for analysing the datasets.
- `code/config/*`: Configuration files for image classification.
Completeness of the goals:
- a) Run the provided code.
  - Analyze the dataset.
  - Calculate the accuracy on train and test sets.
  - Evaluate different techniques in the configuration file.
  - Transfer learning to another dataset (BTS).
  - Understand which parts of the code are doing what you specify in the configuration file.
- b) Train a network on another dataset.
- c) Implement a new network (c.2 - Develop the network entirely by yourself).
- e) Report showing the achieved results.
Results:
Neural Network | Dataset | Training accuracy | Test accuracy |
---|---|---|---|
VGG | TT100K | 0.9664 | 0.9546 |
VGG | BelgiumTSC | 0.9875 | 0.9607 |
VGG | KITTI | 0.7950 | 0.7805 |
Squeeze-Excitation ResNet | TT100K | 0.9987 | 0.9619 |
Squeeze-Excitation ResNet | BelgiumTSC | 0.9978 | 0.9655 |
VGG with Crop (224,224) | TT100K | 0.9513 | 0.9226 |
VGG pretrained on ImageNet | TT100K | 0.6610 | 0.7859 |
VGG pretrained on TT100K | BelgiumTSC | 0.9663 | 0.9497 |
Overview:
For object detection we used the YOLO network, and we implemented RetinaNet based on the paper Focal Loss for Dense Object Detection.
YOLO (You Only Look Once) divides the input image into an S x S grid. If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object. Each grid cell predicts B bounding boxes and a confidence score for each box, reflecting how confident the model is that the box contains an object and how accurate it thinks the predicted box is.
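The two ideas in this paragraph, assigning each object to the grid cell that contains its center and scoring a box by its overlap with the ground truth, can be sketched in plain Python. This is an illustrative sketch only: the helper names, the `(x_min, y_min, x_max, y_max)` pixel box format and S=13 (the grid YOLOv2 uses for a 416x416 input) are assumptions, and YOLOv2's anchor boxes are omitted.

```python
def responsible_cell(box, img_w, img_h, S=13):
    """Return (row, col) of the grid cell that contains the box center.

    box is (x_min, y_min, x_max, y_max) in pixels.
    """
    cx = (box[0] + box[2]) / 2.0
    cy = (box[1] + box[3]) / 2.0
    col = min(int(cx / img_w * S), S - 1)  # clamp centers on the right border
    row = min(int(cy / img_h * S), S - 1)  # clamp centers on the bottom border
    return row, col

def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

truth = (180, 180, 236, 236)  # hypothetical ground-truth box, center (208, 208)
pred = (190, 190, 246, 246)   # hypothetical box predicted by the responsible cell
print(responsible_cell(truth, 416, 416))  # (6, 6)
# Confidence target for this cell: Pr(Object) * IoU(pred, truth) = 1 * IoU
print(round(iou(pred, truth), 3))  # 0.509
```

Cells that contain no object center get a confidence target of zero, which is why most of the S x S x B boxes end up with low scores.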
On the other hand, RetinaNet is a one-stage detector that matches the accuracy of state-of-the-art two-stage detectors. Its authors identified the extreme foreground-background class imbalance encountered while training dense detectors as the main cause of the lower accuracy of one-stage detectors. They therefore propose a dynamically scaled cross-entropy loss, the focal loss, that down-weights well-classified examples and focuses training on hard, misclassified examples.
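The focal loss is FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class. The NumPy sketch below uses the paper's default values α=0.25 and γ=2 for the binary (per-anchor, per-class) case. It is not the code in `code/metrics/retina_metrics.py`, and the function names are hypothetical; the real loss is also normalized by the number of anchors assigned to ground-truth boxes, which is left out here.

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss per example.

    p: predicted probability of the positive class, y: labels in {0, 1}.
    """
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)              # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)  # class-balancing weight
    # (1 - p_t)^gamma shrinks the loss of well-classified examples
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

def cross_entropy(p, y, eps=1e-7):
    """Plain binary cross-entropy per example, for comparison."""
    p = np.clip(p, eps, 1.0 - eps)
    return -np.log(np.where(y == 1, p, 1.0 - p))

y = np.array([0, 0, 1])
p = np.array([0.05, 0.6, 0.9])  # easy negative, hard negative, easy positive
print(cross_entropy(p, y))
print(focal_loss(p, y))
```

Comparing the two printed arrays shows the effect: the easy background example (p_t = 0.95) is scaled down by a factor of about 0.75 · 0.05² relative to cross-entropy, while the hard negative keeps a much larger share of its loss, so the many easy background anchors no longer dominate the gradient.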
Datasets:
Contributions to the code:
- `code/models/retinanet.py`: RetinaNet implementation.
- `code/metrics/retina_metrics.py`: Focal Loss and metrics implementations for RetinaNet.
- `code/scripts/dataset_analysis.py`: Script for analysing the datasets.
- `code/config/*`: Configuration files for object detection.
Completeness of the goals:
- a) Run the provided code.
  - Analyze the dataset.
  - Compute and compare the detection F-score on the train, validation and test parts separately.
- b) Read two papers cited in the object detection networks section.
  - YOLO.
  - Another paper (RetinaNet).
- c) Implement a new network (c.2 - Develop the network entirely by yourself).
- d) Train the networks on another dataset (Udacity).
- f) Report showing the achieved results.
Results:
Neural Network | Dataset | Overall Precision | Overall Recall | Overall F1 | Average Recall | Average IoU | FPS |
---|---|---|---|---|---|---|---|
YOLO v2 | TT100K | 0.6308 | 0.3253 | 0.4292 | 0.9730 | 0.7588 | 70.11 |
YOLO v2 | Udacity | 0.1701 | 0.1606 | 0.1652 | 0.5660 | 0.5213 | 68.33 |
YOLO v2 | Udacity (data augmentation, 40 epochs) | 0.1779 | 0.1911 | 0.1843 | 0.5831 | 0.5212 | 67.66 |
RetinaNet | TT100K | 0.5930 | 0.0547 | 0.1205 | 0.9621 | 0.7405 | 69.22 |
RetinaNet | Udacity | 0.1216 | 0.0978 | 0.1147 | 0.1580 | 0.3424 | 66.54 |
Overview:
The goal of semantic segmentation is to label each pixel of the input image with the class it belongs to. For this task we used FCN8 and a Wide ResNet-based network (ResnetFCN).
Datasets:
Contributions to the code:
- `code/models/wide_resnet.py`: ResnetFCN implementation.
- `code/config/*`: Configuration files for semantic segmentation.
Completeness of the goals:
- a) Run the provided code.
- b) Read two papers.
  - Summary of “Fully Convolutional Networks for Semantic Segmentation”.
  - Another paper (ResnetFCN).
- c) Implement a new network (c.2 - Develop the network entirely by yourself).
- d) Train the networks on another dataset (KITTI).
- f) Report showing the achieved results.
Results:
Neural Network | Dataset | Accuracy | Loss | Jaccard Coefficient | FPS |
---|---|---|---|---|---|
FCN8 | CamVid | 0.9226 | 0.2136 | 0.6561 | 20.51 |
FCN8 | KITTI | 0.8298 | 0.7721 | 0.4284 | 17.85 |
ResnetFCN | CamVid | 0.9168 | 0.3357 | 0.6016 | 16.04 |
ResnetFCN | KITTI | 0.7936 | 1.0637 | 0.3642 | 14.85 |
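To make the Accuracy and Jaccard Coefficient columns concrete, the sketch below labels each pixel with its highest-scoring class and then computes pixel accuracy and the mean per-class Jaccard index (intersection over union). It assumes the reported Jaccard coefficient is the mean over classes of the per-class IoU. The project's own metric may differ, for example by accumulating counts over the whole dataset or ignoring a void class, and the function name is hypothetical.

```python
import numpy as np

def segmentation_metrics(scores, target, n_classes):
    """Pixel accuracy and mean Jaccard (IoU) over classes.

    scores: (H, W, n_classes) per-pixel class scores; target: (H, W) labels.
    """
    pred = scores.argmax(axis=-1)  # label each pixel with its top-scoring class
    accuracy = float((pred == target).mean())
    jaccards = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:  # skip classes absent from both prediction and target
            jaccards.append(inter / union)
    return accuracy, float(np.mean(jaccards))

# Hypothetical 2x2 image with two classes; one pixel of class 0 is mislabeled
target = np.array([[0, 0], [1, 1]])
scores = np.eye(2)[np.array([[0, 1], [1, 1]])]  # one-hot scores, shape (2, 2, 2)
acc, jac = segmentation_metrics(scores, target, 2)
print(acc, round(jac, 4))  # 0.75 0.5833
```

The example shows why the Jaccard coefficient is always lower than pixel accuracy in the table: a single error lowers the IoU of both the true class (1/2) and the predicted class (2/3), while it costs only one pixel out of four in accuracy.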