Scene Understanding for Autonomous Vehicles

Master in Computer Vision - M5 Visual recognition

Team "Avengers"

Project overview

In this project we focus on scene understanding for autonomous vehicles. Understanding the surroundings of the ego-vehicle is key for autonomous driving. The project consists of three stages: object recognition/classification, object detection and semantic segmentation.
Furthermore, we aim to learn the basic concepts, techniques, tricks and libraries to develop and evaluate deep neural networks.

Documentation

Paper Summaries

Weights of the models

The weights, experiments' info and the TensorBoard logs are available here.

Instructions to run the code

```bash
python train.py -c config/configFile.py -e expName
```
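Here -c selects the configuration file for the experiment (the ones we used are under code/config/) and -e sets the experiment name.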

Object Recognition (Week 2)

Overview: We have implemented the winning architecture of the ILSVRC 2017 classification competition, the Squeeze-and-Excitation ResNet. The squeeze-and-excitation block consists of a squeeze step, a Global Average Pooling over the output of the residual block, followed by an excitation step that computes a weight for each output channel of the residual block and multiplies the channels by those weights. The weights are obtained with two fully-connected layers: the first reduces the dimensionality over the number of channels C (we used a reduction ratio of r=16) and applies a ReLU activation; the second recovers the original dimensionality C and applies a sigmoid activation so that each weight lies in the range [0,1].
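A minimal Keras-style sketch of this block is shown below, assuming channels-last 4D feature maps; the layer arrangement and names are illustrative and may differ from the actual implementation in code/models/se_resnet50.py.

```python
from keras.layers import Dense, GlobalAveragePooling2D, Reshape, multiply

def se_block(residual_output, channels, reduction_ratio=16):
    # Squeeze: global average pooling collapses each feature map to one value.
    x = GlobalAveragePooling2D()(residual_output)
    # Excitation: FC bottleneck with ReLU (dimensionality reduction by r) ...
    x = Dense(channels // reduction_ratio, activation='relu')(x)
    # ... followed by an FC expansion with sigmoid, giving one weight in [0, 1]
    # per output channel of the residual block.
    x = Dense(channels, activation='sigmoid')(x)
    x = Reshape((1, 1, channels))(x)
    # Re-scale the residual block's output channel-wise with those weights.
    return multiply([residual_output, x])
```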

Datasets:

TT100K
BelgiumTSC
KITTI

Contributions to the code:

  • code/models/se_resnet50.py - Squeeze and Excitation ResNet implementation.
  • code/scripts/dataset_analysis.py - Script for analysing the datasets.
  • code/config/* - Configuration files for image classification.

Completeness of the goals:

  • a) Run the provided code.
    - Analyze the dataset.
    - Calculate the accuracy on train and test sets.
    - Evaluate different techniques in the configuration file.
    - Transfer learning to another dataset (BTS).
    - Understand which parts of the code are doing what you specify in the configuration file.
  • b) Train a network on another dataset.
  • c) Implement a new network (c.2 - Develop the network entirely by yourself).
  • e) Report showing the achieved results.

Results:

| Neural Network | Dataset | Train accuracy | Test accuracy |
|----------------|---------|----------------|---------------|
| VGG | TT100K | 0.9664 | 0.9546 |
| VGG | BelgiumTSC | 0.9875 | 0.9607 |
| VGG | KITTI | 0.7950 | 0.7805 |
| Squeeze-Excitation ResNet | TT100K | 0.9987 | 0.9619 |
| Squeeze-Excitation ResNet | BelgiumTSC | 0.9978 | 0.9655 |
| VGG with crop (224,224) | TT100K | 0.9513 | 0.9226 |
| VGG pretrained on ImageNet | TT100K | 0.6610 | 0.7859 |
| VGG pretrained on TT100K | BelgiumTSC | 0.9663 | 0.9497 |

Object Detection (Weeks 3-4)

Overview: For object detection we used the YOLO network, and we implemented RetinaNet following the paper Focal Loss for Dense Object Detection.
YOLO (You Only Look Once) divides the input image into an S x S grid. If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object. Each grid cell predicts B bounding boxes and a confidence score for each box, reflecting how confident the model is that the box contains an object and how accurate it believes the predicted box is.
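As a toy illustration of this grid assignment (not part of the framework code), the following sketch computes which cell of the S x S grid is responsible for an object given the pixel coordinates of its center; the image size and S value are arbitrary examples.

```python
def responsible_cell(x_center, y_center, img_w, img_h, S=7):
    """Return (row, col) of the grid cell containing the object's center."""
    col = min(int(x_center / img_w * S), S - 1)
    row = min(int(y_center / img_h * S), S - 1)
    return row, col

# Example: in a 416x416 image with S=7, an object centered at (300, 120)
# falls into grid cell (row=2, col=5), which then predicts its boxes.
print(responsible_cell(300, 120, 416, 416, S=7))  # -> (2, 5)
```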
RetinaNet, on the other hand, is a one-stage detector that matches the state-of-the-art results of two-stage detectors. Its authors identify the extreme class imbalance present in detection datasets as the main cause of the lower accuracy of one-stage detectors, so they propose a dynamically scaled cross-entropy loss (the focal loss) that down-weights correctly classified examples and focuses training on hard, misclassified ones.
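The idea of the loss is captured by the following NumPy sketch of the binary (per-anchor) focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t); the actual Keras implementation used for training lives in code/metrics/retina_metrics.py and may differ in details.

```python
import numpy as np

def focal_loss(y_true, y_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """y_true: 0/1 labels; y_pred: predicted probabilities in (0, 1)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # p_t is the probability assigned to the ground-truth class.
    p_t = np.where(y_true == 1, y_pred, 1.0 - y_pred)
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)
    # (1 - p_t)^gamma down-weights well-classified examples, so the loss
    # concentrates on hard, misclassified ones.
    return np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t))
```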

Datasets:

TT100K
Udacity

Contributions to the code:

  • code/models/retinanet.py - RetinaNet implementation.
  • code/metrics/retina_metrics.py - Focal Loss and metrics implementations for RetinaNet.
  • code/scripts/dataset_analysis.py - Script for analysing the datasets.
  • code/config/* - Configuration files for image detection.

Completeness of the goals:

  • a) Run the provided code.

    • Analyze the dataset.
    • Compute and compare the detection F-score on the train, validation and test parts separately.
  • b) Read two papers cited in the object detection networks section.

    • YOLO.
    • Another paper (RetinaNet).
  • c) Implement a new network (c.2 - Develop the network entirely by yourself).

  • d) Train the networks for another dataset (Udacity).

  • f) Report showing the achieved results.

Results:

| Neural Network | Dataset | Overall Precision | Overall Recall | Overall F1 | Average Recall | Average IoU | FPS |
|---|---|---|---|---|---|---|---|
| YOLO v2 | TT100K | 0.6308 | 0.3253 | 0.4292 | 0.9730 | 0.7588 | 70.11 |
| YOLO v2 | Udacity | 0.1701 | 0.1606 | 0.1652 | 0.5660 | 0.5213 | 68.33 |
| YOLO v2 | Udacity (data aug., 40 epochs) | 0.1779 | 0.1911 | 0.1843 | 0.5831 | 0.5212 | 67.66 |
| RetinaNet | TT100K | 0.5930 | 0.0547 | 0.1205 | 0.9621 | 0.7405 | 69.22 |
| RetinaNet | Udacity | 0.1216 | 0.0978 | 0.1147 | 0.1580 | 0.3424 | 66.54 |

Semantic Segmentation (Weeks 5-6)

Overview: The goal of semantic segmentation is to label each pixel of the input image with the class it belongs to. For this task we used the FCN8 network and our ResnetFCN (Wide ResNet) implementation.
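For reference, the Jaccard coefficient (intersection over union) reported in the results below can be computed from integer label maps as in the following sketch; this is an illustrative NumPy version, not the framework's metric code.

```python
import numpy as np

def mean_jaccard(pred, gt, num_classes):
    """pred, gt: integer label maps of identical shape."""
    ious = []
    for c in range(num_classes):
        intersection = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                 # skip classes absent from both maps
            ious.append(intersection / union)
    return float(np.mean(ious))
```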

Datasets:

CamVid
KITTI

Contributions to the code:

  • code/models/wide_resnet.py - ResnetFCN implementation.
  • code/config/* - Configuration files for semantic segmentation.

Completeness of the goals:

  • a) Run the provided code.
  • b) Read two papers.
    • Summary of “Fully Convolutional Networks for Semantic Segmentation”.
    • Another paper (ResnetFCN).
  • c) Implement a new network (c.2 - Develop the network entirely by yourself).
  • d) Train the networks for another dataset (KITTI).
  • f) Report showing the achieved results.

Results:

| Neural Network | Dataset | Accuracy | Loss | Jaccard Coefficient | FPS |
|---|---|---|---|---|---|
| FCN8 | CamVid | 0.9226 | 0.2136 | 0.6561 | 20.51 |
| FCN8 | KITTI | 0.8298 | 0.7721 | 0.4284 | 17.85 |
| ResnetFCN | CamVid | 0.9168 | 0.3357 | 0.6016 | 16.04 |
| ResnetFCN | KITTI | 0.7936 | 1.0637 | 0.3642 | 14.85 |