Scene Understanding for Autonomous Vehicles

Master in Computer Vision - M5 Visual recognition

Team "Avengers"

Project overview

In this project we focus on scene understanding for autonomous vehicles. Understanding the surroundings of the ego-vehicle is key for autonomous driving. The project consists of three stages: object recognition/classification, object detection and semantic segmentation.
Furthermore, we aim to learn the basic concepts, techniques, tricks and libraries to develop and evaluate deep neural networks.

Documentation

Paper Summaries

Weights of the models

The weights, experiments' info and the TensorBoard logs are available here.

Instructions to run the code

```bash
python train.py -c config/configFile.py -e expName
```
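Here -c selects the configuration file for the experiment (the ones we used are under code/config/) and -e sets the experiment name.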

Object Recognition (Week 2)

Overview: We have implemented the winning architecture of the ILSVRC 2017 classification competition, the Squeeze-and-Excitation ResNet. The squeeze-and-excitation block consists of a squeeze step, a Global Average Pooling over the output of the residual block, followed by an excitation step that computes a weight for each output channel of the residual block and multiplies the channels by those weights. The weights are obtained with two fully-connected layers: the first reduces the dimensionality over the number of channels C (we used a reduction ratio of r=16) and applies a ReLU activation; the second recovers the original dimensionality C and applies a sigmoid activation so that each weight lies in the range [0,1].
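A minimal Keras-style sketch of this block is shown below, assuming channels-last 4D feature maps; the layer arrangement and names are illustrative and may differ from the actual implementation in code/models/se_resnet50.py.

```python
from keras.layers import Dense, GlobalAveragePooling2D, Reshape, multiply

def se_block(residual_output, channels, reduction_ratio=16):
    # Squeeze: global average pooling collapses each feature map to one value.
    x = GlobalAveragePooling2D()(residual_output)
    # Excitation: FC bottleneck with ReLU (dimensionality reduction by r) ...
    x = Dense(channels // reduction_ratio, activation='relu')(x)
    # ... followed by an FC expansion with sigmoid, giving one weight in [0, 1]
    # per output channel of the residual block.
    x = Dense(channels, activation='sigmoid')(x)
    x = Reshape((1, 1, channels))(x)
    # Re-scale the residual block's output channel-wise with those weights.
    return multiply([residual_output, x])
```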

Datasets:

TT100K
BelgiumTSC
KITTI

Contributions to the code:

  • code/models/se_resnet50.py - Squeeze and Excitation ResNet implementation.
  • code/scripts/dataset_analysis.py - Script for analysing the datasets.
  • code/config/* - Configuration files for image classification.

Completeness of the goals:

  • a) Run the provided code.
    - Analyze the dataset.
    - Calculate the accuracy on train and test sets.
    - Evaluate different techniques in the configuration file.
    - Transfer learning to another dataset (BTS).
    - Understand which parts of the code are doing what you specify in the configuration file.
  • b) Train a network on another dataset.
  • c) Implement a new network (c.2 - Develop the network entirely by yourself).
  • e) Report showing the achieved results.

Results:

| Neural Network | Dataset | Train accuracy | Test accuracy |
|----------------|---------|----------------|---------------|
| VGG | TT100K | 0.9664 | 0.9546 |
| VGG | BelgiumTSC | 0.9875 | 0.9607 |
| VGG | KITTI | 0.7950 | 0.7805 |
| Squeeze-Excitation ResNet | TT100K | 0.9987 | 0.9619 |
| Squeeze-Excitation ResNet | BelgiumTSC | 0.9978 | 0.9655 |
| VGG with crop (224,224) | TT100K | 0.9513 | 0.9226 |
| VGG pretrained on ImageNet | TT100K | 0.6610 | 0.7859 |
| VGG pretrained on TT100K | BelgiumTSC | 0.9663 | 0.9497 |

Object Detection (Weeks 3-4)

Overview: For object detection we used the YOLO network, and we implemented RetinaNet following the paper Focal Loss for Dense Object Detection.
YOLO (You Only Look Once) divides the input image into an S x S grid. If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object. Each grid cell predicts B bounding boxes and a confidence score for each box, reflecting how confident the model is that the box contains an object and how accurate it believes the predicted box is.
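As a toy illustration of this grid assignment (not part of the framework code), the following sketch computes which cell of the S x S grid is responsible for an object given the pixel coordinates of its center; the image size and S value are arbitrary examples.

```python
def responsible_cell(x_center, y_center, img_w, img_h, S=7):
    """Return (row, col) of the grid cell containing the object's center."""
    col = min(int(x_center / img_w * S), S - 1)
    row = min(int(y_center / img_h * S), S - 1)
    return row, col

# Example: in a 416x416 image with S=7, an object centered at (300, 120)
# falls into grid cell (row=2, col=5), which then predicts its boxes.
print(responsible_cell(300, 120, 416, 416, S=7))  # -> (2, 5)
```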
RetinaNet, on the other hand, is a one-stage detector that matches the state-of-the-art results of two-stage detectors. Its authors identify the extreme class imbalance present in detection datasets as the main cause of the lower accuracy of one-stage detectors, so they propose a dynamically scaled cross-entropy loss (the focal loss) that down-weights correctly classified examples and focuses training on hard, misclassified ones.
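The idea of the loss is captured by the following NumPy sketch of the binary (per-anchor) focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t); the actual Keras implementation used for training lives in code/metrics/retina_metrics.py and may differ in details.

```python
import numpy as np

def focal_loss(y_true, y_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """y_true: 0/1 labels; y_pred: predicted probabilities in (0, 1)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # p_t is the probability assigned to the ground-truth class.
    p_t = np.where(y_true == 1, y_pred, 1.0 - y_pred)
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)
    # (1 - p_t)^gamma down-weights well-classified examples, so the loss
    # concentrates on hard, misclassified ones.
    return np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t))
```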

Datasets:

TT100K
Udacity

Contributions to the code:

  • code/models/retinanet.py - RetinaNet implementation.
  • code/metrics/retina_metrics.py - Focal Loss and metrics implementations for RetinaNet.
  • code/scripts/dataset_analysis.py - Script for analysing the datasets.
  • code/config/* - Configuration files for image detection.

Completeness of the goals:

  • a) Run the provided code.

    • Analyze the dataset.
    • Compute and compare the detection F-score on the train, validation and test parts separately.
  • b) Read two papers cited in the object detection networks section.

    • YOLO.
    • Another paper (RetinaNet).
  • c) Implement a new network (c.2 - Develop the network entirely by yourself).

  • d) Train the networks for another dataset (Udacity).

  • f) Report showing the achieved results.

Results:

| Neural Network | Dataset | Overall Precision | Overall Recall | Overall F1 | Average Recall | Average IoU | FPS |
|---|---|---|---|---|---|---|---|
| YOLO v2 | TT100K | 0.6308 | 0.3253 | 0.4292 | 0.9730 | 0.7588 | 70.11 |
| YOLO v2 | Udacity | 0.1701 | 0.1606 | 0.1652 | 0.5660 | 0.5213 | 68.33 |
| YOLO v2 | Udacity (data aug., 40 epochs) | 0.1779 | 0.1911 | 0.1843 | 0.5831 | 0.5212 | 67.66 |
| RetinaNet | TT100K | 0.5930 | 0.0547 | 0.1205 | 0.9621 | 0.7405 | 69.22 |
| RetinaNet | Udacity | 0.1216 | 0.0978 | 0.1147 | 0.1580 | 0.3424 | 66.54 |

Semantic Segmentation (Weeks 5-6)

Overview: The goal of semantic segmentation is to label each pixel of the input image with the class it belongs to. For this task we used the FCN8 network and our ResnetFCN (Wide ResNet) implementation.
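For reference, the Jaccard coefficient (intersection over union) reported in the results below can be computed from integer label maps as in the following sketch; this is an illustrative NumPy version, not the framework's metric code.

```python
import numpy as np

def mean_jaccard(pred, gt, num_classes):
    """pred, gt: integer label maps of identical shape."""
    ious = []
    for c in range(num_classes):
        intersection = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                 # skip classes absent from both maps
            ious.append(intersection / union)
    return float(np.mean(ious))
```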

Datasets:

CamVid
KITTI

Contributions to the code:

  • code/models/wide_resnet.py - ResnetFCN implementation.
  • code/config/* - Configuration files for semantic segmentation.

Completeness of the goals:

  • a) Run the provided code.
  • b) Read two papers.
    • Summary of “Fully Convolutional Networks for Semantic Segmentation”.
    • Another paper (ResnetFCN).
  • c) Implement a new network (c.2 - Develop the network entirely by yourself).
  • d) Train the networks for another dataset (KITTI).
  • f) Report showing the achieved results.

Results:

| Neural Network | Dataset | Accuracy | Loss | Jaccard Coefficient | FPS |
|---|---|---|---|---|---|
| FCN8 | CamVid | 0.9226 | 0.2136 | 0.6561 | 20.51 |
| FCN8 | KITTI | 0.8298 | 0.7721 | 0.4284 | 17.85 |
| ResnetFCN | CamVid | 0.9168 | 0.3357 | 0.6016 | 16.04 |
| ResnetFCN | KITTI | 0.7936 | 1.0637 | 0.3642 | 14.85 |