/pytorch-auto-drive

Segmentation models (ERFNet, ENet, DeepLab, FCN...) and Lane detection models (SCNN, SAD, PRNet, RESA, LSTR...) based on PyTorch with mixed precision training

Primary language: Python. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

Codebase for deep autonomous driving perception tasks

pytorch-auto-drive is a pure Python codebase that includes semantic segmentation models and lane detection models, based on PyTorch with mixed precision training. For example, you do not need MATLAB to test on CULane.

This repository is under active development, but results with the uploaded models are stable. Legacy code users should check deprecations for changes.

A demo video from ERFNet:

demo.mp4

Highlights

Various methods tested on a wide range of backbones; modular and easily understood code; image/keypoint loading, transformations and visualizations; mixed precision training and TensorBoard logging.

Models from this repo are faster to train (trainable on a single card) and often perform better than other implementations. See the wiki for the reasons and for the technical specifications of the models.
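
As background for the mixed precision training mentioned above, here is a minimal sketch of why such training usually relies on loss scaling. It is not code from this repository: the helper `to_fp16`, the gradient value and the scale factor are assumptions chosen for illustration, and only the Python standard library is used. In practice, PyTorch's automatic mixed precision utilities handle this step.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Illustrative gradient magnitude, below fp16's smallest subnormal (about 6e-8).
tiny_grad = 1e-8
# Assumed scale factor (2**16); real training loops often adjust it dynamically.
loss_scale = 65536.0

# Stored directly in fp16, the gradient underflows to zero.
unscaled_fp16 = to_fp16(tiny_grad)

# Multiplying the loss by a constant multiplies every gradient by the same
# constant (chain rule), which moves the value into fp16's representable range.
scaled_fp16 = to_fp16(tiny_grad * loss_scale)

# The gradient is unscaled in higher precision before the optimizer step.
recovered = scaled_fp16 / loss_scale

print(unscaled_fp16)  # 0.0: the gradient is lost without scaling
print(recovered)      # close to 1e-8: the gradient survives with scaling
```

The recovered value carries only fp16's rounding error, which is why scaled half-precision gradients can still drive full-precision weight updates.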

Supported datasets:

| Task | Dataset |
| --- | --- |
| semantic segmentation | PASCAL VOC 2012 |
| semantic segmentation | Cityscapes |
| semantic segmentation | GTAV* |
| semantic segmentation | SYNTHIA* |
| lane detection | CULane |
| lane detection | TuSimple |
| lane detection | LLAMAS |
| lane detection | BDD100K (In progress) |

* The UDA baseline setup, with Cityscapes val set as validation.

Supported models:

| Task | Backbone | Model/Method |
| --- | --- | --- |
| semantic segmentation | ResNet-101 | FCN |
| semantic segmentation | ResNet-101 | DeeplabV2 |
| semantic segmentation | ResNet-101 | DeeplabV3 |
| semantic segmentation | - | ENet |
| semantic segmentation | - | ERFNet |
| lane detection | ENet, ERFNet, VGG16, ResNets (18, 34, 50, 101) | Baseline |
| lane detection | ERFNet, VGG16, ResNets (18, 34, 50, 101) | SCNN |
| lane detection | ResNets (18, 34, 50, 101) | RESA |
| lane detection | ERFNet, ENet | SAD (Postponed) |
| lane detection | ERFNet | PRNet (In progress) |
| lane detection | ResNets (18, 34, 50, 101), ResNet18-reduced | LSTR |

The VGG16 backbone corresponds to DeepLab-LargeFOV in SCNN.

The ResNet backbone corresponds to DeepLabV2 (without ASPP) with output channels reduced to 128, as in RESA.

We keep calling it VGG16/ResNet for consistency with common practices.

Model Zoo

We provide solid results (average/best/detailed), training time, shell scripts and trained models available for download in MODEL_ZOO.md.

Installation

Please prepare the environment and code with INSTALL.md. Then follow the instructions in DATASET.md to set up datasets.

Getting Started

Get started with LANEDETECTION.md for lane detection.

Get started with SEGMENTATION.md for semantic segmentation.

Visualization Tools

Refer to VISUALIZATION.md for a visualization & inference tutorial, for image and video inputs.

Benchmark Tools

Refer to BENCHMARK.md for a benchmarking tutorial, including FPS test, FLOPs & memory count for each supported model.
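
To illustrate what an FPS test measures, here is a minimal timing sketch. It is not the tool described in BENCHMARK.md: `measure_fps` and `dummy_inference` are hypothetical names, and the workload is a stand-in for a model's forward pass.

```python
import time
from typing import Callable


def measure_fps(run_once: Callable[[], object], warmup: int = 10, iters: int = 100) -> float:
    """Return the average number of `run_once` calls per second after a warm-up."""
    # Warm-up runs are excluded so one-time costs (caches, lazy init) do not skew the result.
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    elapsed = time.perf_counter() - start
    return iters / elapsed


def dummy_inference() -> int:
    # Stand-in workload (hypothetical); replace with a real forward pass.
    return sum(i * i for i in range(10_000))


fps = measure_fps(dummy_inference)
print(f"{fps:.1f} FPS")
```

When timing a model on a GPU, kernels run asynchronously, so the device should be synchronized (for example with `torch.cuda.synchronize()`) before each clock read. Otherwise the measured time reflects only kernel launches.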

Contributing

We welcome Pull Requests that fix bugs, update docs or implement new features. We also welcome Issues that report problems and needs, or ask questions (your question might be more common and helpful to the community than you presume). If you are interested, check out our roadmap.

This repository implements (or plans to implement) the following interesting papers in a unified PyTorch codebase:

Fully Convolutional Networks for Semantic Segmentation CVPR 2015

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs TPAMI 2017

Rethinking Atrous Convolution for Semantic Image Segmentation ArXiv preprint 2017

ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation ArXiv preprint 2016

ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation ITS 2017

Spatial As Deep: Spatial CNN for Traffic Scene Understanding AAAI 2018

RESA: Recurrent Feature-Shift Aggregator for Lane Detection AAAI 2021

Learning Lightweight Lane Detection CNNs by Self Attention Distillation ICCV 2019

Polynomial Regression Network for Variable-Number Lane Detection ECCV 2020

End-to-end Lane Shape Prediction with Transformers WACV 2021

You are also welcome to add to this paper list, or to open-source your related works here.

Notes:

  1. The Cityscapes dataset is down-sampled by 2 when training at 256 x 512. To specify different sizes, modify them in configs.yaml; similar changes can be made for other experiments.

  2. Training times are measured on a single RTX 2080Ti, including online validation time for segmentation and test time for lane detection.

  3. All segmentation results reported are from a single model without CRF and without multi-scale testing.