pytorch-auto-drive is a pure Python codebase that includes semantic segmentation and lane detection models based on PyTorch, with support for mixed precision training. For example, you do not need MATLAB to test on CULane.
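CULane testing reports F1. Under the standard protocol, a predicted lane counts as a true positive when its IoU with a ground-truth lane (both drawn as 30-pixel-wide lines) exceeds 0.5. The minimal sketch below covers only the final aggregation step, from true positive / false positive / false negative counts to F1. The helper name `lane_f1` is hypothetical and this is not the repository's evaluation code; the IoU matching that produces the counts is assumed to have happened already.

```python
def lane_f1(tp: int, fp: int, fn: int) -> float:
    """F1 score from matched-lane counts (hypothetical helper).

    Returns 0.0 when precision and recall are both undefined or zero.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0  # fraction of predictions that matched a GT lane
    recall = tp / (tp + fn) if tp + fn else 0.0     # fraction of GT lanes that were matched
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # 8 matched lanes, 2 spurious predictions, 2 missed lanes
    print(round(lane_f1(8, 2, 2), 4))  # 0.8
```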
This repository is under active development, but results from the uploaded models are stable. Legacy code users should check the deprecations for changes.
A demo video from ERFNet:
demo.mp4
The codebase offers various methods tested on a wide range of backbones; modular, easy-to-understand code; image/keypoint loading, transformations and visualizations; mixed precision training; and TensorBoard logging.
Models from this repo are faster to train (trainable on a single GPU) and often perform better than other implementations; see the wiki for the reasons and for technical specifications of the models.
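Keypoint-aware transformations must move lane keypoints consistently with the image. The minimal sketch below shows the resize case, where x coordinates scale with the width ratio and y coordinates with the height ratio. It assumes keypoints are stored as `(x, y)` pixel coordinates and sizes as `(height, width)`; the function `resize_keypoints` is hypothetical and does not reflect the repository's transform API.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in pixels


def resize_keypoints(points: List[Point],
                     src_hw: Tuple[int, int],
                     dst_hw: Tuple[int, int]) -> List[Point]:
    """Rescale keypoints when an image is resized from src_hw to dst_hw (height, width)."""
    src_h, src_w = src_hw
    dst_h, dst_w = dst_hw
    sx = dst_w / src_w  # horizontal scale factor applies to x
    sy = dst_h / src_h  # vertical scale factor applies to y
    return [(x * sx, y * sy) for x, y in points]


if __name__ == "__main__":
    # Halving a 590 x 1640 (H x W) image halves both coordinates
    print(resize_keypoints([(820.0, 295.0)], (590, 1640), (295, 820)))  # [(410.0, 147.5)]
```

The same principle applies to other geometric augmentations: any transform applied to the image pixels needs a matching transform on the keypoint coordinates.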
Task | Dataset |
---|---|
semantic segmentation | PASCAL VOC 2012 |
semantic segmentation | Cityscapes |
semantic segmentation | GTAV* |
semantic segmentation | SYNTHIA* |
lane detection | CULane |
lane detection | TuSimple |
lane detection | LLAMAS |
lane detection | BDD100K (In progress) |
* The unsupervised domain adaptation (UDA) baseline setup, with the Cityscapes val set used for validation.
Task | Backbone | Model/Method |
---|---|---|
semantic segmentation | ResNet-101 | FCN |
semantic segmentation | ResNet-101 | DeepLabV2 |
semantic segmentation | ResNet-101 | DeepLabV3 |
semantic segmentation | - | ENet |
semantic segmentation | - | ERFNet |
lane detection | ENet, ERFNet, VGG16, ResNets (18, 34, 50, 101) | Baseline |
lane detection | ERFNet, VGG16, ResNets (18, 34, 50, 101) | SCNN |
lane detection | ResNets (18, 34, 50, 101) | RESA |
lane detection | ERFNet, ENet | SAD (Postponed) |
lane detection | ERFNet | PRNet (In progress) |
lane detection | ResNets (18, 34, 50, 101), ResNet18-reduced | LSTR |
The VGG16 backbone corresponds to DeepLab-LargeFOV in SCNN.
The ResNet backbone corresponds to DeepLabV2 (without ASPP) with output channels reduced to 128, as in RESA.
We keep calling them VGG16/ResNet for consistency with common practice.
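The parameter and multiply-accumulate (MAC) counts of a standard convolution grow with the product of input and output channels, so a narrow 128-channel output plausibly keeps any layers that follow it cheap. The sketch below uses a hypothetical helper `conv2d_cost` and hypothetical layer sizes (a 3x3 convolution producing a 36 x 100 feature map) to compare 128 and 512 channels. It is an illustration of the arithmetic, not the repository's FLOPs counter.

```python
from typing import Tuple


def conv2d_cost(c_in: int, c_out: int, k: int,
                h_out: int, w_out: int, bias: bool = True) -> Tuple[int, int]:
    """Parameters and MACs of a dense (ungrouped) k x k convolution.

    Stride and padding only matter through the output size (h_out, w_out).
    """
    params = c_out * (c_in * k * k + (1 if bias else 0))
    macs = c_out * c_in * k * k * h_out * w_out  # one MAC per weight per output pixel
    return params, macs


if __name__ == "__main__":
    for channels in (128, 512):
        p, m = conv2d_cost(channels, channels, 3, 36, 100)
        print(f"{channels} -> {channels}: {p:,} params, {m / 1e9:.2f} GMACs")
```

Quadrupling both input and output channels multiplies the MACs by 16. Note that FLOPs are often reported as roughly 2x MACs, and tools differ in which convention they use.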
We provide solid results (average/best/detailed), training times, shell scripts and downloadable trained models in MODEL_ZOO.md.
Please prepare the environment and code following INSTALL.md, then follow the instructions in DATASET.md to set up the datasets.
Get started with LANEDETECTION.md for lane detection.
Get started with SEGMENTATION.md for semantic segmentation.
Refer to VISUALIZATION.md for a visualization and inference tutorial covering image and video inputs.
Refer to BENCHMARK.md for a benchmarking tutorial, including FPS tests and FLOPs and memory counts for each supported model.
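An FPS test typically runs a few untimed warm-up iterations and then averages over many timed calls. The minimal sketch below shows that pattern with a generic callable; the function `measure_fps` and its parameters are hypothetical and are not the interface of the repository's benchmarking tool.

```python
import time
from typing import Callable


def measure_fps(fn: Callable[[], object], warmup: int = 10, iters: int = 100) -> float:
    """Average calls per second of fn, measured after warm-up iterations."""
    if iters <= 0:
        raise ValueError("iters must be positive")
    for _ in range(warmup):
        fn()  # untimed: excludes one-time costs such as lazy initialization
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    elapsed = time.perf_counter() - start
    return iters / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    fps = measure_fps(lambda: sum(range(10_000)))
    print(f"{fps:.1f} calls/s")
```

For a model on a GPU, CUDA kernels launch asynchronously, so the timed function should synchronize the device (for example with `torch.cuda.synchronize()`) before the clock is read; otherwise the measurement reflects launch overhead rather than compute time.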
We welcome Pull Requests that fix bugs, update docs, implement new features, etc. We also welcome Issues to report problems and needs, or to ask questions (your question might be more common and helpful to the community than you presume). Interested folks should check out our roadmap.
This repository implements (or plans to implement) the following interesting papers in a unified PyTorch codebase:
Fully Convolutional Networks for Semantic Segmentation CVPR 2015
Rethinking Atrous Convolution for Semantic Image Segmentation arXiv preprint 2017
ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation arXiv preprint 2016
ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation ITS 2017
Spatial As Deep: Spatial CNN for Traffic Scene Understanding AAAI 2018
RESA: Recurrent Feature-Shift Aggregator for Lane Detection AAAI 2021
Learning Lightweight Lane Detection CNNs by Self Attention Distillation ICCV 2019
Polynomial Regression Network for Variable-Number Lane Detection ECCV 2020
End-to-end Lane Shape Prediction with Transformers WACV 2021
You are also welcome to add to this paper list, or to open-source your related work here.
- The Cityscapes dataset is downsampled by 2 when training at 256 x 512. To use different sizes, modify them in configs.yaml; similar changes can be made for other experiments.
- Training times are measured on a single RTX 2080Ti, including online validation time for segmentation and test time for lane detection.
- All reported segmentation results are from a single model, without CRF and without multi-scale testing.
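Segmentation results on PASCAL VOC and Cityscapes are conventionally reported as mean IoU (mIoU), computed from a confusion matrix accumulated over the validation set. The minimal sketch below assumes `conf[gt][pred]` holds pixel counts, that ignored pixels (e.g. an ignore label) were excluded before accumulation, and that classes absent from both ground truth and predictions are skipped, which is one common convention among several. The function `mean_iou` is hypothetical, not the repository's metric code.

```python
from typing import List, Sequence


def mean_iou(conf: Sequence[Sequence[int]]) -> float:
    """mIoU from a square confusion matrix with rows = ground truth, columns = prediction."""
    n = len(conf)
    ious: List[float] = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                        # GT is c, predicted as another class
        fp = sum(conf[r][c] for r in range(n)) - tp   # predicted c, GT is another class
        denom = tp + fp + fn
        if denom > 0:  # skip classes that never appear in GT or predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Two classes: IoU_0 = 3/5, IoU_1 = 5/7
    print(round(mean_iou([[3, 1], [1, 5]]), 4))  # 0.6571
```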