This repository contains the source code of our paper, ESPNet.
This repository is organized as follows:
- train: contains the source code for training the ESPNet-C and ESPNet models.
- test: contains the source code for evaluating our model on RGB images.
- pretrained: contains the models pre-trained on the Cityscapes dataset.
Our model, ESPNet, achieves a class-wise mIOU of 60.336 and a category-wise mIOU of 82.178 on the Cityscapes test set, and runs at:
- 112 FPS on an NVIDIA TitanX (30 FPS faster than ENet)
- 9 FPS on an NVIDIA Jetson TX2
- With the same number of parameters as ENet, our model is 2% more accurate
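The throughput figures above are wall-clock frames per second. As a framework-agnostic illustration of how such numbers can be measured, here is a minimal timing sketch; the `dummy_infer` callable is a hypothetical stand-in for the real forward pass of the network:

```python
import time

def measure_fps(infer, iters=50, warmup=5):
    """Time repeated calls to infer() and return frames per second."""
    for _ in range(warmup):  # warm-up runs are excluded from timing
        infer()
    start = time.perf_counter()
    for _ in range(iters):
        infer()
    return iters / (time.perf_counter() - start)

# hypothetical stand-in for a model's forward pass on one frame
dummy_infer = lambda: sum(i * i for i in range(10_000))
print(f"{measure_fps(dummy_infer):.1f} FPS")
```

Warm-up iterations matter in practice, since the first few forward passes on a GPU include one-time allocation and kernel-compilation costs.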
Our model achieves an mIOU of 55.64 on the CamVid test set. We used the train/val/test splits provided here and trained the models at a resolution of 480x360. For comparisons with other models, see the SegNet paper.
Note: We did not use the 3.5K dataset for training, which was used in the SegNet paper.
| Model | mIOU | Class avg. |
|---|---|---|
| ENet | 51.3 | 68.3 |
| SegNet | 55.6 | 65.2 |
| ESPNet | 55.64 | 68.30 |
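The mIOU values in the table are mean intersection-over-union averaged across classes. As a reference for how this metric is computed, here is a minimal NumPy sketch (an illustration, not the evaluation code used for the benchmarks above):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Class-wise mean IoU over integer label maps of equal shape."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # ignore classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

# toy 2x2 label maps with two classes
pred = np.array([[0, 1], [1, 1]])
gt   = np.array([[0, 1], [0, 1]])
print(round(mean_iou(pred, gt, num_classes=2), 4))  # → 0.5833
```

Official benchmark evaluations typically accumulate a confusion matrix over the whole test set before computing per-class IoU, rather than averaging per-image scores.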
To run this code, you need to have the following libraries installed:
We recommend using Anaconda. We have tested our code on Ubuntu 16.04.
If ESPNet is useful for your research, please cite our paper:
@article{mehta2018espnet,
title={ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation},
  author={Mehta, Sachin and Rastegari, Mohammad and Caspi, Anat and Shapiro, Linda and Hajishirzi, Hannaneh},
journal={arXiv preprint arXiv:1803.06815},
year={2018}
}