ERFNet-Caffe-version

A Caffe implementation of ERFNet (Efficient Residual Factorized ConvNet) for real-time semantic segmentation.

Test image from Cityscapes dataset

Semantic Segmentation of ERFNet

Publications

The deep neural network architecture is based on the following publication:
"ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation", E. Romera, J. M. Alvarez, L. M. Bergasa and R. Arroyo, Transactions on Intelligent Transportation Systems (T-ITS), December 2017.

Several modifications were made:

Instead of training the encoder and decoder stages separately, as in the paper, the whole architecture is trained directly, with an auxiliary loss after the encoder to supervise the encoder during training.
In the decoder, the deconvolution kernel size is 2x2 rather than the 3x3 used in the paper, because a 3x3 kernel produces feature maps of odd size. Two Non-bt-1D blocks are added to the decoder to compensate for the smaller kernel.
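
The odd-size issue can be checked with the standard output-size formula for a transposed convolution, out = stride * (in - 1) + kernel - 2 * pad. The sketch below is only an illustration: the stride of 2, the padding values, and the input width of 64 are assumptions, not values read from the repository's prototxt files.

```python
def deconv_output_size(in_size, kernel, stride=2, pad=0):
    """Spatial output size of a transposed convolution (dilation 1)."""
    return stride * (in_size - 1) + kernel - 2 * pad

# Assumed example: a 64-pixel-wide feature map upsampled with stride 2.
sizes = {}
for kernel, pad in [(3, 0), (3, 1), (2, 0)]:
    sizes[(kernel, pad)] = deconv_output_size(64, kernel, pad=pad)
    print(f"kernel={kernel} pad={pad}: 64 -> {sizes[(kernel, pad)]}")
```

Under these assumptions, only the 2x2 kernel doubles the input size exactly; both 3x3 variants give an odd width.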

Visualize the prediction with python

First, set caffe_root in ERFNet-Caffe/scripts/test_segmentation.py to the absolute path of your Caffe installation. The original BVLC/caffe is sufficient for prediction.
Then visualize the prediction of ERFNet by running:

$ python test_segmentation.py --model ERFNet-Caffe/prototxts/erfnet_deploy_mergebn.prototxt \
    --weights ERFNet-Caffe/weights/erfnet_cityscapes_mergebn.caffemodel \
    --colours ERFNet-Caffe/scripts/cityscapes19.png \
    --input_image ERFNet-Caffe/example_image/munich_000000_000019_leftImg8bit.png \
    --out_dir ERFNet-Caffe/example_image/
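
To illustrate what the colour file is for, here is a minimal sketch of turning a map of predicted class indices into a colour image with a lookup table. It does not reproduce test_segmentation.py: the function name `colourize` is hypothetical, and the palette is assumed to be an (N, 3) array rather than whatever layout cityscapes19.png actually uses. The three colours are the usual Cityscapes RGB values for road, sidewalk, and building.

```python
import numpy as np

def colourize(label_map, palette):
    """Map an (H, W) array of class indices to an (H, W, 3) colour image."""
    palette = np.asarray(palette, dtype=np.uint8)
    # Integer-array indexing looks up one palette row per pixel.
    return palette[label_map]

# Cityscapes colours (RGB) for train IDs 0, 1, 2: road, sidewalk, building.
palette = [(128, 64, 128), (244, 35, 232), (70, 70, 70)]
labels = np.array([[0, 1], [2, 0]])  # toy 2x2 prediction
colour_image = colourize(labels, palette)
print(colour_image.shape)
```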

Training ERFNet

Compile ERFNet-Caffe/caffe-erfnet for training. caffe-erfnet combines the interp layer from PSPNet and the DenseImageData layer from caffe-enet, which are used for the auxiliary loss and the data interface, respectively.
Execute ERFNet-Caffe/scripts/createTrainIdLabelImgs.py to create the trainId label images for training. (The script is from Marius Cordts' cityscapesScripts.)
Set the net path and the snapshot_prefix path in ERFNet-Caffe/prototxts/erfnet_solver.prototxt.
Set the source path in ERFNet-Caffe/prototxts/erfnet_train_val.prototxt.
Set the paths to your Cityscapes data (images and labels) in ERFNet-Caffe/dataset/train_fine_cityscapes.txt and ERFNet-Caffe/dataset/eval_fine_cityscapes.txt.
Start the training from scratch:

$ ERFNet-Caffe/caffe-erfnet/build/tools/caffe train -solver ERFNet-Caffe/prototxts/erfnet_solver.prototxt

or start the training with the pretrained model:

$ ERFNet-Caffe/caffe-erfnet/build/tools/caffe train -solver ERFNet-Caffe/prototxts/erfnet_solver.prototxt -weights ERFNet-Caffe/weights/erfnet_cityscapes.caffemodel
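
Editing the dataset list files by hand is tedious when they contain thousands of entries, so the path change can be scripted. The sketch below assumes that each line of the list file holds an image path and a label path separated by whitespace, and that all paths share one old root directory. The function name `rewrite_prefix` and both directory names are hypothetical.

```python
def rewrite_prefix(lines, old_prefix, new_prefix):
    """Replace old_prefix at the start of every path on every line."""
    rewritten = []
    for line in lines:
        paths = [
            new_prefix + path[len(old_prefix):] if path.startswith(old_prefix) else path
            for path in line.split()
        ]
        rewritten.append(" ".join(paths))
    return rewritten

# Hypothetical list-file line and directories, for illustration only.
example = ["/old/root/leftImg8bit/train/a.png /old/root/gtFine/train/a_labelTrainIds.png"]
updated = rewrite_prefix(example, "/old/root", "/data/cityscapes")
print(updated[0])
```

To apply it to a real file, read the lines with `open(path).read().splitlines()` and write the result back joined by newlines.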

Accelerate prediction (optional)

Merge the BatchNorm and Scale layers into the Convolution layers and remove the dropout layers for the test phase to accelerate prediction:

$ python merge_bn_scale_droupout.py --model ERFNet-Caffe/prototxts/erfnet_deploy.prototxt \
    --weights ERFNet-Caffe/weights/erfnet_cityscapes.caffemodel \
    --output_model ERFNet-Caffe/prototxts/erfnet_deploy_mergebn.prototxt \
    --output_weights ERFNet-Caffe/weights/erfnet_cityscapes_mergebn.caffemodel
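
The arithmetic behind the merge is small enough to sketch. This is not the code of the repository's script; it is a generic illustration in which `fold_bn_scale` is a hypothetical name and eps = 1e-5 is an assumed value. It folds y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta into the convolution's weights and bias, then checks the result on a 1x1 convolution, where the convolution reduces to a matrix product over channels.

```python
import numpy as np

def fold_bn_scale(weight, bias, mean, var, gamma, beta, eps=1e-5):
    """Fold BatchNorm (mean, var) and Scale (gamma, beta) into a convolution.

    weight has shape (out_channels, in_channels, kh, kw); all other
    arguments are vectors with one entry per output channel.
    """
    factor = gamma / np.sqrt(var + eps)
    folded_weight = weight * factor[:, None, None, None]
    folded_bias = (bias - mean) * factor + beta
    return folded_weight, folded_bias

# Random parameters for a 1x1 convolution with 3 input and 4 output channels.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3, 1, 1))
b = rng.normal(size=4)
mean, var = rng.normal(size=4), rng.uniform(0.5, 2.0, size=4)
gamma, beta = rng.normal(size=4), rng.normal(size=4)
x = rng.normal(size=3)  # a single pixel

# Reference: convolution followed by BatchNorm and Scale.
conv_out = w[:, :, 0, 0] @ x + b
reference = gamma * (conv_out - mean) / np.sqrt(var + 1e-5) + beta

# Merged: one convolution with folded parameters.
fw, fb = fold_bn_scale(w, b, mean, var, gamma, beta)
merged = fw[:, :, 0, 0] @ x + fb
print(np.allclose(reference, merged))
```

Note that a Caffe BatchNorm layer stores its mean and variance together with a moving-average scale factor as a third blob, so the stored values have to be divided by that factor before a formula like this applies.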

Create video (optional)

Run ERFNet-Caffe/scripts/rename_images.py to rename a sequence of images in a folder to the format 0000.png, 0001.png, 0002.png, etc.
Execute ERFNet-Caffe/scripts/webcam_demo.py to write the predictions to a video:

$ python webcam_demo.py --model ERFNet-Caffe/prototxts/erfnet_deploy_mergebn.prototxt \
    --weights ERFNet-Caffe/weights/erfnet_cityscapes_mergebn.caffemodel \
    --colours ERFNet-Caffe/scripts/cityscapes19.png
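
For reference, the renaming step can be sketched as follows. This is not the content of rename_images.py; `rename_sequentially` and the frame names are hypothetical, and the sketch assumes the frames should be numbered in sorted filename order.

```python
import os
import tempfile

def rename_sequentially(directory, extension=".png"):
    """Rename matching files to 0000.png, 0001.png, ... in sorted order."""
    names = sorted(n for n in os.listdir(directory) if n.endswith(extension))
    # First pass: move to temporary names so no target overwrites a pending source.
    temp_names = []
    for index, name in enumerate(names):
        temp = f"__tmp_{index:04d}{extension}"
        os.rename(os.path.join(directory, name), os.path.join(directory, temp))
        temp_names.append(temp)
    # Second pass: move to the final zero-padded names.
    final_names = []
    for index, temp in enumerate(temp_names):
        final = f"{index:04d}{extension}"
        os.rename(os.path.join(directory, temp), os.path.join(directory, final))
        final_names.append(final)
    return final_names

# Demonstration in a throwaway directory with hypothetical frame names.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["frame_b.png", "frame_a.png", "frame_c.png"]:
        open(os.path.join(demo_dir, name), "w").close()
    rename_sequentially(demo_dir)
    result = sorted(os.listdir(demo_dir))
    print(result)
```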

Evaluate mIoU

First, execute ERFNet-Caffe/scripts/test_segmentation_iter.py to save the predicted trainId labels.
Second, execute ERFNet-Caffe/scripts/evalPixelLevelSemanticLabeling_trainId.py to evaluate the per-class IoU, the mIoU, and the per-category IoU. (The script is based on Marius Cordts' cityscapesScripts.)
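
As a reminder of what these metrics measure, here is a minimal sketch of per-class IoU and mIoU computed from a confusion matrix. It is not the evaluation script itself: the function names are hypothetical, the toy arrays are made up, and 255 is assumed to be the ignore label, as is usual for Cityscapes trainId images.

```python
import numpy as np

def confusion_matrix(gt, pred, num_classes, ignore_label=255):
    """Count pixels per (ground truth, prediction) pair, skipping ignored pixels."""
    valid = gt != ignore_label
    index = gt[valid].astype(np.int64) * num_classes + pred[valid]
    counts = np.bincount(index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def per_class_iou(conf):
    """IoU = TP / (TP + FP + FN) per class; NaN for classes that never occur."""
    tp = np.diag(conf).astype(np.float64)
    fn = conf.sum(axis=1) - tp  # ground-truth pixels of the class that were missed
    fp = conf.sum(axis=0) - tp  # pixels wrongly predicted as the class
    denom = tp + fp + fn
    return np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)

# Toy 2x3 ground truth and prediction with 3 classes and one ignored pixel.
gt = np.array([[0, 0, 1], [1, 2, 255]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
iou = per_class_iou(confusion_matrix(gt, pred, num_classes=3))
miou = np.nanmean(iou)
print(iou, miou)
```

The mIoU is the mean of the per-class values; in this sketch, classes absent from both the ground truth and the prediction are excluded from the mean.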