Implementation of the Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation in Caffe
The deep neural network architecture is based on the following publication:
"ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation", E. Romera, J. M. Alvarez, L. M. Bergasa and R. Arroyo, Transactions on Intelligent Transportation Systems (T-ITS), December 2017.
Several modifications were made:
- Instead of training the encoder and decoder stages separately as in the paper, the whole architecture is trained directly, with an auxiliary loss after the encoder part to control the encoder loss during the training phase;
- In the decoder part, the kernel size of the deconvolution is 2x2 rather than the 3x3 used in the paper, since a 3x3 kernel would lead to feature maps of odd size. Two Non-bt-1D blocks are added in the decoder part to compensate for the smaller kernel size.
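The odd-size argument can be checked with the output-size formula of a transposed convolution without dilation, out = stride * (in - 1) + kernel - 2 * pad. The following is a minimal sketch of that arithmetic only; the function name and the 64-pixel input are illustrative assumptions and are not taken from the prototxt files.

```python
def deconv_output_size(in_size, kernel, stride=2, pad=0):
    """Spatial output size of a transposed convolution (no dilation)."""
    return stride * (in_size - 1) + kernel - 2 * pad

# Hypothetical 64-pixel-wide feature map, upsampled with stride 2.
print(deconv_output_size(64, kernel=2))         # 128: exact doubling
print(deconv_output_size(64, kernel=3))         # 129: odd
print(deconv_output_size(64, kernel=3, pad=1))  # 127: odd
```

With stride 2, only the 2x2 kernel doubles the input size exactly, which is what the decoder needs to return to the input resolution.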
First, set caffe_root in ERFNet-Caffe/scripts/test_segmentation.py to the absolute path of your Caffe installation; the original BVLC/caffe is sufficient for prediction.
After that, you can visualize the prediction of ERFNet by running:
$ python test_segmentation.py --model ERFNet-Caffe/prototxts/erfnet_deploy_mergebn.prototxt \
--weights ERFNet-Caffe/weights/erfnet_cityscapes_mergebn.caffemodel \
--colours ERFNet-Caffe/scripts/cityscapes19.png \
--input_image ERFNet-Caffe/example_image/munich_000000_000019_leftImg8bit.png \
--out_dir ERFNet-Caffe/example_image/
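The --colours image presumably serves as a lookup table from class index to display colour. As a rough, dependency-free sketch of that idea (the three-entry palette and the function name are assumptions for illustration; the actual script reads its palette from cityscapes19.png and most likely works on image arrays):

```python
def colourize(prediction, palette):
    """Map a 2-D grid of class indices to a grid of colour triples."""
    return [[palette[class_id] for class_id in row] for row in prediction]

# Hypothetical 3-class palette; the Cityscapes palette has 19 entries.
palette = [(128, 64, 128), (244, 35, 232), (70, 70, 70)]
prediction = [[0, 0, 1],
              [2, 1, 1]]
coloured = colourize(prediction, palette)
print(coloured[1][0])  # (70, 70, 70)
```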
- Compile ERFNet-Caffe/caffe-erfnet for training. caffe-erfnet combines the interp layer from PSPNet and the DenseImageData layer from caffe-enet to create the auxiliary loss and the data interface, respectively.
- Run ERFNet-Caffe/scripts/createTrainIdLabelImgs.py to create the trainID label images for training. (The script is from Marius Cordts' cityscapesScripts.)
- Set the net and snapshot_prefix paths in ERFNet-Caffe/prototxts/erfnet_solver.prototxt.
- Set the source path in ERFNet-Caffe/prototxts/erfnet_train_val.prototxt.
- Set the paths to the Cityscapes data (images and labels) in ERFNet-Caffe/dataset/train_fine_cityscapes.txt and ERFNet-Caffe/dataset/eval_fine_cityscapes.txt.
- Start the training from scratch:
$ ERFNet-Caffe/caffe-erfnet/build/tools/caffe train -solver ERFNet-Caffe/prototxts/erfnet_solver.prototxt
Or start the training from the pretrained model:
$ ERFNet-Caffe/caffe-erfnet/build/tools/caffe train -solver ERFNet-Caffe/prototxts/erfnet_solver.prototxt -weights ERFNet-Caffe/weights/erfnet_cityscapes.caffemodel
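The trainID label images mentioned in the steps above remap the Cityscapes label IDs to the 19 classes used for training. The sketch below shows only that remapping on an already rasterised grid of label IDs, with the table as defined in cityscapesScripts; the function name is hypothetical, and mapping every other ID to 255 is a simplification. The real script works from the polygon annotations.

```python
# Cityscapes label id -> trainId for the 19 evaluated classes
# (road, sidewalk, building, ..., motorcycle, bicycle).
ID_TO_TRAIN_ID = {
    7: 0, 8: 1, 11: 2, 12: 3, 13: 4, 17: 5, 19: 6, 20: 7, 21: 8, 22: 9,
    23: 10, 24: 11, 25: 12, 26: 13, 27: 14, 28: 15, 31: 16, 32: 17, 33: 18,
}
IGNORE_LABEL = 255  # assumption: all remaining ids are ignored in training

def to_train_ids(label_rows):
    """Remap a 2-D grid of Cityscapes label ids to trainIds."""
    return [[ID_TO_TRAIN_ID.get(label_id, IGNORE_LABEL) for label_id in row]
            for row in label_rows]

# 7 = road, 26 = car, 0 = unlabeled
print(to_train_ids([[7, 26, 0]]))  # [[0, 13, 255]]
```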
Merge the BatchNorm and Scale layers into the Convolution layers and remove the dropout layers for the test phase to accelerate prediction:
$ python merge_bn_scale_droupout.py --model ERFNet-Caffe/prototxts/erfnet_deploy.prototxt \
--weights ERFNet-Caffe/weights/erfnet_cityscapes.caffemodel \
--output_model ERFNet-Caffe/prototxts/erfnet_deploy_mergebn.prototxt \
--output_weights ERFNet-Caffe/weights/erfnet_cityscapes_mergebn.caffemodel
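The merge works because BatchNorm and Scale apply a fixed per-channel affine transform at test time, which can be absorbed into the preceding convolution: W' = W * gamma / sqrt(var + eps) and b' = (b - mean) * gamma / sqrt(var + eps) + beta. The following is a minimal sketch of that folding on plain Python lists, not the code of the merge script; all names and the example values are assumptions for illustration.

```python
import math

def fold_bn_scale(weights, bias, mean, var, gamma, beta, eps=1e-5):
    """Fold per-channel BatchNorm (mean, var) and Scale (gamma, beta)
    into convolution weights and biases.

    weights: one flat list of kernel values per output channel.
    """
    folded_w, folded_b = [], []
    for w_c, b_c, m, v, g, bt in zip(weights, bias, mean, var, gamma, beta):
        k = g / math.sqrt(v + eps)  # per-channel multiplier
        folded_w.append([w * k for w in w_c])
        folded_b.append((b_c - m) * k + bt)
    return folded_w, folded_b

def conv_bn_scale(x, w_c, b_c, m, v, g, bt, eps=1e-5):
    """Reference: convolution response at one position, then BN, then Scale."""
    y = sum(w * xi for w, xi in zip(w_c, x)) + b_c
    return g * (y - m) / math.sqrt(v + eps) + bt

# Hypothetical single-channel example with a 3-tap kernel.
x = [1.0, 2.0, 3.0]
w, b = [[0.5, -1.0, 2.0]], [0.1]
mean, var, gamma, beta = [0.3], [4.0], [1.5], [-0.2]
fw, fb = fold_bn_scale(w, b, mean, var, gamma, beta)
merged = sum(wi * xi for wi, xi in zip(fw[0], x)) + fb[0]
reference = conv_bn_scale(x, w[0], b[0], mean[0], var[0], gamma[0], beta[0])
print(abs(merged - reference) < 1e-9)  # True
```

In Caffe, the BatchNorm layer stores accumulated mean and variance together with a scaling factor in a third blob, so a real merge would divide the first two blobs by that factor before folding.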
Running ERFNet-Caffe/scripts/rename_images.py renames a sequence of images in a folder to the format 0000.png, 0001.png, 0002.png, etc.
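The naming scheme can be sketched as follows. This is not the code of rename_images.py; the function name and the sort-by-filename ordering are assumptions, and the sketch only computes the old-to-new name pairs without touching any files.

```python
def sequential_names(filenames, width=4, ext=".png"):
    """Pair each filename (in sorted order) with a zero-padded sequential name."""
    return [(old, str(index).zfill(width) + ext)
            for index, old in enumerate(sorted(filenames))]

# Hypothetical frame names from a recorded sequence.
for old, new in sequential_names(["frame_b.png", "frame_a.png"]):
    print(old, "->", new)
# frame_a.png -> 0000.png
# frame_b.png -> 0001.png
```

Each pair could then be passed to os.rename, joined with the folder path.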
Execute ERFNet-Caffe/scripts/webcam_demo.py to write the predictions to a video:
$ python webcam_demo.py --model ERFNet-Caffe/prototxts/erfnet_deploy_mergebn.prototxt \
--weights ERFNet-Caffe/weights/erfnet_cityscapes_mergebn.caffemodel \
--colours ERFNet-Caffe/scripts/cityscapes19.png
- First, execute ERFNet-Caffe/scripts/test_segmentation_iter.py to save the predicted trainID labels.
- Second, execute ERFNet-Caffe/scripts/evalPixelLevelSemanticLabeling_trainId.py to evaluate the per-class IoU, the mIoU and the per-category IoU. (The script is based on Marius Cordts' cityscapesScripts.)
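For reference, the per-class IoU is TP / (TP + FP + FN), computed from a confusion matrix accumulated over all evaluated pixels, and the mIoU is the mean over classes. The sketch below illustrates the metric on flat lists of trainID labels; it is not the evaluation script, and the function names and the ignore label 255 are assumptions.

```python
def confusion_matrix(gt, pred, num_classes, ignore_label=255):
    """Rows are ground-truth classes, columns are predicted classes."""
    conf = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g != ignore_label:
            conf[g][p] += 1
    return conf

def class_iou(conf):
    """IoU per class: TP / (TP + FP + FN); NaN if the class never occurs."""
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp
        fp = sum(conf[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        ious.append(float(tp) / denom if denom else float("nan"))
    return ious

def mean_iou(ious):
    """Mean over classes that occur at least once (NaN entries skipped)."""
    valid = [v for v in ious if v == v]
    return sum(valid) / len(valid)

# Hypothetical 2-class example; the last pixel is ignored.
gt = [0, 0, 1, 1, 255]
pred = [0, 1, 1, 1, 0]
ious = class_iou(confusion_matrix(gt, pred, num_classes=2))
print(ious[0])  # 0.5
```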