In this repository I focus on compiling notable research in end-to-end semantic segmentation of 2D images published after the Fully Convolutional Network (FCN) paper in 2014.
Hopefully I will also add some implementation and summary notes in the future.
All current state-of-the-art deep learning methods for semantic segmentation have evolved from FCN. FCN introduced the idea of replacing the fully connected layers of a classification network with convolutional layers, which provides a means for end-to-end training of semantic segmentation and for learning dense predictions on input images of any size. It also showed how skip connections and learned upsampling (deconvolution) can be used to recover the spatial information that is lost in a Deep Convolutional Neural Network (DCNN) due to max-pooling and sub-sampling.
FCN had three major issues, which later research set out to address in order to improve performance.
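To make the "fully connected layer as convolution" idea concrete, here is a minimal 1-D sketch in plain Python. It is an illustration under simplifying assumptions (a single output unit, no channels, no non-linearity), and the function names `fully_connected` and `as_convolution` are hypothetical, not taken from the FCN paper or any library.

```python
# Minimal 1-D sketch: a fully connected layer reinterpreted as a convolution.
# All names are hypothetical; this is not code from the FCN paper.


def fully_connected(window, weights, bias):
    """Classic dense layer: one score for one fixed-size input window."""
    assert len(window) == len(weights), "a dense layer needs a fixed input size"
    return sum(x * w for x, w in zip(window, weights)) + bias


def as_convolution(signal, weights, bias):
    """Slide the same weights over an input of any length: one score per position."""
    k = len(weights)
    return [
        fully_connected(signal[i:i + k], weights, bias)
        for i in range(len(signal) - k + 1)
    ]


weights, bias = [0.5, -1.0, 2.0], 0.25

# On an input of exactly the original size, both views give the same score.
fixed_input = [1.0, 2.0, 3.0]
dense_score = fully_connected(fixed_input, weights, bias)
conv_scores = as_convolution(fixed_input, weights, bias)

# On a larger input, the convolutional view yields a map of scores
# instead of failing on the size mismatch.
larger_input = [1.0, 2.0, 3.0, 4.0, 5.0]
score_map = as_convolution(larger_input, weights, bias)

print(dense_score, conv_scores, score_map)
```

The dense layer only accepts inputs of one size, while the convolutional form reuses the same weights at every position, which is what allows a dense prediction for an input of any size.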
- Loss of spatial information - Due to downsampling of input using max-pooling and sub-sampling
- Inability to capture global context - Due to inherent spatial invariance
- Lack of mechanism for multi-scale processing - Due to fixed-size receptive field
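The first issue can be illustrated with a small 1-D sketch in plain Python. It assumes non-overlapping max-pooling with window 2 and stride 2, and the helper `max_pool_1d` is a hypothetical name used only for this illustration.

```python
# Minimal 1-D sketch of how max-pooling discards spatial information.
# `max_pool_1d` is a hypothetical helper, not a library function.


def max_pool_1d(signal, size=2, stride=2):
    """Keep only the strongest value in each window."""
    return [
        max(signal[i:i + size])
        for i in range(0, len(signal) - size + 1, stride)
    ]


# Two inputs whose peaks sit at different positions inside each window.
peaks_left = [9, 0, 7, 0, 5, 0, 3, 0]
peaks_right = [0, 9, 0, 7, 0, 5, 0, 3]

pooled_left = max_pool_1d(peaks_left)
pooled_right = max_pool_1d(peaks_right)

# Stacking pooling layers halves the resolution again each time.
twice_pooled = max_pool_1d(pooled_left)

print(pooled_left, pooled_right, twice_pooled)
```

Both inputs pool to the same output, so the exact location of each peak cannot be recovered from the pooled features alone.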
Looking at the bigger picture, the most notable approaches can be organized as follows:
- Limiting the loss of spatial information
- Dilated Convolution
- Dilated convolution is used in the encoder to keep the encoder's output feature map dense while still expanding the receptive field as the network gets deeper, which limits the loss of spatial information.
- Papers: DeepLab, DilateNet, ENet
- Recovering the lost spatial information
- Recovering by complementing the decoding process with the available information from the encoder
- Skip connections from feature map before max-pooling
- Elementwise addition
- Papers: FCN, SegNet
- Concatenation
- Papers: UNet
- Max-pooling indices
- Papers: DeconvNet, SegNet
- Recovering by learning the lost spatial information during the decoding process
- Learning using convolutional layers
- Papers: UNet, SegNet
- Learning using deconvolutional layers
- Papers: FCN, DeconvNet
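Two of the ideas in the list above, dilated convolution and max-pooling indices, can be sketched in 1-D with plain Python. This is an illustration under simplifying assumptions (single channel, no padding, non-overlapping pooling windows), and all function names are hypothetical rather than taken from the listed papers or from any library.

```python
# Minimal 1-D sketches of dilated convolution and max-pooling indices.
# All helper names are hypothetical, chosen only for this illustration.


def dilated_conv_1d(signal, kernel, dilation=1):
    """'Valid' cross-correlation with `dilation - 1` gaps between kernel taps."""
    span = (len(kernel) - 1) * dilation + 1  # receptive field of one output
    return [
        sum(w * signal[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(signal) - span + 1)
    ]


def max_pool_with_indices(signal, size=2):
    """Non-overlapping max-pooling that also records where each maximum was."""
    values, indices = [], []
    for start in range(0, len(signal) - size + 1, size):
        window = signal[start:start + size]
        offset = window.index(max(window))
        values.append(window[offset])
        indices.append(start + offset)
    return values, indices


def max_unpool(values, indices, length):
    """Place each pooled value back at its recorded position; zeros elsewhere."""
    out = [0] * length
    for value, index in zip(values, indices):
        out[index] = value
    return out


# Dilated convolution: the receptive field grows from 3 to 5 samples while
# the output is still computed at every position (no stride, no pooling).
signal = [1, 2, 3, 4, 5, 6, 7, 8]
kernel = [1, 0, -1]
dense = dilated_conv_1d(signal, kernel, dilation=1)
dilated = dilated_conv_1d(signal, kernel, dilation=2)

# Max-pooling indices: the decoder restores maxima to their original positions.
feature = [1, 9, 7, 2, 0, 5, 3, 1]
values, indices = max_pool_with_indices(feature)
restored = max_unpool(values, indices, len(feature))

print(dense, dilated)
print(values, indices, restored)
```

In this sketch the dilated output only shrinks at the borders because no padding is applied, not by a stride factor. The unpooled map is sparse: only the maxima return to their positions, which is why such decoders follow unpooling with learned layers to densify the result.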
In progress
- Conditional Random Field (CRF)
- Papers: DeepLab, CRFasRNN
- Recurrent Neural Network
- Feature Fusion
Coming in future
- 20160221 A Survey of Semantic Segmentation
- 20170422 A Review on Deep Learning Techniques Applied to Semantic Segmentation ❤️
- 20170708 Deep Semantic Segmentation for Automated Driving: Taxonomy, Roadmap and Challenges
- 201711xx A review of semantic segmentation using deep neural networks
- 20180307 RTSeg: Real-time Semantic Segmentation Comparative Study ❤️
- 20180823 Methods and datasets on semantic segmentation: A review
- 20180920 Recent progress in semantic image segmentation
- 20190421 Survey on semantic segmentation using deep learning techniques
- 20191221 A Survey on Deep Learning-based Architectures for Semantic Segmentation on 2D images ❤️
- 20141114 FCN Fully Convolutional Networks for Semantic Segmentation
- 20141222 DeepLab V1 Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs
- 20150211 CRFasRNN Conditional Random Fields as Recurrent Neural Networks
- 20150517 DeConvNet Learning Deconvolution Network for Semantic Segmentation
- 20150518 U-Net U-Net: Convolutional Networks for Biomedical Image Segmentation
- 20150615 ParseNet ParseNet: Looking Wider to See Better
- 20150902 DAG-RNN DAG-Recurrent Neural Networks For Scene Labeling
- 20151102 SegNet SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
- 20151109 Bayesian SegNet Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding
- 20151110 Attention to scale: Scale-aware semantic image segmentation
- 20151122 ReSeg ReSeg: A Recurrent Neural Network-based Model for Semantic Segmentation
- 20151123 DilateNet Multi-Scale Context Aggregation by Dilated Convolutions
- 20160329 SharpMask Learning to Refine Object Segments
- 20160418 LSTM-CF LSTM-CF: Unifying Context Modeling and Fusion with LSTMs for RGB-D Scene Labeling
- 20160602 DeepLab V2 DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
- 20160607 ENet ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
- 20161120 RefineNet RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation
- 20161130 Wider or Deeper Wider or Deeper: Revisiting the ResNet Model for Visual Recognition
- 20161204 PSPNet Pyramid Scene Parsing Network
- 20170617 DeepLab V3 Rethinking Atrous Convolution for Semantic Image Segmentation