DeepLab v1 Implementation with PyTorch

0. Development Environment

Docker Image
- tensorflow/tensorflow:2.4.0-gpu-jupyter

Library
- PyTorch : Stable (1.7.1) - Linux - Python - CUDA (11.0)

model.py : VGG-16 Large FOV, DenseCRF, DeepLab v1
train.py : trains the VGG-16 Large FOV network only (the grid search is in model.py)
utils.py : calculate mIoU
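As an illustration of the metric, here is a minimal NumPy sketch of mIoU computed from a confusion matrix. It is not the code in utils.py. The function names, the `ignore_index=255` convention for void pixels, and the choice to average only over classes that appear are all assumptions made for this example.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count pixels per (ground-truth, prediction) pair.

    Rows index the ground-truth class, columns the predicted class.
    Pixels whose label equals ignore_index are dropped (assumed 'void' label).
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    keep = target != ignore_index
    # Encode each (target, pred) pair as a single integer, then histogram.
    idx = target[keep].astype(np.int64) * num_classes + pred[keep].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf):
    """Mean of per-class IoU = TP / (TP + FP + FN), over classes that occur."""
    tp = np.diag(conf).astype(np.float64)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    present = union > 0  # skip classes absent from both prediction and label
    return float((tp[present] / union[present]).mean())
```

For a whole validation set, the per-image confusion matrices would be summed first and `mean_iou` applied once to the total.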
Training settings for VGG-16 Large FOV are similar to those in the paper:
- input : (3, 224, 224)
- batch size : 30
- learning rate : 0.001
- momentum : 0.9
- weight decay : 0.0005
- no learning rate scheduler (omitted for convenience)
The mIoU score may differ noticeably from the paper's because no learning rate scheduler is used.
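If a schedule is wanted, one could be attached to the optimizer roughly as sketched below. This is not part of train.py. The function name is hypothetical, and the default `step_size=2000` and `gamma=0.1` are assumptions based on the step decay described in the paper (learning rate multiplied by 0.1 every 2000 iterations).

```python
import torch
import torch.nn as nn


def build_sgd_with_step_decay(model, lr=0.001, momentum=0.9,
                              weight_decay=0.0005, step_size=2000, gamma=0.1):
    """SGD with the settings listed above plus an optional step decay.

    step_size and gamma are assumed values, not settings used by this repo.
    """
    optimizer = torch.optim.SGD(model.parameters(), lr=lr,
                                momentum=momentum, weight_decay=weight_decay)
    # Multiplies the learning rate by gamma every step_size scheduler steps.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                                step_size=step_size, gamma=gamma)
    return optimizer, scheduler
```

Because the decay interval is counted in iterations, `scheduler.step()` would be called once per minibatch, right after `optimizer.step()`, rather than once per epoch.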

DCNN : modified VGG-16
- change fully connected layers to convolution layers
- skip subsampling in 2 max-pooling layers
- atrous algorithm in last 3 convolution layers (2x)
- atrous algorithm in first fully connected layer (4x) and change kernel size to 3x3
- change channel size of fully connected layers (4096 -> 1024)
- change channel size of final fully connected layer (1000 -> 21)
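The modifications listed above can be sketched in PyTorch as follows. This is an illustrative reconstruction, not the code in model.py. The class and argument names, the 3x3 pooling windows with padding 1, and the dropout layers are assumptions. The first converted fully connected layer uses dilation 4 to match the list above; the paper's Large FOV configuration reports an input stride of 12 for the 3x3 kernel, so the value is left as a parameter.

```python
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch, n_convs, dilation=1):
    """n_convs pairs of 3x3 conv + ReLU; padding=dilation keeps the spatial size."""
    layers = []
    for i in range(n_convs):
        layers.append(nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                                kernel_size=3, padding=dilation, dilation=dilation))
        layers.append(nn.ReLU(inplace=True))
    return layers


class VGG16LargeFOV(nn.Module):
    """Hypothetical sketch of the modified VGG-16; output stride is 8."""

    def __init__(self, num_classes=21, fc6_dilation=4):
        super().__init__()
        self.features = nn.Sequential(
            *conv_block(3, 64, 2), nn.MaxPool2d(3, stride=2, padding=1),
            *conv_block(64, 128, 2), nn.MaxPool2d(3, stride=2, padding=1),
            *conv_block(128, 256, 3), nn.MaxPool2d(3, stride=2, padding=1),
            # Last two poolings use stride 1, i.e. subsampling is skipped.
            *conv_block(256, 512, 3), nn.MaxPool2d(3, stride=1, padding=1),
            # Last three convolutions use the atrous algorithm (2x).
            *conv_block(512, 512, 3, dilation=2), nn.MaxPool2d(3, stride=1, padding=1),
        )
        self.classifier = nn.Sequential(
            # Former fully connected layers as convolutions, 4096 -> 1024 channels.
            nn.Conv2d(512, 1024, kernel_size=3,
                      padding=fc6_dilation, dilation=fc6_dilation),
            nn.ReLU(inplace=True), nn.Dropout2d(0.5),
            nn.Conv2d(1024, 1024, kernel_size=1),
            nn.ReLU(inplace=True), nn.Dropout2d(0.5),
            # Final classifier: 21 classes instead of 1000.
            nn.Conv2d(1024, num_classes, kernel_size=1),
        )

    def forward(self, x):
        # Returns coarse logits at 1/8 of the input resolution.
        return self.classifier(self.features(x))
```

With a (3, 224, 224) input this sketch produces a (21, 28, 28) score map, which would then be upsampled to the input size (for example with bilinear interpolation) before the CRF stage.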
Fully connected pairwise CRF : follows the paper Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials

2-stage training
- train the DCNN first
- then tune the CRF
Augmentation : use extra data
Objective : sum of cross-entropy terms for each spatial position in the CNN output map
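A minimal sketch of this objective in PyTorch is shown below. The function names and the `ignore_index=255` convention for void pixels are assumptions for illustration. The label downsampling helper assumes that targets are matched to the coarse output map by nearest-neighbour subsampling.

```python
import torch
import torch.nn.functional as F


def downsample_labels(target, size):
    """Nearest-neighbour subsampling of integer labels (N, H, W) to `size`."""
    # interpolate expects a float (N, C, H, W) tensor, so add and remove a channel.
    coarse = F.interpolate(target[:, None].float(), size=size, mode="nearest")
    return coarse.squeeze(1).long()


def segmentation_loss(logits, target, ignore_index=255):
    """Sum of per-position cross-entropy terms over the CNN output map.

    logits: (N, C, h, w) class scores; target: (N, h, w) integer labels.
    Positions labelled ignore_index contribute nothing (assumed void label).
    """
    return F.cross_entropy(logits, target,
                           ignore_index=ignore_index, reduction="sum")
```

Using `reduction="mean"` instead would only rescale the loss by the number of valid positions, which mainly changes the effective learning rate.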
Train Details
- minibatch SGD with momentum
  - batch size : 20
  - learning rate : 0.001 (0.01 for final classifier layer)
  - momentum : 0.9
  - weight decay : 0.0005
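The different learning rate for the final classifier layer can be expressed with per-parameter-group options of `torch.optim.SGD`. The sketch below assumes the network is split into two modules, here called `backbone` and `final_classifier`; both names and the function name are hypothetical.

```python
import torch
import torch.nn as nn


def build_optimizer(backbone, final_classifier, base_lr=0.001, final_lr=0.01,
                    momentum=0.9, weight_decay=0.0005):
    """Minibatch SGD with momentum and a larger rate for the final layer."""
    return torch.optim.SGD(
        [
            # Uses the default lr given below (0.001).
            {"params": backbone.parameters()},
            # Overrides the default for the final classifier layer (0.01).
            {"params": final_classifier.parameters(), "lr": final_lr},
        ],
        lr=base_lr, momentum=momentum, weight_decay=weight_decay,
    )
```

Momentum and weight decay are given once as defaults, so both groups share them.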

Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs [paper]