This is an unofficial implementation of the experiments on Cityscapes from the ACFNet paper.
The code runs, but I don't have enough GPUs to train the model. If you have GPUs available for training, I believe you can get a good result; you are welcome to tell me about your results. This code is based on my understanding of the paper, so if you find any error in the code or anything that differs from the paper, please let me know. Mail: 840996363@qq.com. Good luck!
The paper presents the concept of the class center, which extracts global context from a categorical perspective. This class-level context describes the overall representation of each class in an image. The paper further proposes a novel module, named the Attentional Class Feature (ACF) module, to calculate and adaptively combine the different class centers for each pixel. Based on the ACF module, it introduces a coarse-to-fine segmentation network, called the Attentional Class Feature Network (ACFNet), which can be composed of an ACF module and any off-the-shelf segmentation network (base network). ACFNet achieves new state-of-the-art performance of 81.85% mIoU on the Cityscapes dataset with only finely annotated data used for training.
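For intuition, here is a minimal PyTorch sketch of my reading of the class-center / ACF idea (tensor names and shapes are my own, not the official code): class centers are coarse-probability-weighted averages of the backbone features, and each pixel then re-gathers the centers weighted by its own coarse class distribution.

```python
# Minimal sketch of the class-center / ACF idea as I understand it
# (tensor names and shapes are my own, not the official code).
import torch

def class_centers(feats, coarse_logits):
    """feats: (B, C, H, W) backbone features;
    coarse_logits: (B, K, H, W) coarse per-pixel class scores."""
    B, C, H, W = feats.shape
    K = coarse_logits.shape[1]
    probs = torch.softmax(coarse_logits, dim=1).view(B, K, H * W)   # (B, K, N)
    feats_flat = feats.view(B, C, H * W)                            # (B, C, N)
    # Probability-weighted sum of features per class, normalized by
    # the total probability mass of that class.
    centers = torch.bmm(probs, feats_flat.transpose(1, 2))          # (B, K, C)
    centers = centers / (probs.sum(dim=2, keepdim=True) + 1e-6)
    return centers

def attentional_class_feature(feats, coarse_logits):
    """Each pixel gathers the class centers weighted by its own coarse
    class distribution (the attention in the ACF module)."""
    B, C, H, W = feats.shape
    centers = class_centers(feats, coarse_logits)                   # (B, K, C)
    probs = torch.softmax(coarse_logits, dim=1).view(B, -1, H * W)  # (B, K, N)
    acf = torch.bmm(centers.transpose(1, 2), probs)                 # (B, C, N)
    return acf.view(B, C, H, W)
```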
Python 3.7
4 x 12GB GPUs (e.g. TITAN Xp)
# Install **PyTorch 1.1**
$ conda install pytorch=1.1.0 torchvision cudatoolkit=9.0 -c pytorch
# Install **Apex**
$ git clone https://github.com/NVIDIA/apex
$ cd apex
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
# Install **Inplace-ABN**
$ git clone https://github.com/mapillary/inplace_abn.git
$ cd inplace_abn
$ python setup.py install
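After installing, a quick sanity check (my own snippet, not part of the repo) to confirm the environment imports cleanly and sees the GPUs:

```python
# Quick environment check: all three dependencies should import,
# and CUDA should be visible to PyTorch.
import torch
import apex
import inplace_abn
print(torch.__version__, torch.cuda.is_available())
```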
Please download the Cityscapes dataset and unzip it into YOUR_CS_PATH.
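The expected layout under YOUR_CS_PATH is the standard Cityscapes structure (assuming the usual list files; adjust if yours differ):

```
YOUR_CS_PATH
├── leftImg8bit
│   ├── train
│   ├── val
│   └── test
└── gtFine
    ├── train
    ├── val
    └── test
```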
I use ECA-Net101 as the baseline network. Please download the ECA-Net101 pretrained model and put it into the dataset folder.
Training script; you can adjust flags such as --restore-from and --num-steps as needed.
python train.py --data-dir ${YOUR_CS_PATH} --random-mirror --random-scale --restore-from ./dataset/***.pth --gpu 0,1,2,3 --learning-rate 1e-2 --input-size 769,769 --weight-decay 1e-4 --batch-size 8 --num-steps 60000
[Recommended] You can also enable the OHEM flag to reduce the performance gap between the val and test sets.
python train.py --data-dir ${YOUR_CS_PATH} --random-mirror --random-scale --restore-from ./dataset/***.pth --gpu 0,1,2,3 --learning-rate 1e-2 --input-size 769,769 --weight-decay 1e-4 --batch-size 8 --num-steps 60000 --ohem 1 --ohem-thres 0.7 --ohem-keep 100000
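For reference, here is a rough sketch of what the --ohem-thres and --ohem-keep flags typically mean in this kind of toolbox (my interpretation; the function name and details are hypothetical, not the repo's exact loss): keep only the pixels whose predicted probability for their ground-truth class falls below the threshold, but always keep at least --ohem-keep pixels.

```python
# Sketch of OHEM cross-entropy (hypothetical; names are mine).
import torch
import torch.nn.functional as F

def ohem_cross_entropy(logits, target, thres=0.7, min_kept=100000, ignore_index=255):
    """logits: (B, K, H, W); target: (B, H, W), void pixels = ignore_index."""
    pixel_losses = F.cross_entropy(logits, target, ignore_index=ignore_index,
                                   reduction='none').view(-1)
    probs = torch.softmax(logits, dim=1)
    # Probability each pixel assigns to its own ground-truth class.
    safe_target = target.clone()
    safe_target[safe_target == ignore_index] = 0
    gt_probs = probs.gather(1, safe_target.unsqueeze(1)).squeeze(1).view(-1)
    valid = target.view(-1) != ignore_index
    gt_probs = gt_probs.masked_fill(~valid, 1.0)  # never select void pixels
    # Raise the threshold if needed so at least min_kept pixels survive.
    sorted_probs, _ = torch.sort(gt_probs)
    if min_kept < sorted_probs.numel():
        thres = max(thres, sorted_probs[min_kept].item())
    kept = valid & (gt_probs < thres)
    return pixel_losses[kept].mean()
```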
Evaluation script.
python evaluate.py --data-dir ${YOUR_CS_PATH} --restore-from snapshots/CS_scenes_60000.pth --gpu 0
All in one.
./run_local.sh YOUR_CS_PATH
Self-attention related methods:
Criss-Cross Network
Object Context Network
Dual Attention Network
Semantic segmentation toolboxes:
pytorch-segmentation-toolbox
semantic-segmentation-pytorch
PyTorch-Encoding