DeMT

This repo is the official implementation of "DeMT" and its follow-ups. It currently includes code and models for multi-task dense prediction on the NYUD-v2 and PASCAL-Context datasets.

Updates

02/10/2023

We will release the code of DeMT at the end of February.
Merged Code.
Released a series of models. Please look into the data scaling paper for more details.

02/07/2023

News:

The Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023) will be held in Washington, DC, USA, from February 7-14, 2023.

02/01/2023

DeMT got accepted by AAAI 2023.

Introduction

DeMT (short for Deformable Mixer Transformer for Multi-Task Learning of Dense Prediction) was initially described in an arXiv paper. It is built on a simple and effective encoder-decoder architecture: a deformable mixer encoder and a task-aware transformer decoder. First, the deformable mixer encoder contains two types of operators: a channel-aware mixing operator that allows communication among different channels (i.e., efficient channel location mixing), and a spatial-aware deformable operator that applies deformable convolution to efficiently sample more informative spatial locations (i.e., deformed features). Second, the task-aware transformer decoder consists of a task interaction block and a task query block. The former captures task interaction features via self-attention. The latter uses the deformed features and the task-interacted features to generate task-specific features through a query-based Transformer; these features are then used for the corresponding task predictions.
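Both decoder blocks described above rest on scaled dot-product attention. The following is a minimal pure-Python sketch of that idea, not the repository's implementation: it omits learned projections, multiple heads, normalization, and the deformable mixer encoder itself. The function names `attention`, `task_interaction`, and `task_query`, the toy feature values, and the choice of letting the deformed features act as queries are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention on lists of equal-length vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output vector per query.
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs


def task_interaction(task_tokens):
    """Self-attention over the tokens of all tasks (queries = keys = values)."""
    return attention(task_tokens, task_tokens, task_tokens)


def task_query(deformed_tokens, interacted_tokens):
    """Cross-attention: one task's deformed features attend to the interacted features."""
    return attention(deformed_tokens, interacted_tokens, interacted_tokens)


# Toy example: two tasks, two spatial tokens each, two channels per token.
semseg = [[1.0, 0.0], [0.5, 0.5]]
depth = [[0.0, 1.0], [0.2, 0.8]]
interacted = task_interaction(semseg + depth)    # tokens mixed across tasks
semseg_feature = task_query(semseg, interacted)  # task-specific tokens for SemSeg
print(len(interacted), len(semseg_feature), len(semseg_feature[0]))
```

In this sketch every output token is a convex combination of the value tokens, which is why information from the depth tokens ends up in the semantic segmentation feature.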

With the Swin-B backbone, DeMT achieves strong performance on PASCAL-Context (75.33 mIoU on semantic segmentation and 63.11 mIoU on human part segmentation, on test) and on NYUD-v2 semantic segmentation (54.34 mIoU on test), surpassing previous models by a large margin.
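For readers unfamiliar with the mIoU figures quoted here, the following is a minimal sketch of how mean intersection-over-union is typically computed from flattened label maps. It is not the evaluation code of this repository: the function name `mean_iou`, the `ignore_index` default of 255, and the toy labels are assumptions for illustration. Benchmark numbers are normally accumulated over the whole test set rather than per image.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that occur in the prediction or the ground truth."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not count
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            # A wrong pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious)


# Toy example: six pixels, three classes, one ignored pixel.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
print(f"mIoU = {100 * mean_iou(pred, target, num_classes=3):.2f}")
```

For the toy labels, the per-class IoUs are 1/2, 2/3, and 1, so the script prints `mIoU = 72.22`.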

Main Results with Pretrained Models

DeMT on NYUD-v2 dataset

model	backbone	#params	FLOPs	SemSeg	Depth	Normal	Boundary	model checkpoint	log
DeMT	HRNet-18	4.76M	22.07G	39.18	0.5922	20.21	76.4	Google Drive	log
DeMT	Swin-T	32.07M	100.70G	46.36	0.5871	20.60	76.9	Google Drive	log
DeMT(xd=2)	Swin-T	36.6M	-	47.45	0.5563	19.90	77.0	Google Drive	log
DeMT	Swin-S	53.03M	121.05G	51.50	0.5474	20.02	78.1	Google Drive	log
DeMT	Swin-B	90.9M	153.65G	54.34	0.5209	19.21	78.5	Google Drive	log
DeMT	Swin-L	201.64M	-	56.94	0.5007	19.14	78.8	Google Drive	log

DeMT on PASCAL-Context dataset

model	backbone	SemSeg	PartSeg	Sal	Normal	Boundary
DeMT	HRNet-18	59.23	57.93	83.93	14.02	69.80
DeMT	Swin-T	69.71	57.18	82.63	14.56	71.20
DeMT	Swin-S	72.01	58.96	83.20	14.57	72.10
DeMT	Swin-B	75.33	63.11	83.42	14.54	73.20

Citing DeMT multi-task method

@inproceedings{xyy2023DeMT,
  title={DeMT: Deformable Mixer Transformer for Multi-Task Learning of Dense Prediction},
  author={Xu, Yangyang and Yang, Yibo and Zhang, Lefei},
  booktitle={Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI)},
  year={2023}
}

Getting Started

Install

conda install pytorch==1.7.0 torchvision==0.8.1 cudatoolkit=10.1 -c pytorch
conda install pytorch-lightning==1.1.8 -c conda-forge
conda install opencv==4.4.0 -c conda-forge
conda install scikit-image==0.17.2
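After installation, a quick check can confirm that the environment matches the pinned versions. The snippet below is a sketch and not part of the repository: the helper name `check_versions` is hypothetical, and it assumes each package exposes a `__version__` attribute under the listed import name (`cv2` for OpenCV, `skimage` for scikit-image).

```python
import importlib

# Import name -> version pinned in the conda install commands.
PINNED = {
    "torch": "1.7.0",
    "torchvision": "0.8.1",
    "pytorch_lightning": "1.1.8",
    "cv2": "4.4.0",
    "skimage": "0.17.2",
}


def check_versions(pinned):
    """Return {module: (status, found_version)} for each pinned module."""
    report = {}
    for name, wanted in pinned.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = ("missing", None)
            continue
        found = str(getattr(module, "__version__", "unknown"))
        # startswith tolerates local build suffixes such as "1.7.0+cu101".
        status = "ok" if found.startswith(wanted) else "mismatch"
        report[name] = (status, found)
    return report


if __name__ == "__main__":
    for name, (status, found) in check_versions(PINNED).items():
        print(f"{name:18s} {status:9s} {found}")
```

A `mismatch` entry does not necessarily mean training will fail, only that the environment differs from the one listed here.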

Data Preparation

wget https://data.vision.ee.ethz.ch/brdavid/atrc/NYUDv2.tar.gz
wget https://data.vision.ee.ethz.ch/brdavid/atrc/PASCALContext.tar.gz
tar xfvz ./NYUDv2.tar.gz 
tar xfvz ./PASCALContext.tar.gz

Train

To train a DeMT model:

python ./src/main.py --cfg ./config/t-nyud/swin/siwn_t_DeMT.yaml --datamodule.data_dir $DATA_DIR --trainer.gpus 8

Evaluation

When training is finished, the boundary predictions are saved in the following directory: ./logger/NYUD_xxx/version_x/edge_preds/.
Boundary detection is evaluated with the MATLAB-based SEISM repository to obtain the optimal dataset-scale F-measure (odsF) scores.
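To make the odsF metric concrete: the dataset-scale F-measure sums the matched counts over all images at each threshold and reports the best F-measure obtained with a single threshold shared by the whole dataset. The sketch below shows only this aggregation step, assuming per-image true-positive, false-positive, and false-negative counts are already available. It does not reproduce SEISM's boundary matching, and the function name `ods_f` and the toy counts are hypothetical.

```python
def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall from raw counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def ods_f(counts_per_image):
    """counts_per_image[i][k] = (tp, fp, fn) for image i at threshold index k.

    Returns (best_f, best_threshold_index) for one threshold shared by all images.
    """
    num_thresholds = len(counts_per_image[0])
    best_f, best_k = 0.0, 0
    for k in range(num_thresholds):
        # Aggregate counts over the dataset before computing the F-measure.
        tp = sum(image[k][0] for image in counts_per_image)
        fp = sum(image[k][1] for image in counts_per_image)
        fn = sum(image[k][2] for image in counts_per_image)
        score = f_measure(tp, fp, fn)
        if score > best_f:
            best_f, best_k = score, k
    return best_f, best_k


# Toy example: two images, three thresholds each.
counts = [
    [(90, 60, 10), (80, 20, 20), (50, 5, 50)],
    [(95, 70, 5), (85, 25, 15), (40, 2, 60)],
]
best_f, best_k = ods_f(counts)
print(f"odsF = {best_f:.3f} at threshold index {best_k}")
```

Aggregating counts before computing the F-measure is what distinguishes the dataset-scale score from a per-image average.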

Acknowledgement

This repository is based on ATRC. Thanks to ATRC!
