# Learning Deep Multimodal Feature Representation with Asymmetric Multi-layer Fusion

By Yikai Wang, Fuchun Sun, Ming Lu, Anbang Yao.
## Datasets

For the semantic segmentation task on NYUDv2 (official dataset), we provide a link to download the dataset here. The dataset was originally preprocessed in this repository, and we add depth data to it. Please modify the data paths in the code at the locations marked with the comment 'Modify data path'.
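As a rough illustration, the lines to edit look something like the following; the variable names and paths below are hypothetical placeholders, not the repository's actual identifiers:

```python
# Hypothetical sketch -- the real variable names in the code may differ.
# Point these at wherever the downloaded NYUDv2 data was extracted.
DATA_ROOT = '/path/to/nyudv2'            # Modify data path
TRAIN_LIST = DATA_ROOT + '/train.txt'    # Modify data path
VAL_LIST = DATA_ROOT + '/val.txt'        # Modify data path
```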
## Dependencies

```
python==3.6.2
pytorch==1.0.0
torchvision==0.2.2
imageio==2.4.1
numpy==1.16.2
scikit-learn==0.20.2
scipy==1.1.0
opencv-python==4.0.0
```
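Assuming a pip-based environment with Python 3.6 already set up, the remaining packages can be installed in one step. PyTorch 1.0.0 itself should be installed following the official instructions for your CUDA version; note that the pip release of OpenCV 4.0.0 is versioned 4.0.0.21:

```
pip install torchvision==0.2.2 imageio==2.4.1 numpy==1.16.2 \
    scikit-learn==0.20.2 scipy==1.1.0 opencv-python==4.0.0.21
```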
## Scripts

First,

```
cd semantic_segmentation
```
Training script for segmentation with RGB and depth inputs; the default setting uses RefineNet (ResNet101):

```
python main.py --gpu 0 -c exp_name # or --gpu 0 1 2 for multiple GPUs
```
Evaluation script:

```
python main.py --gpu 0 --resume path_to_pth --evaluate # optionally use --save-img to visualize results
```
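For reference, below is a minimal sketch of how mean IoU, a standard metric for NYUDv2 segmentation, can be computed from integer prediction and label maps. This is not the repository's exact evaluation code; the 40-class count and the ignore label of 255 are assumptions following common NYUDv2 setups:

```python
import numpy as np

def mean_iou(preds, labels, num_classes=40, ignore_index=255):
    """Compute mean IoU from integer prediction/label arrays of equal shape."""
    preds = preds.flatten().astype(np.int64)
    labels = labels.flatten().astype(np.int64)
    valid = labels != ignore_index                 # drop ignored pixels
    preds, labels = preds[valid], labels[valid]
    # Confusion matrix: rows are ground truth, columns are predictions.
    cm = np.bincount(num_classes * labels + preds,
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(cm)                            # true positives per class
    union = cm.sum(0) + cm.sum(1) - inter          # TP + FP + FN per class
    iou = inter / np.maximum(union, 1)             # guard against empty classes
    return iou[union > 0].mean()                   # average over classes present
```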
## License

AsymFusion is released under the MIT License.
## Citation

If you find our work useful for your research, please consider citing the following paper:
```
@inproceedings{wang2020asymfusion,
  title={Learning Deep Multimodal Feature Representation with Asymmetric Multi-layer Fusion},
  author={Wang, Yikai and Sun, Fuchun and Lu, Ming and Yao, Anbang},
  booktitle={ACM International Conference on Multimedia (ACM MM)},
  year={2020}
}
```