RIS-DMMI

This repository provides the PyTorch implementation of DMMI from the following paper:
Beyond One-to-One: Rethinking the Referring Image Segmentation (ICCV 2023)

News

  • 2023.10.03 - The final version of our dataset has been released. Please remember to download the latest version.
  • 2023.10.03 - We release our code.

Dataset

We collect a new comprehensive dataset, Ref-ZOM (Zero/One/Many), which contains image-text pairs under one-to-zero, one-to-one and one-to-many conditions. Similar to RefCOCO, RefCOCO+ and G-Ref, all images in Ref-ZOM are selected from the COCO dataset. Here, we provide the text, image and annotation information of Ref-ZOM, which should be used together with COCO_trainval2014.
Our dataset can be downloaded from:
[Baidu Cloud] [Google Drive]
Remember to download the original COCO dataset from:
[COCO Download]

Code

Prepare

  • Download COCO_train2014 and COCO_val2014, and merge the two datasets into a new folder "trainval2014". Then, at Line 52 of /refer/refer.py, assign the path of this folder to self.Image_DIR.
  • Download "Ref-ZOM(final).p" and rename it to "refs(final).p". Then put refs(final).p and instances.json into /refer/data/ref-zom/*.
  • Prepare the BERT weights in the same way as LAVT.
  • Prepare RefCOCO, RefCOCO+ and RefCOCOg in the same way as LAVT.
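The merge step in the first bullet can be sketched in Python as follows; the function name and example paths are illustrative, not code from this repository:

```python
# Minimal sketch: copy every image from COCO train2014 and val2014 into a
# single "trainval2014" folder. COCO filenames are unique across the two
# splits, so no file is overwritten.
import os
import shutil

def merge_coco_splits(train_dir, val_dir, out_dir):
    """Copy all files from both splits into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    for src_dir in (train_dir, val_dir):
        for name in os.listdir(src_dir):
            dst = os.path.join(out_dir, name)
            if not os.path.exists(dst):
                shutil.copy2(os.path.join(src_dir, name), dst)

# Usage (adjust to your local COCO paths):
# merge_coco_splits("COCO/train2014", "COCO/val2014", "COCO/trainval2014")
```

Symlinking instead of copying would also work and saves disk space, as long as your data loader follows links.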

Train

  • Remember to change --output_dir and --pretrained_backbone to your own paths.
  • Use --model to select the backbone: 'dmmi-swin' for Swin-Base and 'dmmi_res' for ResNet-50.
  • Use --dataset, --splitBy and --split to select the dataset as follows:
# Refcoco
--dataset refcoco, --splitBy unc, --split val
# Refcoco+
--dataset refcoco+, --splitBy unc, --split val
# Refcocog(umd)
--dataset refcocog, --splitBy umd, --split val
# Refcocog(google)
--dataset refcocog, --splitBy google, --split val
# Ref-zom
--dataset ref-zom, --splitBy final, --split test
  • Begin training!!
sh train.sh
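The dataset/split combinations listed above can also be collected into a small lookup table. This is a convenience sketch, not code from the repository:

```python
# Lookup table for the --dataset/--splitBy/--split combinations listed above.
EVAL_SPLITS = {
    "refcoco":         {"splitBy": "unc",    "split": "val"},
    "refcoco+":        {"splitBy": "unc",    "split": "val"},
    "refcocog-umd":    {"splitBy": "umd",    "split": "val"},
    "refcocog-google": {"splitBy": "google", "split": "val"},
    "ref-zom":         {"splitBy": "final",  "split": "test"},
}

def cli_flags(key):
    """Render the command-line flags for a given dataset key."""
    cfg = EVAL_SPLITS[key]
    # The two RefCOCOg variants share the same --dataset value.
    dataset = "refcocog" if key.startswith("refcocog") else key
    return f"--dataset {dataset} --splitBy {cfg['splitBy']} --split {cfg['split']}"
```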

Test

  • Remember to change --test_parameter to your own path. Meanwhile, set --model, --dataset, --splitBy and --split properly.
  • Begin test!!
sh test.sh

Parameter

Refcocog(umd)

Backbone     oIoU    mIoU    Google Drive   Baidu Cloud
ResNet-101   59.02   62.59   Link           Link
Swin-Base    63.46   66.48   Link           Link

Ref-ZOM

Backbone     oIoU    mIoU    Google Drive   Baidu Cloud
Swin-Base    68.77   68.25   Link           Link
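
The two metrics above follow the standard definitions in referring image segmentation (assumed here, not taken from this repo's code): oIoU accumulates intersections and unions over the whole test set, while mIoU averages the per-sample IoU. A minimal sketch, where the IoU of an empty-target sample (relevant for Ref-ZOM's one-to-zero cases) is taken as 1.0 when the prediction is also empty:

```python
import numpy as np

def oiou_miou(preds, gts):
    """preds, gts: lists of boolean masks of identical shape."""
    inter_sum = union_sum = 0
    per_sample = []
    for p, g in zip(preds, gts):
        inter = np.logical_and(p, g).sum()
        union = np.logical_or(p, g).sum()
        inter_sum += inter
        union_sum += union
        # Empty union (no target, no prediction) counts as a perfect match.
        per_sample.append(inter / union if union > 0 else 1.0)
    return inter_sum / union_sum, float(np.mean(per_sample))
```

Because oIoU pools pixels across the set, it is dominated by large objects, whereas mIoU weights every sample equally.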

Acknowledgements

We sincerely appreciate the wonderful work of LAVT. Our code is partially built on its codebase. If you find our work helpful, we suggest you refer to LAVT and cite it as well.

Citation

If you find our work helpful and want to cite it, please use the following citation info.

@InProceedings{Hu_2023_ICCV,
    author    = {Hu, Yutao and Wang, Qixiong and Shao, Wenqi and Xie, Enze and Li, Zhenguo and Han, Jungong and Luo, Ping},
    title     = {Beyond One-to-One: Rethinking the Referring Image Segmentation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {4067-4077}
}