This repository provides the PyTorch implementation of DMMI from the following paper:
Beyond One-to-One: Rethinking the Referring Image Segmentation (ICCV 2023)
- 2023.10.03 - The final version of our dataset has been released; please make sure to download the latest version.
- 2023.10.03 - We have released our code.
We collect a new comprehensive dataset, Ref-ZOM (Zero/One/Many), which contains image-text pairs under one-to-zero, one-to-one and one-to-many conditions. Similar to RefCOCO, RefCOCO+ and G-Ref, all images in Ref-ZOM are selected from the COCO dataset. Here, we provide the text, image and annotation information of Ref-ZOM, which should be used together with COCO_trainval2014.
Our dataset can be downloaded from:
[Baidu Cloud] [Google Drive]
Remember to download original COCO dataset from:
[COCO Download]
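After downloading, a quick sanity check of the annotation file can be run from the shell. This is only a sketch: the path follows the layout described in "Prepare" below, and the assumption that refs(final).p is a standard refer-style pickle (a list of reference dicts) is ours, not stated here.

```
# Minimal sanity check (assumption: refs(final).p is a refer-style pickle).
python - <<'EOF'
import pickle
with open('refer/data/ref-zom/refs(final).p', 'rb') as f:
    refs = pickle.load(f)
print(type(refs), len(refs))  # e.g. a list of referring expressions
EOF
```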
Prepare
- Download COCO_train2014 and COCO_val2014, and merge the two datasets into a new folder "trainval2014" (see the sketch after this list). Then, at Line 52 in `/refer/refer.py`, set the path of this folder as `self.Image_DIR`.
- Download and rename "Ref-ZOM(final).p" to "refs(final).p". Then put refs(final).p and instances.json into `/refer/data/ref-zom/`.
- Prepare BERT in the same way as LAVT.
- Prepare RefCOCO, RefCOCO+ and RefCOCOg in the same way as LAVT.
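A minimal sketch of the merge described in the first step, assuming COCO's default folder names. Copying both sets into one folder cannot clash, because COCO file names embed their split (COCO_train2014_*.jpg vs. COCO_val2014_*.jpg):

```
# Merge the two COCO image folders into trainval2014 (folder names are
# COCO defaults; adjust paths to your setup). `cp -r src/. dst/` copies
# the directory contents and avoids shell argument-length limits.
mkdir -p trainval2014
cp -r train2014/. trainval2014/
cp -r val2014/. trainval2014/
```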
Train
- Remember to change `--output_dir` and `--pretrained_backbone` to your own paths.
- Use `--model` to select the backbone: 'dmmi-swin' for Swin-Base and 'dmmi_res' for ResNet-50.
- Use `--dataset`, `--splitBy` and `--split` to select the dataset as follows:
```
# Refcoco
--dataset refcoco, --splitBy unc, --split val
# Refcoco+
--dataset refcoco+, --splitBy unc, --split val
# Refcocog(umd)
--dataset refcocog, --splitBy umd, --split val
# Refcocog(google)
--dataset refcocog, --splitBy google, --split val
# Ref-zom
--dataset ref-zom, --splitBy final, --split test
```
- Begin training!

```
sh train.sh
```
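For orientation, train.sh presumably wraps a command along these lines. Only the flags are documented above; the entry-point name and the paths below are placeholders, so check train.sh for the actual invocation:

```
# Hypothetical example: train DMMI with a Swin-Base backbone on RefCOCOg (umd).
# `train.py` and the two paths are assumptions, not part of this README.
python train.py --model dmmi-swin \
                --dataset refcocog --splitBy umd --split val \
                --output_dir ./checkpoints/dmmi_swin_refcocog \
                --pretrained_backbone ./pretrained/swin_base.pth
```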
Test
- Remember to change `--test_parameter` to your own path. Meanwhile, set `--model`, `--dataset`, `--splitBy` and `--split` properly.
- Begin testing!
```
sh test.sh
```
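Analogously, test.sh presumably runs something like the following; `test.py` and the checkpoint path are placeholders, and only the flags come from this README:

```
# Hypothetical example: evaluate a Swin-Base checkpoint on Ref-ZOM.
# `test.py` and the checkpoint path are assumptions; see test.sh for the real command.
python test.py --model dmmi-swin \
               --dataset ref-zom --splitBy final --split test \
               --test_parameter ./checkpoints/dmmi_swin_refzom.pth
```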
Results
Refcocog(umd)
Backbone | oIoU | mIoU | Google Drive | Baidu Cloud |
---|---|---|---|---|
ResNet-101 | 59.02 | 62.59 | Link | Link |
Swin-Base | 63.46 | 66.48 | Link | Link |
Ref-ZOM
Backbone | oIoU | mIoU | Google Drive | Baidu Cloud |
---|---|---|---|---|
Swin-Base | 68.77 | 68.25 | Link | Link |
We sincerely appreciate the wonderful work of LAVT; our code is partially built on its codebase. If you find our work helpful, we suggest you refer to LAVT and cite it as well.
If you find our work helpful and want to cite it, please use the following citation:
```
@InProceedings{Hu_2023_ICCV,
    author    = {Hu, Yutao and Wang, Qixiong and Shao, Wenqi and Xie, Enze and Li, Zhenguo and Han, Jungong and Luo, Ping},
    title     = {Beyond One-to-One: Rethinking the Referring Image Segmentation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {4067-4077}
}
```