This repository provides the PyTorch implementation of DMMI from the following paper:
Beyond One-to-One: Rethinking the Referring Image Segmentation (ICCV 2023)
- 2023.10.03 - The final version of our dataset has been released; please make sure to download the latest version.
- 2023.10.03 - We have released our code.
We collect a new comprehensive dataset, Ref-ZOM (Zero/One/Many), which contains image-text pairs under one-to-zero, one-to-one and one-to-many conditions. Similar to RefCOCO, RefCOCO+ and G-Ref, all images in Ref-ZOM are selected from the COCO dataset. Here, we provide the text, image and annotation information of Ref-ZOM, which should be used together with COCO_trainval2014.
Our dataset can be downloaded from:
[Baidu Cloud] [Google Drive]
Remember to download original COCO dataset from:
[COCO Download]
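After downloading, a quick sanity check of the annotation file can be run from the shell. This is only a sketch: the path follows the layout described in "Prepare" below, and the assumption that refs(final).p is a standard refer-style pickle (a list of reference dicts) is ours, not stated here.

```
# Minimal sanity check (assumption: refs(final).p is a refer-style pickle).
python - <<'EOF'
import pickle
with open('refer/data/ref-zom/refs(final).p', 'rb') as f:
    refs = pickle.load(f)
print(type(refs), len(refs))  # e.g. a list of referring expressions
EOF
```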
Prepare
- Download COCO_train2014 and COCO_val2014, and merge the two datasets into a new folder "trainval2014" (see the sketch after this list). Then, at Line 52 in `/refer/refer.py`, set the path of this folder as `self.Image_DIR`.
- Download and rename "Ref-ZOM(final).p" to "refs(final).p". Then put refs(final).p and instances.json into `/refer/data/ref-zom/`.
- Prepare BERT in the same way as LAVT.
- Prepare RefCOCO, RefCOCO+ and RefCOCOg in the same way as LAVT.
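A minimal sketch of the merge described in the first step, assuming COCO's default folder names. Copying both sets into one folder cannot clash, because COCO file names embed their split (COCO_train2014_*.jpg vs. COCO_val2014_*.jpg):

```
# Merge the two COCO image folders into trainval2014 (folder names are
# COCO defaults; adjust paths to your setup). `cp -r src/. dst/` copies
# the directory contents and avoids shell argument-length limits.
mkdir -p trainval2014
cp -r train2014/. trainval2014/
cp -r val2014/. trainval2014/
```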
Train
- Remember to change `--output_dir` and `--pretrained_backbone` to your own paths.
- Use `--model` to select the backbone: 'dmmi-swin' for Swin-Base and 'dmmi_res' for ResNet-50.
- Use `--dataset`, `--splitBy` and `--split` to select the dataset as follows:
```
# Refcoco
--dataset refcoco, --splitBy unc, --split val
# Refcoco+
--dataset refcoco+, --splitBy unc, --split val
# Refcocog(umd)
--dataset refcocog, --splitBy umd, --split val
# Refcocog(google)
--dataset refcocog, --splitBy google, --split val
# Ref-zom
--dataset ref-zom, --splitBy final, --split test
```
- Begin training!

```
sh train.sh
```
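For orientation, train.sh presumably wraps a command along these lines. Only the flags are documented above; the entry-point name and the paths below are placeholders, so check train.sh for the actual invocation:

```
# Hypothetical example: train DMMI with a Swin-Base backbone on RefCOCOg (umd).
# `train.py` and the two paths are assumptions, not part of this README.
python train.py --model dmmi-swin \
                --dataset refcocog --splitBy umd --split val \
                --output_dir ./checkpoints/dmmi_swin_refcocog \
                --pretrained_backbone ./pretrained/swin_base.pth
```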
Test
- Remember to change `--test_parameter` to your own path. Meanwhile, set `--model`, `--dataset`, `--splitBy` and `--split` properly.
- Begin testing!
```
sh test.sh
```
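Analogously, test.sh presumably runs something like the following; `test.py` and the checkpoint path are placeholders, and only the flags come from this README:

```
# Hypothetical example: evaluate a Swin-Base checkpoint on Ref-ZOM.
# `test.py` and the checkpoint path are assumptions; see test.sh for the real command.
python test.py --model dmmi-swin \
               --dataset ref-zom --splitBy final --split test \
               --test_parameter ./checkpoints/dmmi_swin_refzom.pth
```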
Results
Refcocog(umd)
Backbone | oIoU | mIoU | Google Drive | Baidu Cloud |
---|---|---|---|---|
ResNet-101 | 59.02 | 62.59 | Link | Link |
Swin-Base | 63.46 | 66.48 | Link | Link |
Ref-ZOM
Backbone | oIoU | mIoU | Google Drive | Baidu Cloud |
---|---|---|---|---|
Swin-Base | 68.77 | 68.25 | Link | Link |
We sincerely appreciate the wonderful work of LAVT; our code is partially built on its codebase. If you find our work helpful, we suggest you refer to LAVT and cite it as well.
If you find our work helpful and want to cite it, please use the following citation:
```
@InProceedings{Hu_2023_ICCV,
    author    = {Hu, Yutao and Wang, Qixiong and Shao, Wenqi and Xie, Enze and Li, Zhenguo and Han, Jungong and Luo, Ping},
    title     = {Beyond One-to-One: Rethinking the Referring Image Segmentation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {4067-4077}
}
```