CREC

Official implementation of the paper "Revisiting Counterfactual Problems in Referring Expression Comprehension" [CVPR 2024]

Updates

  • (2024/3/6) Released our C-REC datasets C-RefCOCO/+/g.
  • (2024/10/18) Released our C-REC model.

Datasets

C-RefCOCO/+/g are three fine-grained counterfactual referring expression comprehension (C-REC) datasets built on the three REC benchmarks RefCOCO/+/g through our proposed CSG method.

Each of C-RefCOCO/+/g contains normal and counterfactual samples in a 1:1 ratio. The split sizes are as follows.

            train  val    testA(test)  testB
C-RefCOCO   61870  15566  6994         8810
C-RefCOCO+  59962  15328  7846         7108
C-RefCOCOg  30298  3676   7122         -

The number of attribute words in each of the seven categories (A1-A7) among normal samples is shown below. Note that some splits contain no attribute words of certain categories, e.g., A5 (relative location relation) and A6 (relative location object) in C-RefCOCO+.

            A1     A2    A3    A4     A5   A6   A7
C-RefCOCO   23862  5136  464   16142  131  131  754
C-RefCOCO+  28573  9864  1685  2646   0    0    2354
C-RefCOCOg  11312  4114  638   4024   108  108  244

Dataset Instructions

  1. Download the MS-COCO train2014 images; all images in our datasets come from this set.
  2. Our datasets are in:
$ROOT/data
|-- crec
    |-- c_refcoco.json
    |-- c_refcoco+.json
    |-- c_refcocog.json
  3. Definitions of every field in the JSON files (a loading example follows the table):
item     type  description
atts     str   attribute words
bbox     list  bounding box ([0,0,0,0] for counterfactual samples)
iid      int   image id (from MS-COCO train2014)
refs     str   original positive expression, for both normal and counterfactual samples
cf_id    int   counterfactual polarity (1: counterfactual; 0: normal)
att_pos  int   position of the attribute words (starting from 0)
query    str   text query
neg      str   negative query (the normal text for counterfactual samples; used in the contrastive loss)
att_id   int   category of the attribute word, from 1 to 7 (A1-A7)
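
To illustrate how the fields fit together, here is a minimal loading sketch. It assumes each JSON file is a flat list of records with the fields above; adjust the loading step if the files are instead keyed by split.

import json
from collections import Counter

# Load one of the C-REC annotation files (paths as in the tree above).
with open("data/crec/c_refcoco.json") as f:
    samples = json.load(f)

# cf_id separates normal (0) from counterfactual (1) samples; the ratio is 1:1.
normal = [s for s in samples if s["cf_id"] == 0]
counterfactual = [s for s in samples if s["cf_id"] == 1]
print(len(normal), len(counterfactual))

# Per-category counts of attribute words (A1-A7) in normal samples,
# comparable to the attribute table above.
print(Counter(s["att_id"] for s in normal))

# Counterfactual samples carry no grounding box.
assert all(s["bbox"] == [0, 0, 0, 0] for s in counterfactual)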

Model

Our CREC model is built on a one-stage referring expression comprehension model and trained with our newly built datasets.

Environment Installation

  • Python 3.7
  • PyTorch 1.11.0 + CUDA 11.3
  • Install mmcv following the installation guide
  • Install spaCy, initialize the GloVe vectors, and install the other requirements as follows (a quick check follows the commands):
pip install -r requirements.txt
wget https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-2.1.0/en_vectors_web_lg-2.1.0.tar.gz -O en_vectors_web_lg-2.1.0.tar.gz
pip install en_vectors_web_lg-2.1.0.tar.gz
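
To sanity-check that the GloVe vectors installed correctly, an optional quick check (using the en_vectors_web_lg model installed above):

import spacy

# en_vectors_web_lg ships 300-dimensional GloVe vectors.
nlp = spacy.load("en_vectors_web_lg")
doc = nlp("the man in the red shirt")
print(doc[0].vector.shape)  # expect (300,)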

Training and Evaluation

  1. Config preparation. Modify the config file according to your needs.

  2. Train the model. Download pretrained weights of visual backbone following Pretrained Weights.

[Optional] Resume from checkpoint:

  • To auto-resume training, set train.auto_resume.enabled=True in config.py; training will then resume from last_checkpoint.pth saved in cfg.train.output_dir.

  • To resume from a specific checkpoint, set train.auto_resume.enabled=False and train.resume_path=path/to/checkpoint.pth in config.py (a sketch of these fields follows).
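
For reference, a hypothetical sketch of the resume-related fields in config.py; only the option names (train.auto_resume.enabled, train.resume_path, train.output_dir) come from the bullets above, and the surrounding structure may differ in the actual config.

# Hypothetical excerpt of config.py -- not the actual file.
train = dict(
    output_dir="work_dir/crec_refcoco",  # last_checkpoint.pth is saved here
    auto_resume=dict(enabled=True),      # resume from output_dir/last_checkpoint.pth
    resume_path="",                      # with enabled=False, set e.g. "path/to/checkpoint.pth"
)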

To train our model on 4 GPUs, run:

bash tools/train.sh configs/crec_refcoco.py 4
  3. Test the model. To test our model from path/to/checkpoint.pth on 1 GPU, run:
bash tools/eval.sh configs/crec_refcoco.py 1 path/to/checkpoint.pth

License

This project is released under the Apache 2.0 license.

Citation

If this repository is helpful for your research, or you refer to the provided results in your paper, please consider citing:

@inproceedings{yu2024revisiting,
  title={Revisiting Counterfactual Problems in Referring Expression Comprehension},
  author={Yu, Zhihan and Li, Ruifan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={13438--13448},
  year={2024}
}

Acknowledgement

Thanks a lot for the nicely organized code from the following repos: