Ref-NMS

Official codebase for AAAI 2021 paper "Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding".

Prerequisites

The following dependencies should be sufficient. See environment.yml for the complete environment specification.

  • python 3.7.6
  • pytorch 1.1.0
  • torchvision 0.3.0
  • tensorboard 2.1.0
  • spacy 2.2.3
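
If you use conda, the environment can also be recreated directly from environment.yml (a minimal sketch assuming a standard conda installation; the environment name is whatever environment.yml defines):

# create the environment from the provided specification
conda env create -f environment.yml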

Data Preparation

Follow the instructions in data/README.md to set up the data directory.

Run the following script to set up the cache directory:

sh scripts/prepare_data.sh

This should generate the following files under the cache directory:

  • vocabulary file: std_vocab_<dataset>_<split_by>.txt
  • selected GloVe feature: std_glove_<dataset>_<split_by>.npy
  • referring expression database: std_refdb_<dataset>_<split_by>.json
  • critical objects database: std_ctxdb_<dataset>_<split_by>.json
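
As a quick sanity check, you can list the generated files (shown here for the refcoco/unc setting used in the commands below, assuming the cache directory sits at the repository root; adjust the names for other dataset/split settings):

ls cache/std_vocab_refcoco_unc.txt \
   cache/std_glove_refcoco_unc.npy \
   cache/std_refdb_refcoco_unc.json \
   cache/std_ctxdb_refcoco_unc.json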

Train

Train with binary XE loss:

PYTHONPATH=$PWD python tools/train_att_vanilla.py --dataset refcoco --split-by unc

Train with ranking loss:

PYTHONPATH=$PWD python tools/train_att_rank.py --dataset refcoco --split-by unc

We use TensorBoard to monitor the training process. The log files can be found in the tb folder.
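
To view the logs, point TensorBoard at that folder (assuming the tb folder sits at the repository root):

tensorboard --logdir tb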

Evaluate Recall

Save Ref-NMS proposals:

PYTHONPATH=$PWD python tools/save_ref_nms_proposals.py --dataset refcoco --split-by unc --tid <tid> --m <loss_type>

<loss_type> can be either att_vanilla for the binary XE loss or att_rank for the ranking loss.

Evaluate recall on referent object:

PYTHONPATH=$PWD python tools/eval_proposal_hit_rate.py --m <loss_type> --dataset refcoco --split-by unc --tid <tid> --conf <conf>

The conf parameter is the score threshold used to filter Ref-NMS proposals. It should be chosen so that referent recall stays high while the number of proposals per expression is around 8-10.
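
One simple way to pick conf is to sweep a few thresholds and compare referent recall against the average number of proposals per expression; the threshold values below are purely illustrative, not recommended settings:

for conf in 0.10 0.15 0.20 0.25 0.30; do
    PYTHONPATH=$PWD python tools/eval_proposal_hit_rate.py --m <loss_type> --dataset refcoco --split-by unc --tid <tid> --conf $conf
done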

Evaluate recall on critical objects:

PYTHONPATH=$PWD python tools/eval_proposal_ctx_recall.py --m <loss_type> --dataset refcoco --split-by unc --tid <tid> --conf <conf>

Evaluate REG Performance

Save MAttNet-style detection file:

PYTHONPATH=$PWD python tools/save_matt_dets.py --dataset refcoco --split-by unc --m <loss_type> --tid <tid> --conf <conf>

This script will save all the detection information needed for downstream REG evaluation to output/matt_dets_<loss_type>_<tid>_<dataset>_<split_by>_<top_N>.json.

We provide altered versions of MAttNet and CM-A-E for downstream REG task evaluation.

First, follow the README in each repository to reproduce the originally reported results as a baseline (cf. Table 2 in our paper). Then, run the following commands to evaluate on the REC and RES tasks:

# Evaluate REC performance
python tools/extract_mrcn_ref_feats.py --dataset refcoco --splitBy unc --tid <tid> --top-N 0 --m <loss_type>
python tools/eval_ref.py --dataset refcoco --splitBy unc --tid <tid> --top-N 0 --m <loss_type>
# Evaluate RES performance
python tools/run_propose_to_mask.py --dataset refcoco --splitBy unc --tid <tid> --top-N 0 --m <loss_type>
python tools/eval_ref_masks.py --dataset refcoco --splitBy unc --tid <tid> --top-N 0 --m <loss_type> --save

Pretrained Models

We provide pre-trained model weights along with the corresponding MAttNet-style detection files (note that the MAttNet-style detection files can be used directly to evaluate downstream REG task performance). With these files, one can easily reproduce our reported results.

[Google Drive] [Baidu Disk] (extraction code: 5a9r)

Citation

@inproceedings{chen2021ref,
  title={Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding},
  author={Chen, Long and Ma, Wenbo and Xiao, Jun and Zhang, Hanwang and Chang, Shih-Fu},
  booktitle={AAAI},
  year={2021}
}