Ref-NMS

Official codebase for AAAI 2021 paper "Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding".

Prerequisites

The following dependencies should be sufficient. See environment.yml for the complete environment specification.

  • python 3.7.6
  • pytorch 1.1.0
  • torchvision 0.3.0
  • tensorboard 2.1.0
  • spacy 2.2.3
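
If you use conda, the environment can also be recreated directly from environment.yml (a minimal sketch assuming a standard conda installation; the environment name is whatever environment.yml defines):

# create the environment from the provided specification
conda env create -f environment.yml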

Data Preparation

Follow the instructions in data/README.md to set up the data directory.

Run the following script to set up the cache directory:

sh scripts/prepare_data.sh

This should generate the following files under the cache directory:

  • vocabulary file: std_vocab_<dataset>_<split_by>.txt
  • selected GloVe feature: std_glove_<dataset>_<split_by>.npy
  • referring expression database: std_refdb_<dataset>_<split_by>.json
  • critical objects database: std_ctxdb_<dataset>_<split_by>.json
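
As a quick sanity check, you can list the generated files (shown here for the refcoco/unc setting used in the commands below, assuming the cache directory sits at the repository root; adjust the names for other dataset/split settings):

ls cache/std_vocab_refcoco_unc.txt \
   cache/std_glove_refcoco_unc.npy \
   cache/std_refdb_refcoco_unc.json \
   cache/std_ctxdb_refcoco_unc.json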

Train

Train with binary XE loss:

PYTHONPATH=$PWD python tools/train_att_vanilla.py --dataset refcoco --split-by unc

Train with ranking loss:

PYTHONPATH=$PWD python tools/train_att_rank.py --dataset refcoco --split-by unc

We use TensorBoard to monitor the training process. The log files can be found in the tb folder.
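
To view the logs, point TensorBoard at that folder (assuming the tb folder sits at the repository root):

tensorboard --logdir tb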

Evaluate Recall

Save Ref-NMS proposals:

PYTHONPATH=$PWD python tools/save_ref_nms_proposals.py --dataset refcoco --split-by unc --tid <tid> --m <loss_type>

<loss_type> can be either att_vanilla for the binary XE loss or att_rank for the ranking loss.

Evaluate recall on referent object:

PYTHONPATH=$PWD python tools/eval_proposal_hit_rate.py --m <loss_type> --dataset refcoco --split-by unc --tid <tid> --conf <conf>

The conf parameter is the score threshold used to filter Ref-NMS proposals. It should be chosen so that referent recall stays high while the number of proposals per expression is around 8-10.
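
One simple way to pick conf is to sweep a few thresholds and compare referent recall against the average number of proposals per expression; the threshold values below are purely illustrative, not recommended settings:

for conf in 0.10 0.15 0.20 0.25 0.30; do
    PYTHONPATH=$PWD python tools/eval_proposal_hit_rate.py --m <loss_type> --dataset refcoco --split-by unc --tid <tid> --conf $conf
done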

Evaluate recall on critical objects:

PYTHONPATH=$PWD python tools/eval_proposal_ctx_recall.py --m <loss_type> --dataset refcoco --split-by unc --tid <tid> --conf <conf>

Evaluate REG Performance

Save MAttNet-style detection file:

PYTHONPATH=$PWD python tools/save_matt_dets.py --dataset refcoco --split-by unc --m <loss_type> --tid <tid> --conf <conf>

This script will save all the detection information needed for downstream REG evaluation to output/matt_dets_<loss_type>_<tid>_<dataset>_<split_by>_<top_N>.json.

We provide altered versions of MAttNet and CM-A-E for downstream REG task evaluation.

First, follow the README in each repository to reproduce the originally reported results as a baseline (cf. Table 2 in our paper). Then, run the following commands to evaluate on the REC and RES tasks:

# Evaluate REC performance
python tools/extract_mrcn_ref_feats.py --dataset refcoco --splitBy unc --tid <tid> --top-N 0 --m <loss_type>
python tools/eval_ref.py --dataset refcoco --splitBy unc --tid <tid> --top-N 0 --m <loss_type>
# Evaluate RES performance
python tools/run_propose_to_mask.py --dataset refcoco --splitBy unc --tid <tid> --top-N 0 --m <loss_type>
python tools/eval_ref_masks.py --dataset refcoco --splitBy unc --tid <tid> --top-N 0 --m <loss_type> --save

Pretrained Models

We provide pre-trained model weights along with the corresponding MAttNet-style detection files (note that the MAttNet-style detection files can be used directly to evaluate downstream REG task performance). With these files, one can easily reproduce our reported results.

[Google Drive] [Baidu Disk] (extraction code: 5a9r)

Citation

@inproceedings{chen2021ref,
  title={Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding},
  author={Chen, Long and Ma, Wenbo and Xiao, Jun and Zhang, Hanwang and Chang, Shih-Fu},
  booktitle={AAAI},
  year={2021}
}