Real-time-Global-Inference-Network

Code for paper "A Real-time Global Inference Network for One-stage Referring Expression Comprehension."


Global Inference Network - PyTorch

This is the implementation of the paper "A Real-time Global Inference Network for One-stage Referring Expression Comprehension."

Our code is based on ZSGNet. We add two modules, i.e., the Adaptive Feature Selection module and the Global Attentive ReAsoNing unit with an attention loss. We also release all pretrained models and datasets used in our paper.

Note that the data preparation for this code follows the setup of ZSGNet. If you have any problems, please contact us.

Training

Basic usage is python code/main_dist.py "experiment_name" --arg1=val1 --arg2=val2, where the available arguments (arg1, arg2, ...) can be found in configs/cfg.yaml. This trains in DataParallel mode.

For distributed training, use python -m torch.distributed.launch --nproc_per_node=$ngpus code/main_dist.py instead. This trains in DistributedDataParallel mode. (Also see the caveat on distributed training below.)

An example of training on the ReferIt dataset (note that you must have prepared the ReferIt dataset first):

python code/main_dist.py "referit_try" --ds_to_use='refclef' --bs=16 --nw=4

Similarly, for distributed training (set $ngpus to the number of GPUs):

python -m torch.distributed.launch --nproc_per_node=$ngpus code/main_dist.py "referit_try" --ds_to_use='refclef' --bs=16 --nw=4

Evaluation

There are two ways to evaluate.

  1. For validation, accuracy is already computed in the training loop. If you just want to evaluate on the validation or test set with a previously trained model ($exp_name), you can do:
python code/main_dist.py $exp_name --ds_to_use='refclef' --resume=True --only_val=True --only_test=True

Alternatively, you can use a different experiment name and pass the --resume_path argument, like:

python code/main_dist.py $exp_name --ds_to_use='refclef' --resume=True --resume_path='./tmp/models/referit_try.pth' 

After this, the logs will be available at tmp/txt_logs/$exp_name.txt.

  2. If you have some other model, you can dump its predictions into a pickle file, say predictions.pkl, in the following structure:
[
    {'id': annotation_id,
     'pred_boxes': [x1, y1, x2, y2]},
    ...
]
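As a sketch, such a predictions.pkl file could be produced like this (the annotation ids and box coordinates below are made-up placeholders; only the 'id'/'pred_boxes' structure comes from this README):

```python
import pickle

# Hypothetical predictions from some other model, one dict per annotation,
# with boxes in [x1, y1, x2, y2] format as expected by eval_script.py.
predictions = [
    {'id': 0, 'pred_boxes': [10.0, 20.0, 110.0, 220.0]},
    {'id': 1, 'pred_boxes': [5.0, 5.0, 50.0, 60.0]},
]

with open('predictions.pkl', 'wb') as f:
    pickle.dump(predictions, f)

# Sanity check: reload and verify the structure round-trips.
with open('predictions.pkl', 'rb') as f:
    loaded = pickle.load(f)
assert loaded == predictions
```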

Then you can evaluate with code/eval_script.py:

python code/eval_script.py predictions_file gt_file

For ReferIt this would be:

python code/eval_script.py ./tmp/predictions/$exp_name/val_preds_$exp_name.pkl ./data/referit/csv_dir/val.csv
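For context, referring expression comprehension is typically scored as accuracy, counting a prediction correct when its IoU with the ground-truth box is at least 0.5. Below is a minimal IoU sketch for boxes in [x1, y1, x2, y2] format; this is an illustrative stand-in, not the actual code in code/eval_script.py:

```python
def box_iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as [x1, y1, x2, y2]."""
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A prediction counts as correct when box_iou(pred, gt) >= 0.5.
print(box_iou([0, 0, 10, 10], [5, 0, 15, 10]))  # 50 / 150 ≈ 0.333 -> incorrect
```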

Datasets

| Dataset | Link |
| --- | --- |
| Flickr30k | One Drive |
| Referit | One Drive |
| Flickr-Split-0 | One Drive |
| Flickr-Split-1 | One Drive |
| VG-2B, 2UB, 3B, 3UB | One Drive |
| RefCOCO, RefCOCO+, RefCOCOg | coming soon! |

Pre-trained Models

We tried to reproduce the results of ZSGNet. Unfortunately, the results differ slightly from those reported in the paper; in particular, on ReferIt our results are slightly better in our experiments.

| Model | Dataset | val | test | Link |
| --- | --- | --- | --- | --- |
| ZSGNet | flickr30K | 63.15 | 63.43 | One Drive |
| GIN (10 epochs) | flickr30K | 64.06 | 64.77 | One Drive |
| GIN (10 epochs + resized_10_epochs) | flickr30K | 66.54 | 68.14 | |
| ZSGNet | referit | 65.99 | 62.73 | One Drive |
| GIN (10 epochs) | referit | 68.40 | 65.15 | One Drive |
| GIN (10 epochs + resized_10_epochs) | referit | coming soon | coming soon | |

Citation

If you find the code useful, please cite us:

@article{zhou2019a,
  title={A Real-time Global Inference Network for One-stage Referring Expression Comprehension},
  author={Zhou, Yiyi and Ji, Rongrong and Luo, Gen and Sun, Xiaoshuai and Su, Jinsong and Ding, Xinghao and Lin, Chiawen and Tian, Qi},
  journal={arXiv: Computer Vision and Pattern Recognition},
  year={2019}
}