Global Inference Network- pytorch
This is the implementation of the paper: A Real-time Global Inference Network for One-stage Referring Expression Comprehension.
Our code is based on ZSGNet. We further add two modules, i.e. the Adaptive Feature Selection and the Global Attentive ReAsoNing unit with an attention loss. Besides, we release all pretrained models and datasets used in our paper .
Note that the preparations of this code are following the setting of ZSGNet. If you have any problems, please contact with us.
Training
Basic usage is python code/main_dist.py "experiment_name" --arg1=val1 --arg2=val2
and the arg1, arg2 can be found in configs/cfg.yaml
. This trains using the DataParallel mode.
For distributed learning use python -m torch.distributed.launch --nproc_per_node=$ngpus code/main_dist.py
instead. This trains using the DistributedDataParallel mode. (Also see caveat in using distributed training below)
An example to train on ReferIt dataset (note you must have prepared referit dataset) would be:
python code/main_dist.py "referit_try" --ds_to_use='refclef' --bs=16 --nw=4
Similarly for distributed learning (need to set npgus as the number of gpus)
python -m torch.distributed.launch --nproc_per_node=$npgus code/main_dist.py "referit_try" --ds_to_use='refclef' --bs=16 --nw=4
Evaluation
There are two ways to evaluate.
- For validation, it is already computed in the training loop. If you just want to evaluate on validation or testing on a model trained previously ($exp_name) you can do:
python code/main_dist.py $exp_name --ds_to_use='refclef' --resume=True --only_val=True --only_test=True
or you can use a different experiment name as well and pass --resume_path
argument like:
python code/main_dist.py $exp_name --ds_to_use='refclef' --resume=True --resume_path='./tmp/models/referit_try.pth'
After this, the logs would be available inside tmp/txt_logs/$exp_name.txt
- If you have some other model, you can output the predictions in the following structure into a pickle file say
predictions.pkl
:
[
{'id': annotation_id,
'pred_boxes': [x1,y1,x2,y2]},
.
.
.
]
Then you can evaluate using code/eval_script.py
using:
python code/eval_script.py predictions_file gt_file
For referit it would be
python code/eval_script.py ./tmp/predictions/$exp_name/val_preds_$exp_name.pkl ./data/referit/csv_dir/val.csv
Datasets
Dataset | Link |
---|---|
Flickr30k | One Drive |
Referit | One Drive |
Flickr-Split-0 | One Drive |
Flickr-Split-1 | One Drive |
VG-2B,2UB,3B,3UB | One Drive |
RefCOCO,RefCOCO+,RefCOCOg | coming soon! |
Pre-trained Models
we tried to repeat the results of ZSGNet. But unfortunately, the results are a bit different from the paper, especially in referit, where the results are slightly better in our experiences.
Model | Dataset | val | test | link |
---|---|---|---|---|
ZSGNet | flickr30K | 63.15 | 63.43 | One Drive |
GIN(10 epochs) | flickr30K | 64.06 | 64.77 | One Drive |
GIN(10 epochs+ resized_10_epochs) | flickr30K | 66.54 | 68.14 | |
ZSGNet | referit | 65.99 | 62.73 | One Drive |
GIN(10 epochs) | referit | 68.40 | 65.15 | One Drive |
GIN(10 epochs+ resized_10_epochs) | referit | coming soon | coming soon |
Citation
If you find the code useful, please cite us:
@article{zhou2019a,
title={A Real-time Global Inference Network for One-stage Referring Expression Comprehension.},
author={Zhou, Yiyi and Ji, Rongrong and Luo, Gen and Sun, Xiaoshuai and Su, Jinsong and Ding, Xinghao and Lin, Chiawen and Tian, Qi},
journal={arXiv: Computer Vision and Pattern Recognition},
year={2019}}