This is the source code of BFAN (Bidirectional Focal Attention Network) (project page). The paper was accepted by ACM MM 2019 as an oral presentation. It is built on top of SCAN in PyTorch.
Our extended version, 'Focus Your Attention: A Focal Attention for Multimodal Learning', was published in IEEE TMM and can be downloaded here.
We recommend the following dependencies:
- Python 2.7
- PyTorch 1.1.0
- NumPy (>1.12.1)
- TensorBoard
Download the dataset files. We use the dataset files created by Kuang-Huei Lee for SCAN. The word ids for each sentence are precomputed and can be downloaded here (for Flickr30K and MSCOCO).
```bash
python train.py --data_path "$DATA_PATH" --data_name coco_precomp --vocab_path "$VOCAB_PATH" --logger_name runs/log --model_name "$MODEL_PATH"
```
Arguments used to train the Flickr30K and MSCOCO models are the same as those of SCAN:
For Flickr30K:
Method | Arguments |
---|---|
BFAN-equal | --max_violation --lambda_softmax=20 --focal_type=equal --num_epoches=15 --lr_update=15 --learning_rate=.0002 --embed_size=1024 --batch_size=128 |
BFAN-prob | --max_violation --lambda_softmax=20 --focal_type=prob --num_epoches=15 --lr_update=15 --learning_rate=.0002 --embed_size=1024 --batch_size=128 |
For MSCOCO:
Method | Arguments |
---|---|
BFAN-equal | --max_violation --lambda_softmax=20 --focal_type=equal --num_epoches=20 --lr_update=15 --learning_rate=.0005 --embed_size=1024 --batch_size=128 |
BFAN-prob | --max_violation --lambda_softmax=20 --focal_type=prob --num_epoches=20 --lr_update=15 --learning_rate=.0005 --embed_size=1024 --batch_size=128 |
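To illustrate what the `--focal_type` flag controls, here is a minimal sketch of focal attention reweighting: irrelevant fragments are pruned by comparing each attention weight against a reference (the uniform average of all weights for `equal`, the attention-weighted average for `prob`), and the surviving weights are renormalized. This is a simplified illustration of the idea, not the repository's actual implementation; the function name and exact masking rule here are assumptions.

```python
import torch

def focal_attention(attn, focal_type="equal"):
    """Prune non-focal fragments from attention weights (illustrative sketch).

    attn: (batch, n) attention weights over n fragments, summing to 1.
    """
    if focal_type == "equal":
        # Reference = uniform average weight: each fragment counts equally.
        ref = attn.mean(dim=-1, keepdim=True)
    elif focal_type == "prob":
        # Reference = attention-weighted average of the weights themselves.
        ref = (attn * attn).sum(dim=-1, keepdim=True)
    else:
        raise ValueError("unknown focal_type: %s" % focal_type)
    # Keep only fragments whose weight exceeds the reference, then renormalize.
    pruned = attn * (attn > ref).float()
    return pruned / pruned.sum(dim=-1, keepdim=True).clamp(min=1e-8)
```

For example, with weights `[0.7, 0.2, 0.1]` both variants discard the two weak fragments and concentrate all attention on the first one.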
Test on Flickr30K:
```bash
python test.py
```
To do cross-validation on MSCOCO, pass `fold5=True` with a model trained using `--data_name coco_precomp`:
```bash
python testall.py
```
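For context, the standard MSCOCO `fold5` protocol averages retrieval metrics over five 1K-image folds of the 5K test set. A minimal, self-contained sketch of that averaging loop (the `evaluate_fold` callback is an illustrative stand-in, not the repository's API):

```python
def fold5_evaluate(evaluate_fold, n_images=5000, n_folds=5):
    """Average a retrieval metric (e.g. R@1) over five 1K-image folds."""
    fold_size = n_images // n_folds
    results = []
    for i in range(n_folds):
        start, end = i * fold_size, (i + 1) * fold_size
        # evaluate_fold scores retrieval on images [start, end) only.
        results.append(evaluate_fold(start, end))
    return sum(results) / n_folds
```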
If you find this code useful, please cite the following papers:
```bibtex
@inproceedings{liu2019focus,
  title={Focus Your Attention: A Bidirectional Focal Attention Network for Image-Text Matching},
  author={Liu, Chunxiao and Mao, Zhendong and Liu, An-An and Zhang, Tianzhu and Wang, Bin and Zhang, Yongdong},
  booktitle={Proceedings of the 27th ACM International Conference on Multimedia},
  pages={3--11},
  year={2019},
  organization={ACM}
}

@article{9305249,
  author={Liu, Chunxiao and Mao, Zhendong and Zhang, Tianzhu and Liu, An-An and Wang, Bin and Zhang, Yongdong},
  journal={IEEE Transactions on Multimedia},
  title={Focus Your Attention: A Focal Attention for Multimodal Learning},
  year={2022},
  volume={24},
  pages={103--115},
  doi={10.1109/TMM.2020.3046855}
}
```