This is the source code of BFAN (Bidirectional Focal Attention Network) (project page). The paper was accepted by ACM MM 2019 as an oral presentation. It is built on top of SCAN in PyTorch.
Our extended version, 'Focus Your Attention: A Focal Attention for Multimodal Learning', was published in IEEE TMM and can be downloaded here.
We recommend the following dependencies:
- Python 2.7
- PyTorch 1.1.0
- NumPy (>1.12.1)
- TensorBoard
Download the dataset files. We use the dataset files created by Kuang-Huei Lee for SCAN. The word ids for each sentence are precomputed and can be downloaded here (for Flickr30K and MSCOCO).
```bash
python train.py --data_path "$DATA_PATH" --data_name coco_precomp --vocab_path "$VOCAB_PATH" --logger_name runs/log --model_name "$MODEL_PATH"
```
Arguments used to train the Flickr30K and MSCOCO models are the same as those of SCAN:
For Flickr30K:
Method | Arguments |
---|---|
BFAN-equal | --max_violation --lambda_softmax=20 --focal_type=equal --num_epoches=15 --lr_update=15 --learning_rate=.0002 --embed_size=1024 --batch_size=128 |
BFAN-prob | --max_violation --lambda_softmax=20 --focal_type=prob --num_epoches=15 --lr_update=15 --learning_rate=.0002 --embed_size=1024 --batch_size=128 |
For MSCOCO:
Method | Arguments |
---|---|
BFAN-equal | --max_violation --lambda_softmax=20 --focal_type=equal --num_epoches=20 --lr_update=15 --learning_rate=.0005 --embed_size=1024 --batch_size=128 |
BFAN-prob | --max_violation --lambda_softmax=20 --focal_type=prob --num_epoches=20 --lr_update=15 --learning_rate=.0005 --embed_size=1024 --batch_size=128 |
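To illustrate what the `--focal_type` flag controls, here is a minimal sketch of focal attention reweighting: irrelevant fragments are pruned by comparing each attention weight against a reference (the uniform average of all weights for `equal`, the attention-weighted average for `prob`), and the surviving weights are renormalized. This is a simplified illustration of the idea, not the repository's actual implementation; the function name and exact masking rule here are assumptions.

```python
import torch

def focal_attention(attn, focal_type="equal"):
    """Prune non-focal fragments from attention weights (illustrative sketch).

    attn: (batch, n) attention weights over n fragments, summing to 1.
    """
    if focal_type == "equal":
        # Reference = uniform average weight: each fragment counts equally.
        ref = attn.mean(dim=-1, keepdim=True)
    elif focal_type == "prob":
        # Reference = attention-weighted average of the weights themselves.
        ref = (attn * attn).sum(dim=-1, keepdim=True)
    else:
        raise ValueError("unknown focal_type: %s" % focal_type)
    # Keep only fragments whose weight exceeds the reference, then renormalize.
    pruned = attn * (attn > ref).float()
    return pruned / pruned.sum(dim=-1, keepdim=True).clamp(min=1e-8)
```

For example, with weights `[0.7, 0.2, 0.1]` both variants discard the two weak fragments and concentrate all attention on the first one.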
Test on Flickr30K:
```bash
python test.py
```
To do cross-validation on MSCOCO, pass `fold5=True` with a model trained using `--data_name coco_precomp`:
```bash
python testall.py
```
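For context, the standard MSCOCO `fold5` protocol averages retrieval metrics over five 1K-image folds of the 5K test set. A minimal, self-contained sketch of that averaging loop (the `evaluate_fold` callback is an illustrative stand-in, not the repository's API):

```python
def fold5_evaluate(evaluate_fold, n_images=5000, n_folds=5):
    """Average a retrieval metric (e.g. R@1) over five 1K-image folds."""
    fold_size = n_images // n_folds
    results = []
    for i in range(n_folds):
        start, end = i * fold_size, (i + 1) * fold_size
        # evaluate_fold scores retrieval on images [start, end) only.
        results.append(evaluate_fold(start, end))
    return sum(results) / n_folds
```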
If you find this code useful, please cite the following papers:
```bibtex
@inproceedings{liu2019focus,
  title={Focus Your Attention: A Bidirectional Focal Attention Network for Image-Text Matching},
  author={Liu, Chunxiao and Mao, Zhendong and Liu, An-An and Zhang, Tianzhu and Wang, Bin and Zhang, Yongdong},
  booktitle={Proceedings of the 27th ACM International Conference on Multimedia},
  pages={3--11},
  year={2019},
  organization={ACM}
}

@article{9305249,
  author={Liu, Chunxiao and Mao, Zhendong and Zhang, Tianzhu and Liu, An-An and Wang, Bin and Zhang, Yongdong},
  journal={IEEE Transactions on Multimedia},
  title={Focus Your Attention: A Focal Attention for Multimodal Learning},
  year={2022},
  volume={24},
  pages={103--115},
  doi={10.1109/TMM.2020.3046855}
}
```