This is the implementation of the paper "Adaptive Latent Graph Representation Learning for Image-Text Matching" (ALGR, TIP 2022). It is built on top of CAMERA and SCAN.
- Python 3.6
- PyTorch 1.8.1
- NumPy 1.19.1
- torchvision 0.9.1
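The versions listed above can be checked before running anything. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and it assumes each package exposes a `__version__` attribute and that an exact match (ignoring local build tags such as `+cu111`) is what you want.

```python
import importlib

# Versions taken from the requirements list above.
REQUIRED = {"torch": "1.8.1", "numpy": "1.19.1", "torchvision": "0.9.1"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def find_mismatches(required, installed):
    """Return {name: (wanted, found)} for packages whose version differs."""
    mismatches = {}
    for name, wanted in required.items():
        found = installed.get(name)
        # Drop local build tags such as "+cu111" before comparing.
        base = found.split("+")[0] if found else None
        if base != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    installed = {name: installed_version(name) for name in REQUIRED}
    for name, (wanted, found) in find_mismatches(REQUIRED, installed).items():
        print("{}: expected {}, found {}".format(name, wanted, found))
```

If the script prints nothing, the installed packages match the listed versions. Other versions may also work; this check only reports differences from the list.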
We use the data provided by CAMERA. The image features can be downloaded here. The positions of the detected boxes can be downloaded here.
We use the BERT code from BERT-pytorch. Please follow the instructions here to convert the Google BERT model to a PyTorch save file $BERT_PATH.
For MSCOCO:
Run script_coco.sh
For Flickr30K:
Run script_f30k.sh
python evaluate_models.py
@article{tian2022adaptive,
title={Adaptive Latent Graph Representation Learning for Image-Text Matching},
author={Tian, Mengxiao and Wu, Xinxiao and Jia, Yunde},
journal={IEEE Transactions on Image Processing},
year={2022},
publisher={IEEE}
}