This repository contains the implementations of "Enhancing Image-Text Matching with Adaptive Feature Aggregation", accepted at ICASSP 2024.
- Put the data files in the `data` folder before running the following commands. For more details on the data files, please refer to SCAN, vsepp, and vse_infty.
- To train the models, use `./train_{DATASET}.sh`
- To evaluate the models, use `./inference_{DATASET}.sh`
- Notes:
  - `{DATASET}` can be `f30k` or `coco`.
  - Checkpoints are saved in the `models` folder.
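
As a minimal sketch of how the `{DATASET}` placeholder resolves to a concrete script name, the helper below maps a stage (`train` or `inference`) and a dataset (`f30k` or `coco`) to the script path described above. The function `script_for` is hypothetical and is not part of this repository; it only builds and validates the name and does not run anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with this repository): prints the script
# path for a given stage and dataset, following the ./{stage}_{DATASET}.sh
# naming described in this README.
script_for() {
  local stage="$1" dataset="$2"

  # Only the two stages mentioned in the README are accepted.
  case "$stage" in
    train|inference) ;;
    *) echo "unknown stage: $stage" >&2; return 1 ;;
  esac

  # {DATASET} can be f30k or coco.
  case "$dataset" in
    f30k|coco) ;;
    *) echo "unknown dataset: $dataset" >&2; return 1 ;;
  esac

  printf './%s_%s.sh\n' "$stage" "$dataset"
}
```

With this helper, `script_for train f30k` prints `./train_f30k.sh` and `script_for inference coco` prints `./inference_coco.sh`; the printed path can then be run from the repository root once the data files are in place.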
Our implementations are based on SCAN, vsepp, vse_infty, and other repositories. We give credit to all these researchers and sincerely appreciate their contributions.
If you find the paper and the code useful, please cite our paper as follows:
@INPROCEEDINGS{10446913,
author={Wang, Zuhui and Yin, Yunting and Ramakrishnan, I.V.},
booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={Enhancing Image-Text Matching with Adaptive Feature Aggregation},
year={2024},
volume={},
number={},
pages={8245-8249},
keywords={triplet ranking loss;feature enhancement;cross-modal retrieval;image-text matching},
doi={10.1109/ICASSP48485.2024.10446913}
}