Transformer Encoder Reasoning and Alignment Network (TERAN)

Code for the cross-modal visual-linguistic retrieval method from "Fine-grained Visual Textual Alignment for Cross-modal Retrieval using Transformer Encoders", submitted to ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) [Pre-print PDF].

This work is an extension to our previous approach TERN accepted at ICPR 2020.

This repo is built on top of VSE++ and TERN.

Fine-grained Alignment for Precise Matching

Retrieval

Setup

Clone the repo and move into it:

git clone https://github.com/mesnico/TERAN
cd TERAN

Setup python environment using conda:

conda env create --file environment.yml
conda activate teran
export PYTHONPATH=.

Get the data

Download and extract the data folder, containing annotations, the splits by Karpathy et al. and ROUGEL - SPICE precomputed relevances for both COCO and Flickr30K datasets:

wget http://datino.isti.cnr.it/teran/data.tar
tar -xvf data.tar

Download the bottom-up features for both COCO and Flickr30K. We use the code by Anderson et al. for extracting them. The following command extracts them under data/coco/ and data/f30k/. If you prefer another location, be sure to adjust the configuration file accordingly.

# for MS-COCO
wget http://datino.isti.cnr.it/teran/features_36_coco.tar
tar -xvf features_36_coco.tar -C data/coco

# for Flickr30k
wget http://datino.isti.cnr.it/teran/features_36_f30k.tar
tar -xvf features_36_f30k.tar -C data/f30k

Evaluate

Download and extract our pre-trained TERAN models:

wget http://datino.isti.cnr.it/teran/pretrained_models.tar
tar -xvf pretrained_models.tar

Then, issue the following commands for evaluating a given model on the 1k (5fold cross-validation) or 5k test sets.

python3 test.py pretrained_models/[model].pth --size 1k
python3 test.py pretrained_models/[model].pth --size 5k

Please note that if you changed some default paths (e.g. features are in another folder than data/coco/features_36), you will need to use the --config option and provide the corresponding yaml configuration file containing the right paths.

Train

In order to train the model using a given TERAN configuration, issue the following command:

python3 train.py --config configs/[config].yaml --logger_name runs/teran

runs/teran is where the output files (tensorboard logs, checkpoints) will be stored during this training session.

Visualization

WIP

Reference

If you found this code useful, please cite the following paper:

@article{messina2020finegrained,
  title={Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders},
  author={Nicola Messina and Giuseppe Amato and Andrea Esuli and Fabrizio Falchi and Claudio Gennaro and Stéphane Marchand-Maillet},
  journal={arXiv preprint arXiv:2008.05231},
  year={2020},
}

License

Apache License 2.0

Tom-Cat-God/TERAN