2022-NeurIPS-DAA

Code for the paper "A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval", accepted at NeurIPS 2022.


Introduction

PyTorch implementation of the NeurIPS 2022 paper "A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval". It is built on top of SGRAF in PyTorch.


Full experimental results

| Dataset | Model (+DAA) | i2t R@1 | i2t R@5 | i2t R@10 | i2t PMRP | i2t ASP | t2i R@1 | t2i R@5 | t2i R@10 | t2i PMRP | t2i ASP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Flickr30K | SAF | 73.9 | 93.0 | 96.2 | - | 65.0 | 56.9 | 81.9 | 87.9 | - | 58.4 |
| Flickr30K | SGR | 73.8 | 92.9 | 96.3 | - | 65.5 | 56.6 | 80.7 | 84.9 | - | 59.0 |
| Flickr30K | SGRAF | 78.0 | 94.2 | 97.6 | - | 65.8 | 59.9 | 83.4 | 89.2 | - | 59.2 |
| MSCOCO 1K | SAF | 78.0 | 95.6 | 98.4 | 47.1 | 67.2 | 62.8 | 89.8 | 95.2 | 48.7 | 61.6 |
| MSCOCO 1K | SGR | 78.0 | 95.8 | 98.6 | 46.4 | 68.5 | 62.6 | 88.8 | 93.7 | 48.6 | 62.8 |
| MSCOCO 1K | SGRAF | 80.2 | 96.4 | 98.8 | 48.1 | 68.3 | 65.0 | 90.7 | 95.8 | 49.6 | 62.7 |
| MSCOCO 5K | SAF | 56.2 | 83.5 | 90.9 | 35.7 | 67.0 | 40.5 | 70.1 | 80.7 | 36.6 | 61.4 |
| MSCOCO 5K | SGR | 56.5 | 84.1 | 91.1 | 35.3 | 68.4 | 40.8 | 70.2 | 80.4 | 36.9 | 62.6 |
| MSCOCO 5K | SGRAF | 60.0 | 86.4 | 92.4 | 36.6 | 68.2 | 43.5 | 72.3 | 82.5 | 37.5 | 62.5 |

(i2t: image-to-text retrieval; t2i: text-to-image retrieval. PMRP is only reported on MSCOCO.)

Requirements and Installation

We recommend the following dependencies.

Pretrained model

If you don't want to train from scratch, you can download the pretrained models from here (SGR for MS-COCO), here (SAF for MS-COCO), here (SGR for Flickr30K), and here (SAF for Flickr30K).

Prepare data

We follow SCAN to obtain the image features and vocabularies, which can be downloaded with:

wget https://iudata.blob.core.windows.net/scan/data.zip
wget https://iudata.blob.core.windows.net/scan/vocab.zip

Another download link is available below:

https://drive.google.com/drive/u/0/folders/1os1Kr7HeTbh8FajBNegW8rjJf6GIhFqC

To speed up dataset loading, we convert these features from numpy arrays to HDF5 files. Modify data_path in np2h.py and then run:

python np2h.py
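
For reference, here is a minimal sketch of what such a conversion looks like; the file names (train_ims.npy, train_ims.h5) and the dataset key are assumptions based on the SCAN feature layout, not the actual contents of np2h.py:

```python
# Minimal sketch of a numpy -> HDF5 conversion (file names/keys assumed).
import os
import numpy as np
import h5py

data_path = "/path/to/data/coco_precomp"  # same path as in opts.py

for split in ("train", "dev", "test"):
    # SCAN-style precomputed region features, e.g. (n_images, 36, 2048)
    feats = np.load(os.path.join(data_path, f"{split}_ims.npy"))
    with h5py.File(os.path.join(data_path, f"{split}_ims.h5"), "w") as h5:
        h5.create_dataset("images", data=feats)  # key name is an assumption
```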

Training

Modify data_path, vocab_path, model_name, and logger_name in opts.py, then run train.py:

For MSCOCO:

(For SGR) python train.py --data_name coco_precomp --num_epochs 30 --learning_rate 0.00015 --lr_update 20 --world_size 4 --module_name SGR --daa_weight 25
(For SAF) python train.py --data_name coco_precomp --num_epochs 30 --learning_rate 0.00015 --lr_update 20 --world_size 4 --module_name SAF --daa_weight 25

For Flickr30K:

(For SGR) python train.py --data_name f30k_precomp --num_epochs 40 --learning_rate 0.0006 --lr_update 30 --world_size 1 --module_name SGR --daa_weight 10
(For SAF) python train.py --data_name f30k_precomp --num_epochs 40 --learning_rate 0.0006 --lr_update 20 --world_size 1 --module_name SAF --daa_weight 10

Evaluation

Test on MSCOCO

To run cross-validation on MSCOCO, pass fold5=True and use a model trained with --data_name coco_precomp.

python evaluation.py
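
The evaluation entry point is not shown here; assuming evaluation.py follows the SGRAF/SCAN convention of an evalrank(model_path, data_path, split, fold5) helper, a MSCOCO run would look roughly like:

```python
from evaluation import evalrank

# 5-fold 1K cross-validation on MSCOCO; the checkpoint and data paths
# below are placeholders for your own locations.
evalrank("runs/coco_SGR/model_best.pth.tar",
         data_path="/path/to/data", split="testall", fold5=True)
```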

Test on Flickr30K

To test on Flickr30K, pass fold5=False and use a model trained with --data_name f30k_precomp.

python evaluation.py
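
Under the same assumed evalrank interface, the Flickr30K run uses the single test split:

```python
from evaluation import evalrank

# Single-split Flickr30K test; checkpoint/data paths are placeholders.
evalrank("runs/f30k_SGR/model_best.pth.tar",
         data_path="/path/to/data", split="test", fold5=False)
```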

How to compute PMRP score

PMRP (Plausible-Match R-Precision) is a metric that evaluates the diversity of a model's retrievals. More details can be found in PCME.
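
As a rough illustration (not this repo's implementation): the building block of PMRP is R-Precision computed against "plausible matches", i.e. gallery items whose COCO class sets sufficiently overlap the query's:

```python
import numpy as np

def r_precision(sim_row: np.ndarray, plausible: set) -> float:
    """R-Precision for one query: the fraction of the top-R retrieved
    items that are plausible matches, with R = number of plausible matches."""
    r = len(plausible)
    top_r = np.argsort(-sim_row)[:r]  # indices of the R most similar items
    return len(plausible.intersection(top_r.tolist())) / r

# PMRP then averages r_precision over all queries (and, in PCME,
# over several class-overlap thresholds).
```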

Prepare annotations

Before evaluation, you should download captions_val2014.json and instances_val2014.json from here, or you can find them here.

Then put them in the path pmrp_com/coco_ann.

Compute PMRP score

To compute the PMRP score on MSCOCO 1K, run:

python pmrp_evaluation.py --path1 ${SIM_MATRIX} --n_fold 5

To compute the PMRP score on MSCOCO 5K, run:

python pmrp_evaluation.py --path1 ${SIM_MATRIX} --n_fold 0

${SIM_MATRIX} is the path to a similarity matrix in .npy format, of shape (5000, 25000), produced by the model. To compute the PMRP score of SGRAF (the ensemble of SGR and SAF), additionally pass --path2 ${SIM_MATRIX} pointing to the second model's predictions.
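
For example, a file saved as in the following sketch can be passed as ${SIM_MATRIX} (the random array is a placeholder; in practice the scores come from your trained model):

```python
import numpy as np

# Placeholder image-to-text scores for the MSCOCO test set:
# 5000 images x 25000 captions. Replace with your model's output.
sims = np.random.rand(5000, 25000).astype(np.float32)
np.save("sgr_coco_sims.npy", sims)  # pass this path as --path1
```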

Reference

If you find this code useful, please cite the following paper:

@inproceedings{li2022differentiable,
  author    = {Hao Li and
               Jingkuan Song and
               Lianli Gao and
               Pengpeng Zeng and
               Haonan Zhang and
               Gongfu Li},
  title     = {A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval},
  booktitle = {NeurIPS},
  year      = {2022}
}