LXMERT-S: Multi-stage Pre-training over Simplified Multimodal Pre-training Models

Introduction

Code for the ACL/IJCNLP 2021 paper "LXMERT-S: Multi-stage Pre-training over Simplified Multimodal Pre-training Models".

Fine-tuning

VQA

  1. Download the pre-trained model parameters. You can download the pre-trained model from "here" and put mode_LXRT.bin under the data/ folder.

  2. Download the data. You can download the dataset required for VQA fine-tuning from "LXMERT".

  3. Fine-tune on VQA v2.0. You can also download our fine-tuned model parameters from "here".

     bash run/vqa_finetune.bash 0 vqa_lxr050
  4. Submit to the VQA test server. You can also download our test results from "here" (an end-to-end sketch of these steps follows the results below).

    bash run/vqa_test.bash 0 vqa_lxr050_results --test test --load snap/vqa/vqa_lxr050/BEST
  5. Results:

    Results   Local Validation   Test-Dev   Test-Standard
    VQA       68.49%             71.1%      71.18%
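
For convenience, steps 1, 3 and 4 can be chained into a single script. The sketch below is illustrative only: it assumes mode_LXRT.bin has already been downloaded into the working directory and that GPU 0 is free.

    # Illustrative VQA workflow (assumes mode_LXRT.bin is downloaded and GPU 0 is free).
    mkdir -p data
    mv mode_LXRT.bin data/                      # step 1: place the pre-trained checkpoint
    bash run/vqa_finetune.bash 0 vqa_lxr050     # step 3: fine-tune on VQA v2.0
    # step 4: generate predictions on the test split for the VQA server
    bash run/vqa_test.bash 0 vqa_lxr050_results --test test --load snap/vqa/vqa_lxr050/BEST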

GQA

  1. Download the pre-trained model parameters. You can download the pre-trained model from here and put mode_LXRT.bin under the data/ folder.
  2. Download the data. You can download the dataset required for GQA fine-tuning from LXMERT.
  3. Fine-tune on GQA. You can also download our fine-tuned model parameters from here.
     bash run/gqa_finetune.bash 0 gqa_lxr050
  4. Submit to the GQA test server. You can also download our test results from here (a compact sketch of the fine-tune and test commands follows the results below).
    bash run/gqa_test.bash 0 gqa_lxr050_results --test test --load snap/gqa/gqa_lxr050/BEST
  5. Results:
    Results   Test-Dev   Test-Standard
    GQA       58.7%      59.12%
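
As with VQA, the GQA fine-tune and test commands can be run back to back. This is a minimal sketch, assuming data/mode_LXRT.bin is already in place, GPU 0 is free, and the test run points at the snapshot produced by the fine-tuning run.

    # Illustrative GQA workflow (assumes data/mode_LXRT.bin is in place and GPU 0 is free).
    bash run/gqa_finetune.bash 0 gqa_lxr050
    bash run/gqa_test.bash 0 gqa_lxr050_results --test test --load snap/gqa/gqa_lxr050/BEST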

NLVR2

IR (Image Retrieval)

  1. Download the pre-trained model parameters. You can download the pre-trained model from here and put mode_LXRT.bin under the data/ folder.
  2. Zero-shot evaluation.
    bash run/image_retrieval_zero_shot.bash 0 ir_zs
  3. Fine-tune on IR. You can run the following command or download our fine-tuned model parameters from here (a sketch comparing the zero-shot and fine-tuned runs follows the results below).
     bash run/image_retrieval_finetune.bash 0 ir_lxr050
  4. Results:
    Results           R@1      R@5     R@10
    IR (zero-shot)    42.42%   68.7%   77.92%
    IR (fine-tuned)   57.9%    83%     88.7%
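
The two retrieval settings can be compared back to back. A minimal sketch, assuming data/mode_LXRT.bin is in place and GPU 0 is free; ir_zs and ir_lxr050 are simply the experiment names used in the steps above.

    # Image retrieval: zero-shot evaluation straight from the pre-trained model,
    # then fine-tuning to reproduce the fine-tuned row of the table above.
    bash run/image_retrieval_zero_shot.bash 0 ir_zs
    bash run/image_retrieval_finetune.bash 0 ir_lxr050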

TR (Text Retrieval)

  1. Download the pre-trained model parameters. You can download the pre-trained model from here and put mode_LXRT.bin under the data/ folder.
  2. Zero-shot evaluation.
    bash run/sent_retrieval_zero_shot.bash 0 tr_zs
  3. Fine-tune on TR. You can run the following command or download our fine-tuned model parameters from here (a sketch mirroring the IR commands follows the results below).
     bash run/sent_retrieval_finetune.bash 0 ir_lxr050
  4. Results:
    Results           R@1     R@5     R@10
    TR (zero-shot)    49%     75%     81.8%
    TR (fine-tuned)   64.6%   87.5%   90.4%
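
Text retrieval mirrors the image-retrieval commands. A minimal sketch under the same assumptions (checkpoint under data/, GPU 0 free), keeping the experiment names used in the steps above.

    # Text (sentence) retrieval: zero-shot evaluation, then fine-tuning.
    bash run/sent_retrieval_zero_shot.bash 0 tr_zs
    bash run/sent_retrieval_finetune.bash 0 ir_lxr050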

Reference

If you find this project helpful, please cite our paper :)

@inproceedings{liu2021acl,
  author    = {Tongtong Liu and
               Fangxiang Feng and
               Xiaojie Wang},
  title     = {Multi-stage Pre-training over Simplified Multimodal Pre-training Models},
  booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational
               Linguistics and the 11th International Joint Conference on Natural
               Language Processing, {ACL/IJCNLP} 2021, (Volume 1: Long Papers), Virtual
               Event, August 1-6, 2021},
  year      = {2021},
}

Acknowledgement

We thank airsplay for providing the LXMERT code and pre-trained models. We thank jiasenlu for providing the ViLBERT code.