
Code for the paper "Enhancing Sentence Representation with Visually-supervised Multimodal Pre-training", accepted at ACM MM'23.

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

ViP

Implementation for "Enhancing Sentence Representation with Visually-supervised Multimodal Pre-training"

Requirements

Get Required Data

Data Preprocessing

# For Flickr30K
cd datasets
python split_flickr_data.py


# ViP Pretraining
python vip_pretraining.py --cfg cfg/pretrain-flickr-resnet.yml

For SNLI

python unsupervised_nli.py --cfg cfg/unsupervised/snli.yml
python snli_unsupervised.py --data_folder ViP/unsupervised/flickr-resnet/snli

For RTE

python unsupervised_nli.py --cfg cfg/unsupervised/rte.yml
python snli_unsupervised.py --data_folder ViP/unsupervised/flickr-resnet/rte

For QNLI

python unsupervised_nli.py --cfg cfg/unsupervised/qnli.yml
python snli_unsupervised.py --data_folder ViP/unsupervised/flickr-resnet/qnli

For MNLI

python unsupervised_nli.py --cfg cfg/unsupervised/mnli.yml
python snli_unsupervised.py --data_folder ViP/unsupervised/flickr-resnet/mnli

For MNLI-mm

python unsupervised_nli.py --cfg cfg/unsupervised/mnli-mm.yml
python snli_unsupervised.py --data_folder ViP/unsupervised/flickr-resnet/mnli-mm

For MRPC

python unsupervised_nli.py --cfg cfg/unsupervised/mrpc.yml
python snli_unsupervised.py --data_folder ViP/unsupervised/flickr-resnet/mrpc

For QQP

python unsupervised_nli.py --cfg cfg/unsupervised/qqp.yml
python snli_unsupervised.py --data_folder ViP/unsupervised/flickr-resnet/qqp
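
The per-task commands above all follow the same two-step pattern, so they can be driven from one small script. The following is a minimal sketch, not part of the repository: `TASKS`, `build_commands`, and `run_all` are hypothetical helper names, and it assumes the script is started from the repository root and that the config and output paths match the ones listed above.

```python
import subprocess
import sys

# Task names exactly as they appear in the commands above.
TASKS = ("snli", "rte", "qnli", "mnli", "mnli-mm", "mrpc", "qqp")


def build_commands(task, output_root="ViP/unsupervised/flickr-resnet"):
    """Return the two commands for one task as argument lists.

    Assumption: every task uses cfg/unsupervised/<task>.yml and writes to
    <output_root>/<task>, as in the commands listed in this README.
    """
    return [
        [sys.executable, "unsupervised_nli.py",
         "--cfg", f"cfg/unsupervised/{task}.yml"],
        [sys.executable, "snli_unsupervised.py",
         "--data_folder", f"{output_root}/{task}"],
    ]


def run_all(tasks=TASKS, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for task in tasks:
        for cmd in build_commands(task):
            print(" ".join(cmd))
            if not dry_run:
                # check=True stops the loop at the first failing command.
                subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical convention: pass --run to actually execute the commands.
    run_all(dry_run="--run" not in sys.argv)
```

By default the script only prints the commands, which makes it easy to check the paths before launching anything. Passing `--run` (a convention of this sketch, not a flag of the repository scripts) executes them in order.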
