Code for UniS-MMC: Multimodal Classification via Unimodality-supervised Multimodal Contrastive Learning (ACL 2023)
Environment: Python 3.8, PyTorch 1.8.0, CUDA 11.1
conda create -n unis111 python=3.8
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install -r requirements.txt
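A quick sanity check (illustrative, not part of the repository) confirms that the installed versions match the ones above:

```python
# Environment check: verify PyTorch/CUDA versions after installation.
import torch

print(torch.__version__)           # expected: 1.8.0
print(torch.version.cuda)          # expected: 11.1
print(torch.cuda.is_available())   # expected: True on a GPU machine
```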
-- model: TextEncoder.py, ImageEncoder.py, model.py (see the illustrative sketch after this layout)
-- data: dataloader.py, create_data.py
-- results: logs, results, imgs, models
-- Pretrained: bert_base_uncased, bert_large_uncased, roberta_base, roberta_large, vit_base, vit_large
-- src: config.py, functions.py, metrics.py, train_food101.py
-- main.py
-- train.sh
-- test.sh
-- requirements.txt
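For orientation, the sketch below shows how a text backbone and an image backbone such as those under Pretrained could be paired for multimodal classification. It is only a minimal, hypothetical late-fusion example built on HuggingFace transformers, not the code in model.py; the class and argument names are placeholders, and UniS-MMC additionally trains unimodal predictions with the unimodality-supervised multimodal contrastive objective described in the paper.

```python
# Illustrative sketch only -- not the repository's model.py.
# Assumes HuggingFace `transformers` with BERT and ViT backbones.
import torch
import torch.nn as nn
from transformers import BertModel, ViTModel

class SimpleMultimodalClassifier(nn.Module):
    """Hypothetical late-fusion baseline for text+image classification."""
    def __init__(self, num_classes: int = 101):
        super().__init__()
        self.text_encoder = BertModel.from_pretrained("bert-base-uncased")
        self.image_encoder = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")
        hidden = self.text_encoder.config.hidden_size + self.image_encoder.config.hidden_size
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, input_ids, attention_mask, pixel_values):
        # [CLS]-style pooled features from each unimodal encoder.
        text_feat = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state[:, 0]
        image_feat = self.image_encoder(pixel_values=pixel_values).last_hidden_state[:, 0]
        # Late fusion by concatenation, followed by a linear classification head.
        return self.classifier(torch.cat([text_feat, image_feat], dim=-1))
```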
UPMC-Food-101 is a multimodal food classification dataset. We adopt the most commonly used split and remove image-text pairs whose image or text is missing. The final dataset split is available here.
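As an illustration of that filtering step, the sketch below drops pairs whose image file or caption text is missing. It is not the repository's create_data.py; the CSV name and column names (image_path, text, label) are assumptions:

```python
# Illustrative filtering of image-text pairs with a missing image or missing text.
# Not the repository's create_data.py; file name and column names are hypothetical.
import os
import pandas as pd

df = pd.read_csv("food101_pairs.csv")  # assumed columns: image_path, text, label
text_ok = df["text"].fillna("").str.strip() != ""
image_ok = df["image_path"].apply(lambda p: isinstance(p, str) and os.path.isfile(p))
clean = df[text_ok & image_ok].reset_index(drop=True)
clean.to_csv("food101_pairs_clean.csv", index=False)
print(f"kept {len(clean)} of {len(df)} pairs")
```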
N24News is a multimodal news classification dataset. We adopt the original split method.
Example commands for training and testing are included in train.sh and test.sh. For direct inference on Food-101, we provide a pretrained checkpoint here.
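The released checkpoint can be loaded with standard PyTorch for inspection or inference; the sketch below is illustrative, and the file name is a placeholder for the downloaded checkpoint:

```python
# Load the released checkpoint (file name is a placeholder for the downloaded file).
import torch

state = torch.load("unis_mmc_food101.pt", map_location="cpu")
# Depending on how the checkpoint was saved, `state` is either a state_dict
# or a dict containing one; list the top-level keys to see its layout.
print(list(state.keys())[:10])
```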
If you use this code as part of any published research, we'd really appreciate it if you could cite the following paper:
@inproceedings{Zou2023UniSMMCMC,
  title={UniS-MMC: Multimodal Classification via Unimodality-supervised Multimodal Contrastive Learning},
  author={Heqing Zou and Meng Shen and Chen Chen and Yuchen Hu and Deepu Rajan and Eng Siong Chng},
  booktitle={Findings of the Association for Computational Linguistics: ACL 2023},
  year={2023}
}
MIT License