
Multimodal-Context-Reasoning

Author: Yunxin Li (Google Scholar)

A multimodal context reasoning approach that introduces multi-view semantic alignment information via prefix tuning.

Our paper "A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues" (ModCR) has been accepted to the ACL 2023 Main Conference.
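For readers new to prefix tuning, the idea is to prepend a small set of learnable prefix vectors to a (frozen) language model's input sequence, so that extra information, here multi-view semantic alignment, can be injected without updating the backbone. The sketch below is a minimal, generic illustration of that mechanism, not the ModCR implementation; the class name, prefix length, and dimensions are placeholders.

import torch
import torch.nn as nn

class PrefixTuningWrapper(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, prefix_len: int, hidden_dim: int):
        super().__init__()
        # Learnable prefix embeddings; during tuning, only these are updated.
        self.prefix = nn.Parameter(torch.randn(prefix_len, hidden_dim) * 0.02)

    def forward(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # token_embeds: (batch, seq_len, hidden_dim)
        batch = token_embeds.size(0)
        prefix = self.prefix.unsqueeze(0).expand(batch, -1, -1)
        # Prepend the prefix so the frozen backbone attends to it as extra context.
        return torch.cat([prefix, token_embeds], dim=1)

embeds = torch.randn(2, 16, 768)                       # dummy token embeddings
wrapper = PrefixTuningWrapper(prefix_len=8, hidden_dim=768)
print(wrapper(embeds).shape)                           # torch.Size([2, 24, 768])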

Requirements

Python >= 3.8.10, torch >= 1.10.0
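Assuming a pip-based environment, the torch pin can be installed with, for example:

pip install "torch>=1.10.0"

(Pick the torch build matching your CUDA version if you plan to train on GPU.)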

Some of the .zip files contain modified code; unzip them in place in the current path.

The Oscar-base checkpoint and the pretrained multi-view aligner used in ModCR can be downloaded from the Hugging Face Hub.

The preprocessed PMR and VCR data are also available in the ModCR_checkpoints repository on the Hugging Face Hub.

Place the checkpoints and data in the same directory as run_PMR_ModCR.py or run_vcr_ModCR.py, or in a path of your choosing.
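As a convenience, the downloads above can be scripted with the huggingface_hub library; a minimal sketch follows. The repo ID is a placeholder, not a verified Hub path; substitute the actual ModCR_checkpoints repository.

from huggingface_hub import snapshot_download

# Placeholder repo ID -- replace <user> with the actual Hub namespace.
snapshot_download(repo_id="<user>/ModCR_checkpoints", local_dir="./ModCR_checkpoints")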

Training

For the PMR task, run:

python run_PMR_ModCR.py

For the VCR task, run:

python run_vcr_ModCR.py

Acknowledgements

Thanks to all contributors for their support!

If you use ModCR in your research or applications, please cite our work:

@article{li2023multi,
  title={A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues},
  author={Li, Yunxin and Hu, Baotian and Chen, Xinyu and Ding, Yuxin and Ma, Lin and Zhang, Min},
  journal={arXiv preprint arXiv:2305.04530},
  year={2023}
}