Multimodal Predicate-Noun Dependencies

Code for the EMNLP 2022 paper "Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies?" by Mitja Nikolaus, Emmanuelle Salin, Stephane Ayache, Abdellah Fourtassi, and Benoit Favre.

Installation

  • git clone --recursive <repo>

Data

All images whose image ids occur in data/sentence-semantics/eval_set.json have to be downloaded to ~/data/multimodal_evaluation/images. Feel free to contact me if you would like a zip of the relevant images and pre-extracted image features!
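
As a starting point, here is a minimal sketch for collecting the required image ids and checking which files are already present locally. It assumes eval_set.json is a list of examples with an "img_id" field and that images are stored as <img_id>.jpg; both are assumptions that may need adjusting to the actual schema and file layout.

import json
from pathlib import Path

EVAL_SET = Path("data/sentence-semantics/eval_set.json")
IMAGES_DIR = Path("~/data/multimodal_evaluation/images").expanduser()

# Collect all image ids referenced by the eval set.
# ("img_id" is an assumed field name; adjust it to the actual JSON schema.)
with open(EVAL_SET) as f:
    eval_set = json.load(f)
img_ids = {str(example["img_id"]) for example in eval_set}

# Report which images are still missing (assuming files are named <img_id>.jpg).
missing = sorted(i for i in img_ids if not (IMAGES_DIR / f"{i}.jpg").exists())
print(f"{len(img_ids)} images referenced, {len(missing)} missing in {IMAGES_DIR}")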

Python Environments:

vl-eval-pytorch-1.8.1

For LXMERT, UNITER, ViLT

conda env create --file environment_vl-eval-pytorch-1.8.1.yml 
cd src/ViLT && pip install . && cd -

vl-eval-vilbert

conda env create --file environment_vl-eval-vilbert.yml
conda activate vl-eval-vilbert
cd src/vilbert-multi-task && python setup.py build develop && cd -

vl-eval-vinvl

conda env create --file environment_vl-eval-vinvl.yml
conda activate vl-eval-vinvl
cd src/Oscar && python setup.py build develop && cd -
python -m pip install git+https://github.com/facebookresearch/maskrcnn-benchmark.git

clip

conda env create --file environment_vl-eval-clip.yml
conda activate vl-eval-clip
pip install git+https://github.com/openai/CLIP.git

volta

For models trained in controlled conditions (VOLTA framework)

conda env create --file environment_vl-eval-volta.yml
conda activate vl-eval-volta
cd src/volta && python setup.py develop && cd -

Models

LXMERT

  • Environment: vl-eval-pytorch-1.8.1
  • Checkpoint: unc-nlp/lxmert-base-uncased
  • Image features: Bottom-up (36 boxes)
conda activate vl-eval-pytorch-1.8.1
python eval_sentence_semantics.py --model LXMERT --img-features-path ~/data/multimodal_evaluation/image_features_2048/img_features_2048.tsv
python eval_sentence_semantics.py --model LXMERT --cropped --img-features-path ~/data/multimodal_evaluation/image_features_2048/img_cropped_features_2048.tsv
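
The .tsv files passed via --img-features-path follow the usual bottom-up-attention feature format (one row per image with base64-encoded boxes and 2048-d region features). A rough sketch of reading such a file is below; the exact column names are an assumption based on that format and may differ from the files used here.

import base64
import csv
import sys

import numpy as np

csv.field_size_limit(sys.maxsize)

# Assumed bottom-up-attention TSV schema; adjust if the actual files differ.
FIELDS = ["img_id", "img_h", "img_w", "num_boxes", "boxes", "features"]

def read_tsv_features(path):
    features = {}
    with open(path) as f:
        for row in csv.DictReader(f, delimiter="\t", fieldnames=FIELDS):
            num_boxes = int(row["num_boxes"])
            feats = np.frombuffer(base64.b64decode(row["features"]), dtype=np.float32)
            features[row["img_id"]] = feats.reshape(num_boxes, 2048)
    return features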

UNITER

conda activate vl-eval-pytorch-1.8.1
python eval_sentence_semantics.py --model UNITER --img-features-path ~/data/multimodal_evaluation/image_features_2048/img_features_2048.tsv
python eval_sentence_semantics.py --model UNITER --cropped --img-features-path ~/data/multimodal_evaluation/image_features_2048/img_cropped_features_2048.tsv

ViLT

conda activate vl-eval-pytorch-1.8.1
python eval_sentence_semantics.py --model VILT --images-dir ~/data/multimodal_evaluation/images
python eval_sentence_semantics.py --model VILT --cropped --images-dir ~/data/multimodal_evaluation/images_cropped

Oscar

conda activate vl-eval-vinvl
python eval_sentence_semantics.py --model Oscar --img-features-path ~/data/multimodal_evaluation/image_features_2048/img_features_2048_10_100.tsv
python eval_sentence_semantics.py --model Oscar --cropped --img-features-path ~/data/multimodal_evaluation/image_features_2048/img_cropped_features_2048_10_100.tsv

VinVL

conda activate vl-eval-vinvl
python eval_sentence_semantics.py --model VINVL --img-features-path ~/data/multimodal_evaluation/image_features_vinvl/
python eval_sentence_semantics.py --model VINVL --cropped --img-features-path ~/data/multimodal_evaluation/image_features_vinvl_cropped/

ViLBERT

conda activate vl-eval-vilbert
python eval_sentence_semantics.py --model VILBERT --img-features-path ~/data/multimodal_evaluation/image_features_vilbert/img_features.p
python eval_sentence_semantics.py --model VILBERT --cropped --img-features-path ~/data/multimodal_evaluation/image_features_vilbert/img_cropped_features.p
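
The ViLBERT features come as a pickle file rather than a .tsv. A quick way to inspect what it contains (the assumed structure, a dict keyed by image id, may not match the actual layout):

import pickle

# The structure of the pickle is an assumption; print it to see the actual layout.
with open("img_features.p", "rb") as f:
    feats = pickle.load(f)
print(type(feats), len(feats))
if isinstance(feats, dict):
    first_key = next(iter(feats))
    print(first_key, type(feats[first_key]))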

CLIP

  • Environment: vl-eval-clip
  • Checkpoint: ViT-B/32
  • Image features: extracted within model
conda activate vl-eval-clip
python eval_sentence_semantics.py --model CLIP --images-dir ~/data/multimodal_evaluation/images
python eval_sentence_semantics.py --model CLIP --cropped --images-dir ~/data/multimodal_evaluation/images_cropped
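
For reference, a minimal sketch of how the ViT-B/32 checkpoint can score an image against two candidate sentences, which is the kind of image-sentence matching the evaluation relies on. The image file and sentences are placeholders, not items from the evaluation set.

import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder image and sentence pair (not from the actual eval set).
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
texts = clip.tokenize(["A person is sitting.", "A person is standing."]).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, texts)
    probs = logits_per_image.softmax(dim=-1)
print(probs)  # higher probability = sentence judged more compatible with the image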

VOLTA

  • Environment: vl-eval-volta
  • Image features: Bottom-up (36 boxes)

Example eval for VisualBERT:

conda activate vl-eval-volta
python eval_sentence_semantics_volta.py --from_pretrained ~/data/volta/VisualBERT --config_file src/volta/config/ctrl_visualbert_base.json --img-features-path ~/data/multimodal_evaluation/image_features_2048/img_features_2048.tsv
python eval_sentence_semantics_volta.py --from_pretrained ~/data/volta/VisualBERT --config_file src/volta/config/ctrl_visualbert_base.json --cropped --img-features-path ~/data/multimodal_evaluation/image_features_2048/img_cropped_features_2048.tsv

Analyses

Analysis scripts can be run after model results have been saved to runs/sentence-semantics.

Detailed results

Generate per-concept results plots (and more):

python plot_sentence_semantics_results.py --input-file runs/sentence-semantics/LXMERT/results.csv
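
If you want to inspect the per-model results directly, a small pandas sketch along these lines works; the column names "concept" and "correct" are assumptions about results.csv and may need to be adapted to the actual header.

import pandas as pd

# Column names are assumptions about results.csv; adapt them to the actual header.
results = pd.read_csv("runs/sentence-semantics/LXMERT/results.csv")
per_concept = results.groupby("concept")["correct"].mean().sort_values()
print(per_concept)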

Correlations

(First, download Train_GCC-training.tsv and save it to data/conceptual_captions/.)

Run correlations between common predictors and model performance:

python sentence_semantics_correlations.py --models LXMERT UNITER VILBERT VILT VINVL CLIP
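
As an illustration of the kind of predictor used there, the sketch below counts word frequencies in the Conceptual Captions captions and correlates them with per-concept accuracy for a single model. Train_GCC-training.tsv is tab-separated with the caption in the first column; the results.csv columns are assumptions as above, and multi-word concepts would need more careful matching.

import csv
from collections import Counter

import pandas as pd
from scipy.stats import spearmanr

# Captions are in the first column; disable quoting since captions may contain quotes.
captions = pd.read_csv(
    "data/conceptual_captions/Train_GCC-training.tsv",
    sep="\t", header=None, names=["caption", "url"], quoting=csv.QUOTE_NONE,
)["caption"]

# Count how often each word occurs across all captions.
word_counts = Counter(w for c in captions for w in str(c).lower().split())

# Per-concept accuracy; column names are assumptions about results.csv.
results = pd.read_csv("runs/sentence-semantics/LXMERT/results.csv")
accuracy = results.groupby("concept")["correct"].mean()

# Naive lookup: treats each concept as a single word.
freqs = [word_counts.get(concept, 0) for concept in accuracy.index]
print(spearmanr(freqs, accuracy.values))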