Datasets and code to mine data from Visual Genome are in the `mine-data` directory. It also contains `eval_dataset.py` for comparing the VG dataset with CoDa and Wikipedia.
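
As a rough illustration of the mining step, the sketch below counts (object, attribute) pairs from Visual Genome's `attributes.json`. The field names follow the public VG release, but the exact filtering used in `mine-data` may differ.

```python
# Hypothetical sketch of mining (object, attribute) pairs from Visual
# Genome's attributes.json; field names follow the public VG release,
# but check mine-data for the exact logic used in this repo.
import json
from collections import Counter

def mine_object_attributes(path="attributes.json"):
    """Count how often each (object, attribute) pair is annotated in VG."""
    with open(path) as f:
        images = json.load(f)  # one entry per image
    pair_counts = Counter()
    for image in images:
        for obj in image.get("attributes", []):
            names = obj.get("names", [])
            attrs = obj.get("attributes", [])  # e.g. ["red", "wooden"]
            for name in names:
                for attr in attrs:
                    pair_counts[(name.lower(), attr.lower())] += 1
    return pair_counts

if __name__ == "__main__":
    counts = mine_object_attributes()
    for (obj, attr), n in counts.most_common(10):
        print(f"{obj}\t{attr}\t{n}")
```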
We train two models: Distilled and CaptionBERT. The model checkpoints can be accessed at this link.
The `probing` directory contains code for zero-shot evaluation ("best template" mode) in `eval_zero_shot.py` (see the sketch after the run example below) and for the logistic-regression classification case in `eval_classification.py`. `emb_for_size.py` contains code for the "adjective projection" method for size evaluation; a sketch of that idea follows this paragraph. `plot_snippet.py` contains the plotting scripts for the heatmap and the dot-to-dot linked plot for individual objects.
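
As an illustration of the adjective-projection idea, the sketch below builds a size axis from a few assumed adjectives and scores objects by projecting their embeddings onto it. The model choice, adjective lists, and pooling are assumptions for illustration, not necessarily `emb_for_size.py`'s exact setup.

```python
# Hedged sketch of an "adjective projection" for size: project object
# embeddings onto an axis spanned by size adjectives. Model, adjective
# lists, and pooling below are illustrative assumptions.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(word: str) -> np.ndarray:
    """Mean-pooled last-layer embedding of a word (no sentence context)."""
    enc = tok(word, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc).last_hidden_state  # (1, seq, hidden)
    return out[0, 1:-1].mean(dim=0).numpy()  # drop [CLS]/[SEP]

# A size axis pointing from "small" words to "large" words (assumed lists).
small = np.mean([embed(w) for w in ["tiny", "small", "little"]], axis=0)
large = np.mean([embed(w) for w in ["huge", "large", "giant"]], axis=0)
axis = (large - small) / np.linalg.norm(large - small)

for obj in ["ant", "dog", "elephant"]:
    score = float(embed(obj) @ axis)  # higher = predicted larger
    print(obj, round(score, 3))
```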
Run files in `probing` from the parent directory, e.g.

```
python probing/eval_zero_shot.py
```
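
For intuition about what such a script does, here is a minimal sketch in the spirit of the "best template" mode: score each candidate answer at a `[MASK]` slot under several templates, then report the template that does best. The templates, candidate list, and gold labels below are invented for illustration.

```python
# Sketch of zero-shot probing in the spirit of "best template" mode:
# score single-token candidates at the [MASK] slot under each template
# with a masked LM, and report per-template accuracy.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
mlm.eval()

TEMPLATES = ["the {} is [MASK] .", "most {} are [MASK] ."]
CANDIDATES = ["red", "green", "yellow", "white"]
GOLD = {"banana": "yellow", "grass": "green"}  # toy gold labels

def candidate_scores(obj: str, template: str) -> dict:
    """Log-probability of each candidate token at the [MASK] position."""
    enc = tok(template.format(obj), return_tensors="pt")
    mask_pos = (enc.input_ids[0] == tok.mask_token_id).nonzero().item()
    with torch.no_grad():
        log_probs = torch.log_softmax(mlm(**enc).logits[0, mask_pos], dim=-1)
    return {c: log_probs[tok.convert_tokens_to_ids(c)].item() for c in CANDIDATES}

# "Best template": evaluate every template and keep the most accurate one.
accuracy = {}
for template in TEMPLATES:
    hits = 0
    for obj, gold in GOLD.items():
        scores = candidate_scores(obj, template)
        if max(scores, key=scores.get) == gold:
            hits += 1
    accuracy[template] = hits / len(GOLD)
best = max(accuracy, key=accuracy.get)
print("best template:", best, "accuracy:", accuracy[best])
```

Keeping the single best-performing template matches the "best template" naming; the complementary "average template" case is handled by the soft-prompts evaluation described below.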
Code for soft prompt tuning is in the `soft-prompts` directory (source reference: https://github.com/hiaoxui/soft-prompts). Custom evaluation (the "average template" case in the paper) is in `soft-prompts/soft_prompts/run/model_eval.py`. Run evaluation with `config-vl-eval.yaml` and prompt training with `config-vl.yaml`. See the README there for more information.
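
For readers unfamiliar with the technique, here is a minimal, self-contained sketch of the general soft-prompt idea, assuming a BERT-style masked LM: a few continuous prompt vectors are learned while the language model stays frozen. This illustrates the method only; it is not the soft-prompts repo's implementation, which is configured through the YAML files above.

```python
# Illustrative sketch of the general soft-prompt technique (not the
# soft-prompts repo's actual code): learn a few continuous prompt
# vectors prepended to the input embeddings while the LM stays frozen.
import torch
import torch.nn as nn
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
for p in mlm.parameters():
    p.requires_grad = False  # freeze the LM; only the prompt is trained

n_prompt = 5
soft_prompt = nn.Parameter(torch.randn(n_prompt, mlm.config.hidden_size) * 0.02)
optimizer = torch.optim.Adam([soft_prompt], lr=1e-3)

def prompted_logits(text: str) -> torch.Tensor:
    """Run the frozen MLM with the soft prompt prepended to the input."""
    enc = tok(text, return_tensors="pt")
    token_embeds = mlm.get_input_embeddings()(enc.input_ids)       # (1, L, H)
    inputs = torch.cat([soft_prompt.unsqueeze(0), token_embeds], dim=1)
    attn = torch.ones(inputs.shape[:2], dtype=torch.long)
    return mlm(inputs_embeds=inputs, attention_mask=attn).logits   # (1, n+L, V)

# One illustrative training step: push the [MASK] prediction toward "yellow".
text = "bananas are [MASK] ."
mask_pos = n_prompt + tok(text).input_ids.index(tok.mask_token_id)
target = torch.tensor([tok.convert_tokens_to_ids("yellow")])
loss = nn.functional.cross_entropy(prompted_logits(text)[:, mask_pos], target)
loss.backward()
optimizer.step()
```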
If this code is useful in your research, please cite:
```bibtex
@article{zhang2022visual,
  title={Visual Commonsense in Pretrained Unimodal and Multimodal Models},
  author={Zhang, Chenyu and Van Durme, Benjamin and Li, Zhuowan and Stengel-Eskin, Elias},
  journal={arXiv preprint arXiv:2205.01850},
  year={2022}
}
```