This repository contains the official code for the paper "Training Vision-Language Models with Less Bimodal Supervision".
To set up the environment:

```bash
conda create -n lbs python=3.9
conda activate lbs
pip install -r dev-requirements.txt -r requirements.txt
```
See here.
The general form of a training command is:

```bash
python src/train.py main_config.jsonnet --u additional_config.jsonnet
```

For example, to fine-tune a vision-language model on VQA:

```bash
python src/train.py configs/less-bimodal-sup/vl_finetuning.jsonnet --u configs/data/vqa.jsonnet
```

To run vision-language pretraining on Conceptual Captions, overriding individual config values directly from the command line:

```bash
python src/train.py configs/less-bimodal-sup/vl_pretraining.jsonnet --u configs/data/conceptual_captions.jsonnet datamodule.batch_size=48 trainer.accumulate_grad_batches=10 trainer.gpus=8
```
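An `additional_config.jsonnet` passed after `--u` is presumably just another jsonnet file whose fields override those of the main config. As a rough sketch (the field names below mirror the command-line overrides above; the structure of the real configs in `configs/` may differ):

```jsonnet
// Hypothetical override file. The keys follow the command-line overrides
// shown above (datamodule.*, trainer.*) and are illustrative only.
{
  datamodule: {
    batch_size: 48,
  },
  trainer: {
    accumulate_grad_batches: 10,
    gpus: 8,
  },
}
```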
To resume an existing experiment, pass either its last checkpoint or the experiment directory itself:

```bash
python src/train.py ../outputs/experiment/checkpoints/last.ckpt
```

or

```bash
python src/train.py ../outputs/experiment
```

To start a new run initialized from previously trained weights, use `load_weights`:

```bash
python src/train.py main_config.jsonnet --u additional_config.jsonnet load_weights=../outputs/exp/checkpoints/some_checkpoint.ckpt
```
To inspect the configuration that a given set of arguments produces, pass the same arguments as in training to the jsonnet tool:

```bash
python tools/jsonnet.py [args]
python tools/jsonnet.py some_file.jsonnet --simple
```
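For a sense of what gets evaluated, jsonnet configs are typically composed by merging objects. The file below is a toy example (not one of the configs in this repository); running `python tools/jsonnet.py toy_config.jsonnet --simple` on it should print the resulting configuration as JSON:

```jsonnet
// toy_config.jsonnet -- a made-up file, not part of this repository
local base = {
  trainer: { gpus: 1, max_epochs: 10 },
};

base {
  // merged over `base`: only trainer.gpus changes, max_epochs is kept
  trainer+: { gpus: 8 },
}
```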
- You can modify a list item by referring to its index in a command-line argument (e.g. `trainer.logger.0.notes=something`).
- You can modify a top-level jsonnet field before the jsonnet-to-JSON stage by prefixing a command-line argument with `o__` (see the sketch below).
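As a rough illustration of where these two kinds of overrides point (the field names here are made up, and the exact override semantics are defined by `src/train.py`):

```jsonnet
// Hypothetical config, used only to illustrate the override syntax above.
{
  // A top-level jsonnet field: an argument prefixed with "o__" (e.g. o__seed=0)
  // would modify a field like this one before the jsonnet-to-JSON stage.
  seed: 42,

  trainer: {
    logger: [
      // trainer.logger.0.notes=something addresses this item by its index (0).
      { notes: '' },
    ],
  },
}
```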
To cite this work:

```bibtex
@inproceedings{segal2022training,
  title={Training Vision-Language Models with Less Bimodal Supervision},
  author={Elad Segal and Ben Bogin and Jonathan Berant},
  booktitle={4th Conference on Automated Knowledge Base Construction},
  year={2022}
}
```