This repository contains the codebase for Airbert and some pre-trained models. It is built on top of the VLN-BERT codebase.
You need a recent version of Python (newer than 3.6). Install the dependencies with:
pip install -r requirements.txt
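Optionally, a virtual environment keeps the dependencies isolated; this is a sketch using only standard Python tooling, not something the repo requires:

```bash
# Optional: install inside a virtual environment (standard Python tooling)
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```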
First, download the BnB dataset and prepare an LMDB file containing the visual features, along with the BnB dataset files. The full procedure is described in our BnB dataset repository.
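Once the LMDB file is built, you can peek at a few records to check that it reads back correctly. This is only a sketch: the data/img_features path is an assumption, so substitute the location your BnB preparation actually produced.

```bash
# Sketch: list the first few records of the visual-features LMDB
# (path is hypothetical; keys and values are raw bytes)
python - <<'EOF'
import lmdb

env = lmdb.open("data/img_features", readonly=True, lock=False)
with env.begin() as txn:
    for i, (key, value) in enumerate(txn.cursor()):
        print(key, len(value), "bytes")
        if i >= 4:  # peek at the first five entries only
            break
env.close()
EOF
```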
Download a checkpoint of ViLBERT pre-trained on Conceptual Captions.
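A quick, optional way to confirm the download is intact (a sketch; vilbert.bin is the filename used by the training commands below):

```bash
# Sketch: check that the ViLBERT checkpoint deserializes with PyTorch
python - <<'EOF'
import torch

checkpoint = torch.load("vilbert.bin", map_location="cpu")
print(f"checkpoint loaded: {len(checkpoint)} top-level entries")
EOF
```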
Fine-tune the checkpoint on the BnB dataset using one of the following path-instruction methods.
To speed up training, a SLURM script for running on 64 GPUs is provided. You can pass extra arguments depending on the path-instruction method.
For example:
export name=pretraining-with-captionless-insertion
echo $name
sbatch --job-name $name \
--export=name=$name,pretrained=vilbert.bin,args=" --masked_vision --masked_language --min_captioned 2 --separators",prefix=2capt+ \
train-bnb-8.slurm
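The exported name, pretrained, args and prefix variables are how a method's settings reach the SLURM script. As a further illustration, the image-merging method described below would plausibly be launched as follows (the args string mirrors the flags of its train_bnb.py command; the exact mapping is an assumption):

```bash
export name=pretraining-with-image-merging
sbatch --job-name $name \
--export=name=$name,pretrained=vilbert.bin,args=" --masked_vision --masked_language --min_captioned 7 --separators",prefix=merge+ \
train-bnb-8.slurm
```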
For the concatenation method, make sure you have the following dataset files:
- data/bnb/bnb_train.json
- data/bnb/bnb_test.json
- data/bnb/testset.json
Then, launch training:
python train_bnb.py \
--from_pretrained vilbert.bin \
--save_name concatenation \
--separators \
--min_captioned 7 \
--masked_vision \
--masked_language
For the image-merging method, make sure you have the following dataset files:
- data/bnb/merge+bnb_train.json
- data/bnb/merge+bnb_test.json
- data/bnb/merge+testset.json
Then, launch training:
python train_bnb.py \
--from_pretrained vilbert.bin \
--save_name image_merging \
--prefix merge+ \
--min_captioned 7 \
--separators \
--masked_vision \
--masked_language
For the captionless-insertion method, make sure you have the following dataset files:
- data/bnb/2capt+bnb_train.json
- data/bnb/2capt+bnb_test.json
- data/bnb/2capt+testset.json
Then, launch training:
python train_bnb.py \
--from_pretrained vilbert.bin \
--save_name captionless_insertion \
--prefix 2capt+ \
--min_captioned 2 \
--separators \
--masked_vision \
--masked_language
For the instruction-rephrasing method, make sure you have the following dataset files:
- data/bnb/np+bnb_train.json
- data/bnb/np+bnb_test.json
- data/bnb/np+testset.json
- data/np_train.json
Then, launch training:
python train_bnb.py \
--from_pretrained vilbert.bin \
--save_name instruction_rephrasing \
--prefix np+ \
--min_captioned 7 \
--separators \
--masked_vision \
--masked_language \
--skeleton data/np_train.json
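Before launching, you can cheaply confirm the skeleton file is valid JSON (a sketch using only the standard library; how the file is produced is described in the BnB dataset repository):

```bash
# Sketch: validate data/np_train.json without printing its contents
python -m json.tool data/np_train.json > /dev/null && echo "np_train.json is valid JSON"
```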
First of all, download the R2R data:
make r2r
Train first with the masked language and masked vision objectives:

python train.py \
--from_pretrained bnb-pretrained.bin \
--save_name r2rM \
--masked_language --masked_vision --no_ranking
Then fine-tune the resulting checkpoint with the ranking objective and shuffled visual features:

python train.py \
--from_pretrained r2rM.bin \
--save_name r2rRS \
--shuffle_visual_features
Download the augmented paths from EnvDrop:
make speaker
Then run the train.py script:
python train.py \
--from_pretrained r2rM.bin \
--save_name r2rRS \
--shuffle_visual_features \
--prefix aug+ \
--beam_prefix aug_
You can download a pretrained model from our model zoo.
pushd ../model-zoos # https://github.com/airbert-vln/model-zoos
make airbert-r2rRSA
popd
# Install dependencies if not already done
poetry install
# Download data if not already done
make r2r
make lmdb
poetry run python test.py \
--from_pretrained ../model-zoos/airbert-r2rRSA.bin \
--save_name testing \
--split val_unseen
Please see the dedicated repository for fine-tuning Airbert in the generative setting.
The datasets are provided in data/task/.
See the BibTeX file.