Code of the CVPR 2021 Oral paper:
A Recurrent Vision-and-Language BERT for Navigation
Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez-Opazo, Stephen Gould
"Neo : Are you saying I have to choose whether Trinity lives or dies? The Oracle : No, you've already made the choice. Now you have to understand it." --- The Matrix Reloaded (2003).
- Enable DDP during training and inference.
- Support the latest version of Matterport3DSimulator.
Install the Matterport3D Simulator.
Please find the versions of packages in our environment here.
Install the Pytorch-Transformers. In particular, we use this version (same as OSCAR) in our experiments.
Please follow the instructions below to prepare the data in the listed directories (a setup sketch follows the list):
- MP3D navigability graphs: connectivity
  - Download the connectivity maps [23.8MB].
- R2R data: data
  - Download the R2R data [5.8MB].
- Augmented data: data/prevalent
  - Download the collected triplets in PREVALENT [1.5GB] (pre-processed for easy use).
- MP3D image features: img_features
  - Download the Scene features [4.2GB] (ResNet-152-Places365).
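For reference, a minimal setup sketch (assuming the repository root as the working directory) that only creates the directories named in the list above is:

# Create the data directories listed above; snap/ will later hold the trained network weights.
mkdir -p connectivity data/prevalent img_features snap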
Please refer to vlnbert_init.py to set up the directories.
- Pre-trained OSCAR weights
  - Download the base-no-labels following this guide.
- Pre-trained PREVALENT weights
  - Download the pytorch_model.bin from here.
- Recurrent-VLN-BERT: snap
  - Download the trained network weights [2.5GB] for our OSCAR-based and PREVALENT-based models.
Please read Peter Anderson's VLN paper for the R2R Navigation task.
To replicate the performance reported in our paper, load the trained network weights and run validation:
bash run/test_agent.bash
The results will be saved to logs/VLNBERT-test-Prevalent/snap/submit_test_unseen.json.
Alternatively, you can validate the model performance locally on the validation sets:
bash run/val_agent.bash
The results will be saved to logs/VLNBERT-val-Prevalent/eval.txt.
You can simply switch between the OSCAR-based and the PREVALENT-based VLN models by changing the arguments vlnbert (oscar or prevalent) and load (trained model paths).
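For example, evaluating the OSCAR-based model instead amounts to editing those two arguments inside the run script. The lines below are only a hypothetical sketch: the entry script and the snapshot path are assumptions, and only the vlnbert and load argument names come from the description above.

# Hypothetical excerpt from run/test_agent.bash (entry script and snapshot path are
# assumptions); switch --vlnbert to oscar and point --load at the OSCAR-based weights.
python r2r_src/train.py \
    --vlnbert oscar \
    --load snap/VLNBERT-OSCAR/state_dict/best_val_unseen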
To train the network from scratch, simply run:
bash run/train_agent.bash
The trained Navigator will be saved under snap/.
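To evaluate a model you trained yourself, the same validation script can be reused: edit the load argument in run/val_agent.bash to point at the checkpoint written under snap/ (the example path below is only a placeholder) and rerun it.

# Point the load argument in run/val_agent.bash at your new checkpoint, e.g. a
# placeholder path like snap/VLNBERT-train-Prevalent/state_dict/best_val_unseen,
# then run local validation again:
bash run/val_agent.bash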
If you use or discuss our Recurrent VLN-BERT, please cite our paper:
@InProceedings{Hong_2021_CVPR,
author = {Hong, Yicong and Wu, Qi and Qi, Yuankai and Rodriguez-Opazo, Cristian and Gould, Stephen},
title = {A Recurrent Vision-and-Language BERT for Navigation},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2021},
pages = {1643-1653}
}