PyTorch implementation of the paper "ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts" (CVPR 2022).
The environment installation for ADAPT follows that of Recurrent-VLN-BERT.
1. Install the Matterport3D Simulator. Note that this code uses the old version (v0.1) of the simulator.
2. The versions of the packages in the environment can be found here.
3. Install this version of Pytorch-Transformers.
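As a quick sanity check of the simulator install, you can try loading its Python bindings. This is a minimal sketch, not part of the official instructions; it assumes the v0.1 bindings (the `MatterSim` module) are on `PYTHONPATH`, e.g. via the simulator's `build/` directory.

```python
# Minimal sanity check for the Matterport3D Simulator (v0.1) Python bindings.
# Assumes the simulator's build/ directory is on PYTHONPATH.
import math
import MatterSim

sim = MatterSim.Simulator()
sim.setRenderingEnabled(False)         # rendered RGB is not needed when using precomputed features
sim.setDiscretizedViewingAngles(True)  # 36 discretized views per viewpoint
sim.setCameraResolution(640, 480)
sim.setCameraVFOV(math.radians(60))
sim.init()
print('MatterSim v0.1 bindings loaded')
```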
Please follow the instructions below to prepare the data in the following directories:
- MP3D navigability graphs: `connectivity`
  - Download the connectivity maps.
- MP3D image features: `img_features`
  - Download the scene features (ResNet-152-Places365).
- R2R data with added action prompts: `data`
  - Download the R2R data.
- Augmentation data with added action prompts: `data`
  - Download the augmentation data.
- Text sub-prompt features: `data`
  - Download the text sub-prompt features.
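For reference, the snippet below is a minimal sketch of how Recurrent-VLN-BERT-style code typically reads two of these inputs: the per-scan connectivity JSONs and the base64-encoded scene-feature TSV. The example scan id, the TSV file name, its column layout, and the 36×2048 feature shape are assumptions carried over from that codebase, not something ADAPT prescribes.

```python
import base64
import csv
import json
import sys

import numpy as np

csv.field_size_limit(sys.maxsize)  # feature rows are far larger than the default field limit

# Connectivity: one JSON file per scan, listing viewpoints and their navigable neighbors.
scan_id = '17DRP5sb8fy'  # example MP3D scan id
with open(f'connectivity/{scan_id}_connectivity.json') as f:
    graph = json.load(f)
print(len(graph), 'viewpoints in scan', scan_id)

# Scene features: TSV with base64-encoded ResNet-152-Places365 features,
# 36 discretized views x 2048 dims per viewpoint (layout assumed from Recurrent-VLN-BERT).
TSV_FIELDNAMES = ['scanId', 'viewpointId', 'image_w', 'image_h', 'vfov', 'features']
VIEWS, FEATURE_DIM = 36, 2048

features = {}
with open('img_features/ResNet-152-places365.tsv') as f:  # file name is an assumption
    reader = csv.DictReader(f, delimiter='\t', fieldnames=TSV_FIELDNAMES)
    for item in reader:
        key = item['scanId'] + '_' + item['viewpointId']
        features[key] = np.frombuffer(
            base64.b64decode(item['features']), dtype=np.float32
        ).reshape(VIEWS, FEATURE_DIM)
```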
In the first stage, run the following script until performance converges on Val Unseen:

```bash
PREFIX=baseline python r2r_src/train.py --vlnbert prevalent --aug data/prevalent/prevalent_aug.json --batchSize 16 --lr 1e-5
```
In the second stage, run the following script, loading the best Val Unseen model from the first stage:

```bash
PREFIX=ADAPT python r2r_src/train.py --vlnbert prevalent --aug data/prevalent/prevalent_aug.json --batchSize 16 --lr 1e-6 --ADAPT --load snap/baseline/state_dict/best_val_unseen --finetune
```
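If you want to inspect the first-stage checkpoint before fine-tuning, the sketch below does a plain PyTorch load of the saved file. The actual restoring is handled by the repository's `--load`/`--finetune` flags; this is only a quick look at what was saved, and the checkpoint's internal structure is whatever the training code wrote, not something this sketch assumes.

```python
import torch

# Inspect the best first-stage checkpoint; --load/--finetune in train.py do the real restoring.
ckpt = torch.load('snap/baseline/state_dict/best_val_unseen', map_location='cpu')
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))
```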
The implementation relies on resources from Recurrent-VLN-BERT. We thank the original authors for open-sourcing their code.