
ADAPT

PyTorch implementation of the paper "ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts" (CVPR 2022).

Prerequisites

Installation

The environment setup for ADAPT follows that of Recurrent-VLN-BERT.
1. Install the Matterport3D Simulator. Note that this code uses the old version (v0.1) of the simulator.
2. The versions of the packages in the environment can be found here.
3. Install this version of Pytorch-Transformers.
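
A minimal sketch of the setup, assuming a conda environment (the environment name, Python version, and exact simulator checkout are illustrative and not taken from this repo):

# create and activate an environment (name and Python version are illustrative)
conda create -n adapt python=3.6
conda activate adapt

# build the Matterport3D Simulator from source; check out the old v0.1 code
# as described in the simulator repository before building
git clone https://github.com/peteanderson80/Matterport3DSimulator.git
cd Matterport3DSimulator
mkdir build && cd build
cmake ..
make

# install Pytorch-Transformers (use the version linked above)
pip install pytorch-transformers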

Data Preparation

Please follow the instructions below to prepare the data in the following directories:
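
A rough sketch of the expected layout (only the data/prevalent and snap paths come from the training commands below; the other names are illustrative assumptions):

ADAPT/
├── data/                       # R2R annotation files (illustrative)
│   └── prevalent/
│       └── prevalent_aug.json  # augmented instructions passed via --aug below
├── img_features/               # pre-extracted image features (assumption)
├── r2r_src/
│   └── train.py                # training entry point used below
└── snap/                       # run snapshots, e.g. snap/baseline/state_dict/best_val_unseen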

R2R Navigation

Two-phase Training

In the first stage, run the following script until performance on Val Unseen converges:

PREFIX=baseline python r2r_src/train.py --vlnbert prevalent --aug data/prevalent/prevalent_aug.json --batchSize 16 --lr 1e-5 

In the second stage, run the following script, loading the best Val Unseen model from the first stage:

PREFIX=ADAPT python r2r_src/train.py --vlnbert prevalent --aug data/prevalent/prevalent_aug.json --batchSize 16 --lr 1e-6   --ADAPT --load snap/baseline/state_dict/best_val_unseen --finetune
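
Here --load points at the best Val Unseen checkpoint saved by the first stage (under snap/baseline/, matching PREFIX=baseline), and --ADAPT with --finetune presumably continues training from that checkpoint with the action prompts enabled at the lower learning rate of 1e-6.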

Acknowledgements

The implementation relies on resources from Recurrent-VLN-BERT. We thank the authors for open-sourcing their code.