PyTorch implementation of the paper "Adversarial Reinforced Instruction Attacker for Robust Vision-Language Navigation" (TPAMI 2021).
The environment installation follows that of EnvDrop.
Python requirements: Python 3.6 (Python 3.5 should also work):
pip install -r python_requirements.txt
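As a quick sanity check before installing, you can verify the interpreter version; a minimal sketch (the version warning is only a heuristic):

import sys
# Warn if the interpreter differs from the tested versions (3.6 recommended, 3.5 should work).
if sys.version_info[:2] not in [(3, 5), (3, 6)]:
    print('Warning: tested with Python 3.6/3.5, found %d.%d' % sys.version_info[:2])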
Install the Matterport3D simulator:
git submodule update --init --recursive
sudo apt-get install libjsoncpp-dev libepoxy-dev libglm-dev libosmesa6 libosmesa6-dev libglew-dev
mkdir build && cd build
cmake -DEGL_RENDERING=ON ..
make -j8
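To confirm the build succeeded, a minimal import smoke test run from the repository root (assuming the compiled bindings land in build/, as in the standard Matterport3DSimulator setup):

import sys
sys.path.append('build')  # location of the compiled MatterSim bindings
import MatterSim  # an ImportError here means the build or its dependencies failed
print('MatterSim bindings imported successfully')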
Download Room-to-Room navigation data:
bash ./tasks/R2R/data/download.sh
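Each entry in the downloaded splits pairs one trajectory with three natural-language instructions; a minimal inspection sketch, assuming the standard tasks/R2R/data layout:

import json
# Load the training split and print the structure of one sample.
with open('tasks/R2R/data/R2R_train.json') as f:
    data = json.load(f)
sample = data[0]
print(sample['scan'], sample['path_id'])
print(len(sample['path']), 'viewpoints;', len(sample['instructions']), 'instructions')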
Download the precomputed image features for the environments:
mkdir img_features
wget https://www.dropbox.com/s/o57kxh2mn5rkx4o/ResNet-152-imagenet.zip -P img_features/
cd img_features
unzip ResNet-152-imagenet.zip
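The unzipped file is a TSV with one row per panorama; below is a minimal reader sketch, assuming the usual R2R/EnvDrop format (36 discretized views per panorama, 2048-d ResNet-152 features, base64-encoded float32):

import base64
import csv
import sys
import numpy as np

csv.field_size_limit(sys.maxsize)  # feature rows exceed the default field limit
FIELDNAMES = ['scanId', 'viewpointId', 'image_w', 'image_h', 'vfov', 'features']

features = {}
with open('img_features/ResNet-152-imagenet.tsv') as tsv:
    for row in csv.DictReader(tsv, delimiter='\t', fieldnames=FIELDNAMES):
        key = row['scanId'] + '_' + row['viewpointId']
        blob = base64.b64decode(row['features'])
        features[key] = np.frombuffer(blob, dtype=np.float32).reshape(36, 2048)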
Download the R2R augmentation data from speaker-follower.
Download the R2R navigation data augmented with target words and candidate substitution words here.
Download the object word vocabulary here.
Download the adversarial training checkpoint here.
Download the finetuning checkpoint here.
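Before running the scripts, you can sanity-check a downloaded checkpoint; a sketch only, where the path is a placeholder for wherever you saved the file:

import torch
# Load on CPU so no GPU is needed just to inspect the file.
ckpt = torch.load('path/to/checkpoint', map_location='cpu')
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])  # peek at the top-level keys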
Run the following script with the finetuning checkpoint to replicate the navigation performance reported in the paper (the trailing argument is typically the GPU id in these run scripts):
bash run/test_agent.sh 0
To perform finetuning from the adversarial training checkpoint in one step, run the quick-start script:
bash run/quick_start.sh 0
To run the full training pipeline instead (pretraining, attacker training, adversarial training, and finetuning), execute the scripts in order:
bash run/pretrain.sh 0
bash run/attack.sh 0
bash run/adv_train.sh 0
bash run/finetune.sh 0
This implementation builds on resources from EnvDrop and speaker-follower. We thank the original authors for open-sourcing their code.