PyTorch implementation of the paper "Adversarial Reinforced Instruction Attacker for Robust Vision-Language Navigation" (TPAMI 2021).
The environment installation follows that of EnvDrop.
Python requirements: Python 3.6 (Python 3.5 should also work):
pip install -r python_requirements.txt
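As a quick sanity check before installing, you can verify the interpreter version; a minimal sketch (the version warning is only a heuristic):

import sys
# Warn if the interpreter differs from the tested versions (3.6 recommended, 3.5 should work).
if sys.version_info[:2] not in [(3, 5), (3, 6)]:
    print('Warning: tested with Python 3.6/3.5, found %d.%d' % sys.version_info[:2])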
Install the Matterport3D simulator:
git submodule update --init --recursive
sudo apt-get install libjsoncpp-dev libepoxy-dev libglm-dev libosmesa6 libosmesa6-dev libglew-dev
mkdir build && cd build
cmake -DEGL_RENDERING=ON ..
make -j8
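To confirm the build succeeded, a minimal import smoke test run from the repository root (assuming the compiled bindings land in build/, as in the standard Matterport3DSimulator setup):

import sys
sys.path.append('build')  # location of the compiled MatterSim bindings
import MatterSim  # an ImportError here means the build or its dependencies failed
print('MatterSim bindings imported successfully')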
Download Room-to-Room navigation data:
bash ./tasks/R2R/data/download.sh
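Each entry in the downloaded splits pairs one trajectory with three natural-language instructions; a minimal inspection sketch, assuming the standard tasks/R2R/data layout:

import json
# Load the training split and print the structure of one sample.
with open('tasks/R2R/data/R2R_train.json') as f:
    data = json.load(f)
sample = data[0]
print(sample['scan'], sample['path_id'])
print(len(sample['path']), 'viewpoints;', len(sample['instructions']), 'instructions')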
Download the precomputed image features for the environments:
mkdir img_features
wget https://www.dropbox.com/s/o57kxh2mn5rkx4o/ResNet-152-imagenet.zip -P img_features/
cd img_features
unzip ResNet-152-imagenet.zip
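The unzipped file is a TSV with one row per panorama; below is a minimal reader sketch, assuming the usual R2R/EnvDrop format (36 discretized views per panorama, 2048-d ResNet-152 features, base64-encoded float32):

import base64
import csv
import sys
import numpy as np

csv.field_size_limit(sys.maxsize)  # feature rows exceed the default field limit
FIELDNAMES = ['scanId', 'viewpointId', 'image_w', 'image_h', 'vfov', 'features']

features = {}
with open('img_features/ResNet-152-imagenet.tsv') as tsv:
    for row in csv.DictReader(tsv, delimiter='\t', fieldnames=FIELDNAMES):
        key = row['scanId'] + '_' + row['viewpointId']
        blob = base64.b64decode(row['features'])
        features[key] = np.frombuffer(blob, dtype=np.float32).reshape(36, 2048)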
Download the R2R augmentation data from speaker-follower.
Download the R2R navigation data augmented with target words and candidate substitution words here.
Download the object word vocabulary here.
Download the adversarial training checkpoint here.
Download the finetuning checkpoint here.
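Before running the scripts, you can sanity-check a downloaded checkpoint; a sketch only, where the path is a placeholder for wherever you saved the file:

import torch
# Load on CPU so no GPU is needed just to inspect the file.
ckpt = torch.load('path/to/checkpoint', map_location='cpu')
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])  # peek at the top-level keys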
Run the following script with the finetuning checkpoint to replicate the navigation performance reported in the paper (the trailing argument is typically the GPU id in these run scripts):
bash run/test_agent.sh 0
To perform finetuning from the adversarial training checkpoint in one step, run the quick-start script:
bash run/quick_start.sh 0
To run the full training pipeline instead (pretraining, attacker training, adversarial training, and finetuning), execute the scripts in order:
bash run/pretrain.sh 0
bash run/attack.sh 0
bash run/adv_train.sh 0
bash run/finetune.sh 0
This implementation builds on resources from EnvDrop and speaker-follower. We thank the original authors for open-sourcing their code.