Mingxiao Li*, Zehao Wang*, Tinne Tuytelaars, Marie-Francine Moens
AAAI 2023 main conference
Please set up the Matterport3DSimulator Docker environment by following this link.
For missing packages, please check the corresponding versions in requirements.txt.
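A minimal sketch for checking installed packages against the pins in requirements.txt. It assumes the file uses simple `name==version` lines; anything else (ranges, URLs, editable installs) is skipped. The helpers `parse_pins` and `check_pins` are hypothetical and not part of this repository.

```python
import importlib.metadata
from pathlib import Path


def parse_pins(text):
    """Return {package: version} for lines of the form 'name==version'."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]           # drop trailing comments
        line = line.split(";", 1)[0].strip()   # drop environment markers
        if "==" not in line:
            continue  # skip unpinned, ranged, or URL requirements
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Return (name, wanted, installed) for every pin that does not match."""
    mismatches = []
    for name, wanted in pins.items():
        try:
            installed = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches.append((name, wanted, installed))
    return mismatches


if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.exists():
        for name, wanted, installed in check_pins(parse_pins(req.read_text())):
            print(f"{name}: want {wanted}, have {installed}")
```

Run it from the repository root inside the Docker container; any printed line is a package worth reinstalling at the pinned version.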
Data preparation consists of two steps: preprocessing for image generation and token ID extraction.
- Follow the instructions in VLN-DUET, or download the data (processed annotations and features) from Dropbox. Unzip the REVERIE and R2R folders into datasets.
- Since we mainly use CLIP as our visual feature encoder, please follow the instructions at link and make sure to load ViT-L-14-336px.pt during training. We recommend putting it at ckpts/ViT-L-14-336px.pt.
- Make sure to install GLIDE for image generation.
- Download the Matterport3D dataset from link.
- Additional data for LAD is released at link.
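The unpacking and checkpoint-placement steps above can be partly scripted. A minimal sketch, assuming the Dropbox download yields zip archives that each contain a top-level REVERIE/ or R2R/ folder (the archive names `REVERIE.zip` and `R2R.zip` are hypothetical), and that the CLIP checkpoint sits at the recommended ckpts/ViT-L-14-336px.pt:

```python
import zipfile
from pathlib import Path

# Hypothetical archive names; adjust to whatever the download actually provides.
ARCHIVES = ["REVERIE.zip", "R2R.zip"]
# Recommended checkpoint location from this README.
CLIP_CKPT = Path("ckpts/ViT-L-14-336px.pt")


def unpack_datasets(archive_dir, datasets_dir="datasets"):
    """Extract each known archive found in archive_dir into datasets_dir."""
    out = Path(datasets_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted = []
    for name in ARCHIVES:
        archive = Path(archive_dir) / name
        if not archive.exists():
            continue  # skip archives that were not downloaded
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(out)
        extracted.append(name)
    return extracted


def missing_prerequisites(root="."):
    """List expected prerequisite paths (POSIX-style) missing under root."""
    expected = [Path("datasets/REVERIE"), Path("datasets/R2R"), CLIP_CKPT]
    return [p.as_posix() for p in expected if not (Path(root) / p).exists()]
```

For example, `unpack_datasets("~/Downloads")` followed by `missing_prerequisites()` from the lad_src root reports what is still missing (expand `~` yourself, since `Path` does not do it automatically).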
- Generate imagined images of the goal positions:
python preprocess/ge_ins2img_feats.py --encoder clip --dataset reverie \
--input_dir datasets/REVERIE/annotations/REVERIE_{split}_enc.json \
--clip_save_dir datasets/REVERIE/features/reverie_ins2img_clip.h5 \
--collect_clip
Put the generated data in the directory datasets/REVERIE/features
- The room type codebook room_type_feats.h5 is provided in the root directory.
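The --input_dir argument in the generation command contains a literal {split} placeholder. A minimal sketch of how such a template can be expanded into per-split annotation paths, assuming the script fills it with `str.format`. The split names below follow the usual REVERIE splits but are an assumption, so check them against your datasets/REVERIE/annotations folder:

```python
from pathlib import Path

TEMPLATE = "datasets/REVERIE/annotations/REVERIE_{split}_enc.json"
# Assumed split names; verify against the files you actually downloaded.
SPLITS = ["train", "val_seen", "val_unseen", "test"]


def expand_splits(template=TEMPLATE, splits=SPLITS, root="."):
    """Map each split name to (annotation path, whether that file exists)."""
    result = {}
    for split in splits:
        path = Path(root) / template.format(split=split)
        result[split] = (path.as_posix(), path.exists())
    return result


if __name__ == "__main__":
    for split, (path, ok) in expand_splits().items():
        print(f"{split:12s} {'ok' if ok else 'MISSING'} {path}")
```

Running this before the generation step is a cheap way to catch a missing annotation file before GLIDE starts producing images.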
- Set up the output path and the Matterport3D connectivity path in preprocess/get_all_imgs_fts.py, then run the following to get the .tsv file:
python preprocess/get_all_imgs_fts.py
- Download the ViT features following VLN-DUET and put them in datasets/REVERIE/features.
Set up the paths in preprocess/convert_tsv2h5.py, then run the following to get the .h5 file and put it in datasets/REVERIE/features:
python preprocess/convert_tsv2h5.py
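For intuition about the .tsv to .h5 conversion, here is a minimal sketch that reads a feature TSV in the row layout used by the original R2R image-feature files (scanId, viewpointId, image_w, image_h, vfov, features), with features stored as base64-encoded float32. Whether get_all_imgs_fts.py writes exactly these columns is an assumption; the real conversion is done by preprocess/convert_tsv2h5.py. Writing the decoded arrays to HDF5 would additionally need a library such as h5py, which this sketch leaves out.

```python
import base64
import csv
import io
from array import array

# Column layout assumed from the original R2R image-feature TSV files.
FIELDS = ["scanId", "viewpointId", "image_w", "image_h", "vfov", "features"]
csv.field_size_limit(2**31 - 1)  # base64 feature columns can be very long


def decode_features(b64, dim):
    """Decode a base64 float32 blob into a list of rows of length dim."""
    flat = array("f")  # C float, i.e. float32 on common platforms
    flat.frombytes(base64.b64decode(b64))
    if len(flat) % dim:
        raise ValueError(f"{len(flat)} floats is not a multiple of dim={dim}")
    return [flat[i:i + dim].tolist() for i in range(0, len(flat), dim)]


def read_feature_tsv(fh, dim):
    """Yield ('scan_viewpoint', rows) pairs, one per TSV line."""
    reader = csv.DictReader(fh, delimiter="\t", fieldnames=FIELDS)
    for row in reader:
        key = f"{row['scanId']}_{row['viewpointId']}"
        yield key, decode_features(row["features"], dim)
```

Keying entries by `scan_viewpoint` mirrors a common convention in VLN feature files, but check the key format convert_tsv2h5.py expects before relying on it.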
- Make sure the datasets folder is under the root folder lad_src.
- Link the Matterport3D dataset to mp3d under the lad_src folder.
The structure of these two dataset folders should be organized as follows:
lad_src
├── datasets
│   ├── REVERIE
│   │   ├── annotations
│   │   └── features
│   │       ├── obj.avg.top3.min80_vit_base_patch16_224_imagenet.hdf5
│   │       └── full_reverie_ins2img_clip.h5
│   └── R2R
└── mp3d
    └── v1
        └── scans
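A minimal sketch for creating the mp3d link and checking the layout above. It assumes your Matterport3D download is a directory that contains v1/scans; the `MP3D_SRC` path and both helper functions are hypothetical. Only the entries shown in the tree are checked.

```python
import os
from pathlib import Path

# Entries from the tree above, relative to lad_src.
EXPECTED = [
    "datasets/REVERIE/annotations",
    "datasets/REVERIE/features/obj.avg.top3.min80_vit_base_patch16_224_imagenet.hdf5",
    "datasets/REVERIE/features/full_reverie_ins2img_clip.h5",
    "datasets/R2R",
    "mp3d/v1/scans",
]


def link_mp3d(lad_src, mp3d_src):
    """Symlink mp3d_src (the directory containing v1/) to lad_src/mp3d once."""
    target = Path(lad_src) / "mp3d"
    if not target.exists() and not target.is_symlink():
        os.symlink(Path(mp3d_src).resolve(), target, target_is_directory=True)
    return target


def missing_paths(lad_src):
    """Return the expected entries that do not exist under lad_src."""
    return [p for p in EXPECTED if not (Path(lad_src) / p).exists()]


if __name__ == "__main__":
    MP3D_SRC = "/path/to/Matterport3D"  # hypothetical location of your download
    link_mp3d(".", MP3D_SRC)
    for p in missing_paths("."):
        print("missing:", p)
```

Calling `link_mp3d` again is a no-op once the link exists, so the script can be rerun safely while you fill in the remaining files.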
Since the ins2img features consume too much disk space in our setting, we do not include the goal dreamer in phase 1 of the warmup training, which uses the augmentation data.
cd warmup_src
sh scripts/final_frt_gd_phase1.sh
cd warmup_src
sh scripts/final_frt_gd_phase2.sh # replace phase_ckpt in this script with the best phase 1 result
cd training_src
sh scripts/final_frt_gd_finetuning_stable.sh # replace phase_ckpt in this script with the best phase 1 result
cd training_src
sh scripts/eval.sh # replace resumedir in this script with the best training result obtained above
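The scripts above expect you to edit phase_ckpt (and resumedir for evaluation) by hand. A minimal sketch of doing this programmatically, assuming each variable is assigned on its own line as `name=value` in the shell script; if a script passes it as a command-line flag instead, this helper will not find it. `set_script_var` and `update_script` are hypothetical helpers.

```python
import re
from pathlib import Path


def set_script_var(script_text, name, value):
    """Replace the value of a `name=...` shell assignment; fail if absent."""
    pattern = re.compile(rf"^([ \t]*{re.escape(name)}=).*$", re.MULTILINE)
    # A function replacement avoids backslash escapes in `value` being interpreted.
    new_text, count = pattern.subn(lambda m: m.group(1) + value, script_text)
    if count == 0:
        raise KeyError(f"no '{name}=' assignment found")
    return new_text


def update_script(path, name, value):
    """Rewrite a script file in place with the new assignment."""
    p = Path(path)
    p.write_text(set_script_var(p.read_text(), name, value))
```

For example, `update_script("scripts/final_frt_gd_phase2.sh", "phase_ckpt", "<best phase 1 checkpoint>")` run from warmup_src, where the placeholder stands for your own checkpoint path. Quote the value yourself if it contains spaces.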
NOTE: The checkpoints of the LAD model after warmup stage 2 and of the final LAD model trained on REVERIE can be found here.
Credits to Shizhe Chen for the great baseline work VLN-DUET:
@InProceedings{Chen_2022_DUET,
author = {Chen, Shizhe and Guhur, Pierre-Louis and Tapaswi, Makarand and Schmid, Cordelia and Laptev, Ivan},
title = {Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation},
booktitle = {CVPR},
year = {2022}
}
@InProceedings{VLN_LAD_2023,
author = {Li, Mingxiao and Wang, Zehao and Tuytelaars, Tinne and Moens, Marie-Francine},
title = {Layout-aware Dreamer for Embodied Referring Expression Grounding},
booktitle = {AAAI},
year = {2023}
}