🌟 Results of the 2nd REVERIE Challenge at the ICCV 2021 Workshop: see here.
🌟 Results of the 1st REVERIE Challenge at the ACL 2020 Workshop! See here for more details.
🌟 Leaderboard here
Here are the pre-released code and data for the CVPR 2020 paper REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments
As shown in the figure above, a robot agent is given a natural language instruction referring to a remote object (here in the red bounding box) in a photo-realistic 3D environment. The agent must navigate to an appropriate location and identify the object among multiple distracting candidates. The blue discs indicate nearby navigable viewpoints provided by the simulator.
Note* This section prepares everything needed to run or train our Navigator-Pointer model. If you are familiar with R2R and just want to work on the REVERIE task, you can go directly to Section 6.
Note** On a fresh Ubuntu system, the following instructions should work well. Otherwise, they may break your existing project environments, so we recommend trying Section 3. Install with Docker instead.
A C++ compiler with C++11 support is required. Matterport3D Simulator has several dependencies:
- Ubuntu 14.04, 16.04, 18.04
- OpenCV >= 2.4 including 3.x
- OpenGL
- OSMesa
- GLM
- Numpy
- pybind11 for Python bindings
- Doxygen for building documentation
For example, to install the dependencies on Ubuntu:
sudo apt-get install libopencv-dev python-opencv freeglut3 freeglut3-dev libglm-dev libjsoncpp-dev doxygen libosmesa6-dev libosmesa6 libglew-dev
If some packages are still missing when running cmake/make or our code, refer to the packages installed in the Dockerfile.
Clone the REVERIE repository:
git clone https://github.com/YuankaiQi/REVERIE.git
cd REVERIE
Note that our repository is based on v0.1 of the Matterport3DSimulator, which was originally released with the Room-to-Room (R2R) dataset.
Download our pre-trained mini MAttNet3 from Google Drive or Baidu Yun (code: qts6); it is modified from MAttNet to support our model training. Unzip it into the MAttNet3 folder. It serves as our Pointer model.
You need to download RGB images and house segmentation files of the Matterport3D dataset. The following data types are required:
matterport_skybox_images
house_segmentations
The metadata is also required. Organise the data as follows:
Matterport
  |--v1
     |--metadata
     |--scans
Then set the 'matterportDir' parameter in trainFast.py to the path of your Matterport folder.
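Before setting 'matterportDir', you can check that your folders match the layout above. The following is a minimal sketch; missing_matterport_dirs() is a hypothetical helper and is not part of the repository.

```python
from pathlib import Path

# Subfolders expected under the Matterport folder, following the layout above.
REQUIRED_SUBDIRS = ("v1/metadata", "v1/scans")


def missing_matterport_dirs(matterport_dir):
    """Return the required subfolders that do not exist under matterport_dir."""
    root = Path(matterport_dir)
    return [sub for sub in REQUIRED_SUBDIRS if not (root / sub).is_dir()]
```

For example, missing_matterport_dirs("/data/Matterport") returns an empty list when the layout is complete, and the names of the missing subfolders otherwise.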
Download the precomputed image feature tsv files from Matterport3DSimulator and extract them into the img_features directory. You only need the ImageNet features to replicate our results.
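The sketch below shows one way to read such a tsv file in Python. It assumes the column layout of the R2R feature files (scanId, viewpointId, image_w, image_h, vfov, features), with each row's features stored as base64-encoded float32 values for 36 views of 2048 dimensions. load_features() is a hypothetical helper, so check the layout against the files you actually download.

```python
import base64
import csv

import numpy as np

# Assumed column layout of the R2R-style feature tsv files.
TSV_FIELDNAMES = ["scanId", "viewpointId", "image_w", "image_h", "vfov", "features"]


def load_features(tsv_path, num_views=36, feature_dim=2048):
    """Map 'scanId_viewpointId' to a float32 array of shape (num_views, feature_dim)."""
    csv.field_size_limit(2**31 - 1)  # each row holds one very long base64 string
    features = {}
    with open(tsv_path, "r", newline="") as f:
        reader = csv.DictReader(f, delimiter="\t", fieldnames=TSV_FIELDNAMES)
        for item in reader:
            key = item["scanId"] + "_" + item["viewpointId"]
            buf = base64.b64decode(item["features"])
            features[key] = np.frombuffer(buf, dtype=np.float32).reshape(num_views, feature_dim)
    return features
```

Loading the full ImageNet feature file this way keeps every viewpoint in memory, which is convenient for training but requires a few GB of RAM.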
Let us get things ready to run experiments.
# change "rog" (remote object grounding) to any name you prefer
conda create -n rog python=3.6
Activate the environment you just created:
conda activate rog
pip install -r tasks/REVERIE/requirements.txt
# with CUDA 9.0
conda install pytorch=0.4.0 cuda90 -c pytorch
conda install torchvision=0.2.0 -c pytorch
If you use a newer PyTorch version, you will need to modify the code to load the pretrained models.
Let us compile the simulator so that we can call its functions in python.
Build the OSMesa version using CMake:
mkdir -p build && cd build
cmake -DOSMESA_RENDERING=ON ..
# Double-check that CMake finds the proper path to your Python
# If not, remove the generated files and rerun cmake with the option below instead
rm -rf *
cmake -DOSMESA_RENDERING=ON -DPYTHON_EXECUTABLE:FILEPATH=/path/to/your/bin/python ..
make
cd ../
Note: There are three rendering options, which are selected using cmake options during the build process:
- Off-screen GPU rendering using EGL (not supported by v0.1 of Matterport3D Simulator, but supported by its latest version):
cmake -DEGL_RENDERING=ON ..
- Off-screen CPU rendering using OSMesa (recommended):
cmake -DOSMESA_RENDERING=ON ..
- GPU rendering using OpenGL (requires an X server):
cmake ..
Off-screen GPU rendering (EGL) is generally the fastest approach for training agents, but it requires the latest version of the simulator; with the v0.1 version used here, use OSMesa.
cd MAttNet3/pyutils/mask-faster-rcnn/lib
You may need to change the -arch version in the Makefile to compile the CUDA code:
GPU model | Architecture |
---|---|
TitanX (Maxwell/Pascal) | sm_52 |
GTX 960M | sm_50 |
GTX 1080 (Ti) | sm_61 |
Grid K520 (AWS g2.2xlarge) | sm_30 |
Tesla K80 (AWS p2.xlarge) | sm_37 |
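If you prefer not to edit the Makefile by hand, a small script can swap the flag. This is a sketch assuming the Makefile passes the architecture to nvcc as -arch=sm_XX; set_arch() and patch_makefile() are hypothetical helpers, and the mapping simply copies the table above.

```python
import re
from pathlib import Path

# Architectures copied from the table above.
GPU_ARCH = {
    "TitanX (Maxwell/Pascal)": "sm_52",
    "GTX 960M": "sm_50",
    "GTX 1080 (Ti)": "sm_61",
    "Grid K520 (AWS g2.2xlarge)": "sm_30",
    "Tesla K80 (AWS p2.xlarge)": "sm_37",
}

# Assumes the flag appears in the Makefile as "-arch=sm_XX".
ARCH_FLAG = re.compile(r"-arch=sm_\d+")


def set_arch(makefile_text, arch):
    """Return makefile_text with every -arch=sm_XX flag replaced by -arch=<arch>."""
    return ARCH_FLAG.sub("-arch=" + arch, makefile_text)


def patch_makefile(path, gpu_model):
    """Rewrite the Makefile at path in place for the given GPU model."""
    p = Path(path)
    p.write_text(set_arch(p.read_text(), GPU_ARCH[gpu_model]))
```

For example, running patch_makefile("Makefile", "GTX 1080 (Ti)") inside the lib folder sets every flag to -arch=sm_61. For GPUs not in the table, pass the matching sm_XX value to set_arch() directly.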
Compile the CUDA-based nms and roi_pooling using the following commands:
make
cd ../../refer
make
It will generate _mask.c and _mask.so in the external/ folder.
We find that the success rate is slightly lower than that obtained with environments built without Docker.
- Nvidia GPU with driver >= 384
- Install docker
- Install nvidia-docker2.0
- Note: CUDA / CuDNN toolkits do not need to be installed (these are provided by the docker image)
Clone the REVERIE repository:
git clone https://github.com/YuankaiQi/REVERIE.git
cd REVERIE
First download the files as described in Section 2.3. Then set an environment variable to the location of the dataset, where <PATH> is the full absolute path (not a relative path or symlink) to the directory 'v1':
export MATTERPORT_DATA_DIR=<PATH>
Then set the 'matterportDir' parameter in the trainFast.py file to 'data'.
Note that if <PATH> is a remote sshfs mount, you will need to mount it with the -o allow_root option, or the docker container won't be able to access this directory.
To make data loading faster and reduce memory usage, we preprocess the matterport_skybox_images by downscaling and combining all cube faces into a single image using the following script:
./scripts/downsize_skybox.py
This will take a while depending on the number of processes used. By default images are downscaled by 50% and 20 processes are used.
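The idea behind this preprocessing can be sketched with NumPy: halve each of the six cube faces and place them side by side. This is a simplified illustration, not the script itself; the face order, file naming, and resize method used by downsize_skybox.py are not shown here, and the 2x2 block averaging below is an assumed stand-in.

```python
import numpy as np


def downscale_half(face):
    """Downscale an (H, W, C) image by 50% by averaging each 2x2 pixel block."""
    h, w = face.shape[0] // 2 * 2, face.shape[1] // 2 * 2  # drop an odd last row/column
    f = face[:h, :w].astype(np.float32)
    small = (f[0::2, 0::2] + f[1::2, 0::2] + f[0::2, 1::2] + f[1::2, 1::2]) / 4.0
    return small.astype(face.dtype)


def combine_faces(faces):
    """Downscale six equally sized cube faces and concatenate them horizontally."""
    if len(faces) != 6:
        raise ValueError("expected 6 cube faces, got %d" % len(faces))
    return np.concatenate([downscale_half(face) for face in faces], axis=1)
```

Storing one combined image per panorama means the loader opens a single file instead of six, and halving each face cuts the pixel count, and therefore memory, by a factor of four.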
Build the docker image:
docker build -t reverie .
Run the docker container, mounting both the git repo and the dataset:
nvidia-docker run -it --mount type=bind,source=$MATTERPORT_DATA_DIR,target=/root/mount/Matterport3DSimulator/data/v1,readonly --volume `pwd`:/root/mount/Matterport3DSimulator reverie
Now (from inside the docker container), build the simulator and run the unit tests:
cd /root/mount/Matterport3DSimulator
mkdir build && cd build
cmake -DEGL_RENDERING=ON ..
make
cd ../
Note: There are three rendering options, which are selected using cmake options during the build process (by varying line 3 in the build commands immediately above):
- Off-screen GPU rendering using EGL (not supported by v0.1 of Matterport3D Simulator, but supported by its latest version):
cmake -DEGL_RENDERING=ON ..
- Off-screen CPU rendering using OSMesa (recommended):
cmake -DOSMESA_RENDERING=ON ..
- GPU rendering using OpenGL (requires an X server):
cmake ..
Off-screen GPU rendering (EGL) is generally the fastest approach for training agents, but it requires the latest version of the simulator.
cd MAttNet3/pyutils/mask-faster-rcnn/lib
You may need to change the -arch version in the Makefile to compile the CUDA code:
GPU model | Architecture |
---|---|
TitanX (Maxwell/Pascal) | sm_52 |
GTX 960M | sm_50 |
GTX 1080 (Ti) | sm_61 |
Grid K520 (AWS g2.2xlarge) | sm_30 |
Tesla K80 (AWS p2.xlarge) | sm_37 |
Compile the CUDA-based nms and roi_pooling using the following commands:
make
cd ../../refer
make
It will generate _mask.c and _mask.so in the external/ folder.
Run the docker container while sharing the host's X server and DISPLAY environment variable with the container:
xhost +
nvidia-docker run -it -e DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix --mount type=bind,source=$MATTERPORT_DATA_DIR,target=/root/mount/Matterport3DSimulator/data/v1,readonly --volume `pwd`:/root/mount/Matterport3DSimulator reverie
cd /root/mount/Matterport3DSimulator
If you get an error like Error: BadShmSeg (invalid shared segment parameter) 128, you may also need to include -e="QT_X11_NO_MITSHM=1" in the docker run command above.
- For training: You can download our pre-trained models from Google Drive or Baidu Yun. If you want to train the model yourself, run the following command:
python tasks/REVERIE/trainFast.py --feedback_method sample2step --experiment_name releaseCheck
- For testing: To test the model, first obtain the navigation results by running:
python tasks/REVERIE/run_search.py
Then run the following command to obtain the grounded objects:
python tasks/REVERIE/groundingAfterNav.py
Now, you should get results in the 'experiment/releaseCheck/results/' folder.
Note that the results might differ slightly depending on the versions of dependent packages and the GPU used.
In the tasks/REVERIE/data folder, you will find four files, REVERIE_train.json, REVERIE_val_seen.json, REVERIE_val_unseen.json, and REVERIE_test, which provide the instructions, paths, and target object of each task (except for the REVERIE_test file). In the tasks/REVERIE/data/BBox folder, you will find json files that record the objects observed within 3 meters of each viewpoint.
- Example of a train/val_seen/val_unseen.json file
[
  {
    "distance": 11.65,  # distance to the goal viewpoint
    "ix": 208,  # reserved data, not used
    "scan": "qoiz87JEwZ2",  # building ID
    "heading": 4.59,  # initial heading of the agent
    "path_id": 1357,  # inherited from the R2R dataset
    "objId": 66,  # the unique object ID in the current building
    "id": "1357_66",  # task ID
    "instructions": [  # collected instructions for REVERIE
      "Go to the entryway and clean the coffee table",
      "Go to the foyer and wipe down the coffee table",
      "Go to the foyer on level 1 and pull out the coffee table further from the chair"
    ],
    "path": [  # inherited from the R2R dataset
      "bdb1023cb7cc4ebd8245b9291fcbc1a2",
      "a6ba3f53b7964464b23341896d3c75fa",
      "c407e34577aa4724b7e5d447a5d859d1",
      "9f68b19f50d14f5d8371447f73c3a2e3",
      "150763c717894adc8ccbbbe640fa67ef",
      "59b190857cfe47f691bf0d866f1e5aeb",
      "267a7e2459054db7952fc1e3e45e98fa"
    ],
    "instructions_l": [  # inherited from the R2R dataset and provided just for convenience
      "Walk into the dining room and continue past the table. Turn left when you xxx ",
      ...
    ]
  },
  ...
]
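As a sketch of how these fields fit together, the code below loads a split and derives, for each task, the name of the BBox file for its final viewpoint. It assumes that the released files are plain JSON (the # annotations above are only explanatory) and, following the R2R convention, that the last entry of "path" is the goal viewpoint. load_tasks(), goal_bbox_file(), and target_summary() are hypothetical helpers.

```python
import json


def load_tasks(json_path):
    """Load a REVERIE split such as REVERIE_val_seen.json (a list of task dicts)."""
    with open(json_path) as f:
        return json.load(f)


def goal_bbox_file(task):
    """BBox file name for the task's final viewpoint.

    Assumes, as in R2R, that the last entry of "path" is the goal viewpoint.
    """
    return "%s_%s.json" % (task["scan"], task["path"][-1])


def target_summary(task):
    """Collect the fields needed to look up the target object of one task."""
    return {
        "id": task["id"],
        "objId": task["objId"],
        "num_instructions": len(task["instructions"]),
        "bbox_file": goal_bbox_file(task),
    }
```

For the example task above, target_summary() points to qoiz87JEwZ2_267a7e2459054db7952fc1e3e45e98fa.json, where object 66 can then be looked up.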
- Example of a json file in the BBox folder
File name format: ScanID_ViewpointID.json, e.g., VzqfbhrpDEA_57fba128d2f042f7a59793c665a3f587.json
{  # note that this is a dict, not a list
  "57fba128d2f042f7a59793c665a3f587": {  # this key is the viewpoint ID
    "827": {  # this key is the object ID
      "name": "toilet",
      "visible_pos": [
        6, 7, 8, 9, 19, 20  # indices (0~35) of the views that contain the object; consistent with R2R
      ],
      "bbox2d": [
        [585, 382, 55, 98],  # [x, y, w, h], one box per view listed in "visible_pos"
        ...
      ]
    },
    "833": {
      ...
    },
    ...
  }
}
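The "visible_pos" and "bbox2d" entries are parallel lists, so each box belongs to the view at the same position. The sketch below pairs them and converts [x, y, w, h] to corner coordinates. view_angles() assumes the R2R 36-view layout (12 headings 30 degrees apart at elevations of -30, 0, and +30 degrees, starting from the lowest row); both functions are hypothetical helpers.

```python
def object_views(obj):
    """Pair each view index in "visible_pos" with its box from "bbox2d".

    Boxes are converted from [x, y, w, h] to corner form (x1, y1, x2, y2).
    """
    views, boxes = obj["visible_pos"], obj["bbox2d"]
    if len(views) != len(boxes):
        raise ValueError("visible_pos and bbox2d have different lengths")
    return [(v, (x, y, x + w, y + h)) for v, (x, y, w, h) in zip(views, boxes)]


def view_angles(ix):
    """(heading, elevation) in degrees of view ix under the assumed R2R layout:
    views 0-11 at -30 degrees, 12-23 at 0 degrees, 24-35 at +30 degrees,
    with headings 30 degrees apart within each row."""
    if not 0 <= ix < 36:
        raise ValueError("view index must be in 0..35")
    return (ix % 12) * 30, (ix // 12 - 1) * 30
```

For the toilet above, the first pair is (6, (585, 382, 640, 480)), and under the assumed layout view 6 looks at heading 180 degrees, elevation -30 degrees.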
The easiest way to integrate this data into your project is to preload the bounding boxes, labels, and visible_pos of all objects with the loadObjProposals() function, as done in the eval_release.py file. You can then access the visible objects at a viewpoint using ScanID_ViewpointID as the key, and use any referring expression method to match objects with an instruction.
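A minimal stand-in for this preloading step might look like the following. It is not the repository's loadObjProposals(), whose return structure may differ; load_bbox_dir() and objects_in_view() are hypothetical, and the code assumes scan IDs contain no underscore, so the viewpoint ID is everything after the first one.

```python
import json
import os


def load_bbox_dir(bbox_dir):
    """Preload every ScanID_ViewpointID.json file in bbox_dir.

    Returns {"ScanID_ViewpointID": {objId: object_info}}.
    """
    proposals = {}
    for fname in sorted(os.listdir(bbox_dir)):
        if not fname.endswith(".json"):
            continue
        key = fname[: -len(".json")]
        viewpoint_id = key.split("_", 1)[1]  # assumes no underscore in the scan ID
        with open(os.path.join(bbox_dir, fname)) as f:
            data = json.load(f)
        # The top-level key of each file is the viewpoint ID.
        proposals[key] = data.get(viewpoint_id, {})
    return proposals


def objects_in_view(proposals, scan_id, viewpoint_id, view_index):
    """(objId, name, [x, y, w, h]) of every object visible in one of the 36 views."""
    found = []
    for obj_id, obj in proposals.get("%s_%s" % (scan_id, viewpoint_id), {}).items():
        for v, box in zip(obj["visible_pos"], obj["bbox2d"]):
            if v == view_index:
                found.append((obj_id, obj["name"], box))
    return found
```

The candidates returned by objects_in_view() are what a referring expression model would then rank against the instruction.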
Note: The number of instructions per task may vary across the dataset, so we recommend indexing instructions as follows:
instrType = "instructions"
self.instr_ids += ['%s_%d' % (str(item['id']),i) for i in range(len(item[instrType]))]
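Applied to a whole split, the indexing above expands each task into one entry per instruction. A minimal sketch; expand_instructions() and the keys of its output dicts are hypothetical.

```python
def expand_instructions(items, instr_type="instructions"):
    """One entry per instruction, keyed '<task id>_<instruction index>'."""
    expanded = []
    for item in items:
        for i, instr in enumerate(item[instr_type]):
            expanded.append({
                "instr_id": "%s_%d" % (str(item["id"]), i),
                "scan": item["scan"],
                "instruction": instr,
            })
    return expanded
```

Because the index is generated from the actual list length, tasks with two, three, or more instructions are all handled without assuming a fixed count.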
Just add a "'predObjId': int value" pair to each entry of your navigation results. That's it!
Below is a toy sample:
[
  {
    "trajectory": [
      [
        "a68b5ae6571e4a66a4727573b88227e4",
        3.141592653589793,
        0.0
      ],
      ...
    ],
    "instr_id": "4774_267_1",
    "predObjId": 402
  },
  ...
]
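A sketch of producing such a file from existing navigation results, assuming your pointer's predictions are available as a dict from instr_id to object ID. add_predictions() and save_results() are hypothetical helpers.

```python
import json


def add_predictions(nav_results, pred_obj_ids):
    """Attach an integer 'predObjId' to each navigation result, matched by instr_id."""
    merged = []
    for result in nav_results:
        entry = dict(result)  # copy so the input dicts are left untouched
        entry["predObjId"] = int(pred_obj_ids[result["instr_id"]])
        merged.append(entry)
    return merged


def save_results(results, out_path):
    """Write the merged results as a JSON list."""
    with open(out_path, "w") as f:
        json.dump(results, f)
```

A missing instr_id raises a KeyError here, which makes it easy to spot navigation results that have no grounding prediction before writing the file.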
We would like to thank Matterport for allowing the Matterport3D dataset to be used by the academic community. We also thank Philip Roberts, Zheng Liu, Zizheng Pan, and Sam Bahrami for their great help in building the dataset. This project is supported by the Australian Centre for Robotic Vision.
The REVERIE task and dataset are described in the following paper:
@inproceedings{reverie,
title={REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments},
author={Yuankai Qi and Qi Wu and Peter Anderson and Xin Wang and William Yang Wang and Chunhua Shen and Anton van den Hengel},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2020}
}