
[CVPR'22] NICE-SLAM: Neural Implicit Scalable Encoding for SLAM


NICE-SLAM: Neural Implicit Scalable Encoding for SLAM

Zihan Zhu* · Songyou Peng* · Viktor Larsson · Weiwei Xu · Hujun Bao
Zhaopeng Cui · Martin R. Oswald · Marc Pollefeys

(* Equal Contribution)

CVPR 2022


NICE-SLAM produces accurate dense geometry and camera tracking on large-scale indoor scenes.

(The black / red lines are the ground truth / predicted camera trajectory)



Table of Contents
  1. Installation
  2. Visualization
  3. Demo
  4. Run
  5. iMAP*
  6. Evaluation
  7. Acknowledgement
  8. Citation
  9. Contact

Installation

First, make sure that all dependencies are in place. The simplest way to do so is to use Anaconda.

You can create an Anaconda environment called nice-slam. On Linux, you need to install libopenexr-dev before creating the environment.

sudo apt-get install libopenexr-dev
    
conda env create -f environment.yaml
conda activate nice-slam

Visualizing NICE-SLAM Results

We provide the results of NICE-SLAM ready for download. You can run our interactive visualizer as follows.

Self-captured Apartment

To visualize our results on the self-captured apartment, as shown in the teaser:

bash scripts/download_vis_apartment.sh
python visualizer.py configs/Apartment/apartment.yaml --output output/vis/Apartment

Note for users from China: If downloads are slow, check the scripts/download_*.sh scripts, where we also provide 和彩云 links for manual download.

ScanNet

bash scripts/download_vis_scene0000.sh
python visualizer.py configs/ScanNet/scene0000.yaml --output output/vis/scannet/scans/scene0000_00

You can find the results of NICE-SLAM on other scenes in ScanNet here.

Replica

bash scripts/download_vis_room1.sh
python visualizer.py configs/Replica/room1.yaml --output output/vis/Replica/room1

You can find the results of NICE-SLAM on other scenes in Replica here.

Interactive Visualizer Usage

The black trajectory is the ground truth, and the red trajectory is the one estimated by NICE-SLAM.

  • Press Ctrl+0 for grey mesh rendering.
  • Press Ctrl+1 for textured mesh rendering.
  • Press Ctrl+9 for normal rendering.
  • Press L to turn off/on lighting.

Command line arguments

  • --output $OUTPUT_FOLDER output folder (overrides the output folder in the config file)
  • --input_folder $INPUT_FOLDER input folder (overrides the input folder in the config file)
  • --save_rendering save rendering video to vis.mp4 in the output folder
  • --no_gt_traj do not show ground truth trajectory
  • --imap visualize results of iMAP*
  • --vis_input_frame opens up a viewer to show input frames. Note: you need to download the dataset first. See the Run section below.
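
To make the flag semantics above concrete, here is a minimal argparse sketch that mirrors the listed options. It is an illustration only, not the actual visualizer.py: the function name build_parser, the positional name config, and the defaults are assumptions.

```python
import argparse

def build_parser():
    # Hypothetical parser mirroring the flags listed above; not the real visualizer.py.
    parser = argparse.ArgumentParser(description="Sketch of the visualizer command line.")
    parser.add_argument("config", type=str, help="path to the config file")
    parser.add_argument("--output", type=str, default=None,
                        help="output folder (overrides the config file)")
    parser.add_argument("--input_folder", type=str, default=None,
                        help="input folder (overrides the config file)")
    parser.add_argument("--save_rendering", action="store_true",
                        help="save rendering video to vis.mp4 in the output folder")
    parser.add_argument("--no_gt_traj", action="store_true",
                        help="do not show the ground truth trajectory")
    parser.add_argument("--imap", action="store_true",
                        help="visualize results of iMAP*")
    parser.add_argument("--vis_input_frame", action="store_true",
                        help="open a viewer that shows input frames")
    return parser

# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(
    ["configs/Replica/room1.yaml", "--output", "output/vis/Replica/room1", "--no_gt_traj"]
)
print(args.output, args.no_gt_traj)
```

Options left at None would fall back to the value in the config file, which is how "overrides" is read in this sketch.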

Demo

Here you can run NICE-SLAM yourself on a short ScanNet sequence with 500 frames.

First, download the demo data as shown below; the data is saved into the ./Datasets/Demo folder.

bash scripts/download_demo.sh

Next, run NICE-SLAM. It takes a few minutes and uses ~5 GB of GPU memory.

python -W ignore run.py configs/Demo/demo.yaml

Finally, run the following command to visualize.

python visualizer.py configs/Demo/demo.yaml 

NOTE: This is for demonstration only; its configuration and performance may differ from those in our paper.

Run

Self-captured Apartment

Download the data as shown below; the data is saved into the ./Datasets/Apartment folder.

bash scripts/download_apartment.sh

Next, run NICE-SLAM:

python -W ignore run.py configs/Apartment/apartment.yaml

ScanNet

Please follow the data downloading procedure on the ScanNet website, and extract color/depth frames from the .sens file using this code.

Directory structure of ScanNet

DATAROOT is ./Datasets by default. If a sequence (sceneXXXX_XX) is stored in other places, please change the input_folder path in the config file or in the command line.

  DATAROOT
  └── scannet
      └── scans
          └── scene0000_00
              └── frames
                  ├── color
                  │   ├── 0.jpg
                  │   ├── 1.jpg
                  │   ├── ...
                  │   └── ...
                  ├── depth
                  │   ├── 0.png
                  │   ├── 1.png
                  │   ├── ...
                  │   └── ...
                  ├── intrinsic
                  └── pose
                      ├── 0.txt
                      ├── 1.txt
                      ├── ...
                      └── ...
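
Before launching a run, it can help to confirm that a sequence folder matches the layout above. The following is a minimal sketch; check_scannet_layout is a hypothetical helper that is not part of the repository, and the requirement that color, depth, and pose have equal file counts is an assumption based on the matching file names in the tree.

```python
from pathlib import Path

def check_scannet_layout(scene_dir):
    """Return a list of problems found under <scene_dir>/frames (empty if none)."""
    frames = Path(scene_dir) / "frames"
    problems = []
    # The four sub-folders shown in the directory tree above.
    for sub in ("color", "depth", "intrinsic", "pose"):
        if not (frames / sub).is_dir():
            problems.append(f"missing folder: {frames / sub}")
    if problems:
        return problems
    # Assumption: every frame has one color image, one depth image, and one pose file.
    n_color = len(list((frames / "color").glob("*.jpg")))
    n_depth = len(list((frames / "depth").glob("*.png")))
    n_pose = len(list((frames / "pose").glob("*.txt")))
    if not (n_color == n_depth == n_pose):
        problems.append(
            f"frame count mismatch: color={n_color}, depth={n_depth}, pose={n_pose}"
        )
    return problems

if __name__ == "__main__":
    for problem in check_scannet_layout("Datasets/scannet/scans/scene0000_00"):
        print(problem)
```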

Once the data is downloaded and set up properly, you can run NICE-SLAM:

python -W ignore run.py configs/ScanNet/scene0000.yaml

Replica

Download the data as shown below; the data is saved into the ./Datasets/Replica folder. Note that the Replica data was generated by the authors of iMAP, so please cite iMAP if you use the data.

bash scripts/download_replica.sh

Then you can run NICE-SLAM:

python -W ignore run.py configs/Replica/room0.yaml

The mesh for evaluation is saved as $OUTPUT_FOLDER/mesh/final_mesh_eval_rec.ply, where the unseen regions are culled using all frames.

TUM RGB-D

Download the data as shown below; the data is saved into the ./Datasets/TUM-RGBD folder.

bash scripts/download_tum.sh

Now run NICE-SLAM:

python -W ignore run.py configs/TUM_RGBD/freiburg1_desk.yaml

Co-Fusion

First, download the dataset. This script should download and unpack the data automatically into the ./Datasets/CoFusion folder.

bash scripts/download_cofusion.sh

Run NICE-SLAM:

python -W ignore run.py configs/CoFusion/room4.yaml

Use your own RGB-D sequence from Azure Kinect

Details
  1. Please first follow this guide to record a sequence and extract aligned color and depth images. (Remember to use --align_depth_to_color for azure_kinect_recorder.py)

    DATAROOT is ./Datasets by default. If a sequence (sceneXX) is stored elsewhere, please change the "input_folder" path in the config file or on the command line.

      DATAROOT
      └── Own
          └── scene0
              ├── color
              │   ├── 00000.jpg
              │   ├── 00001.jpg
              │   ├── 00002.jpg
              │   ├── ...
              │   └── ...
              ├── config.json
              ├── depth
              │   ├── 00000.png
              │   ├── 00001.png
              │   ├── 00002.png
              │   ├── ...
              │   └── ...
              └── intrinsic.json
    
    
  2. Prepare a .yaml file based on configs/Own/sample.yaml. Change the camera intrinsics in the config file based on intrinsic.json. You can also get the intrinsics of the depth camera via other tools such as MATLAB.

  3. Specify the bound of the scene. If no ground truth camera pose is given, we construct world coordinates on the first frame. The X-axis is from left to right, Y-axis is from down to up, Z-axis is from front to back.

  4. Change the input_folder path and/or the output path in the config file or the command line.

  5. Run NICE-SLAM.

python -W ignore run.py configs/Own/sample.yaml
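
For step 2 above, the intrinsics have to be copied from intrinsic.json into the config file. The sketch below reads them with the standard library. It assumes that the file stores the 3x3 camera matrix under an "intrinsic_matrix" key as 9 numbers in column-major order, which is how Open3D usually serializes pinhole intrinsics; please verify this against your own file. The helper name read_intrinsics and the returned key names are illustrative.

```python
import json

def read_intrinsics(path):
    """Read fx, fy, cx, cy from an intrinsic.json file.

    Assumption: the file has an "intrinsic_matrix" entry holding the 3x3 matrix
    as 9 numbers in column-major order: [fx, 0, 0, 0, fy, 0, cx, cy, 1].
    """
    with open(path) as f:
        data = json.load(f)
    m = data["intrinsic_matrix"]
    if len(m) != 9:
        raise ValueError("expected 9 values in intrinsic_matrix")
    # Column-major indexing: element (row, col) is stored at index col * 3 + row.
    return {"fx": m[0], "fy": m[4], "cx": m[6], "cy": m[7]}
```

For example, read_intrinsics("Datasets/Own/scene0/intrinsic.json") would return the four values to enter in the camera section of your config file.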

(Optional but highly recommended) If you don't want to specify the bound of the scene or manually change the config file, you can first run the Redwood tool in Open3D and then run NICE-SLAM. Here we provide the steps for the whole pipeline, starting from recording Azure Kinect videos. (Ubuntu 18.04 or above is recommended.)

  1. Download the Open3D repository.
bash scripts/download_open3d.sh
  2. Record and extract frames.
# specify scene ID
sceneid=0
cd 3rdparty/Open3D-0.13.0/examples/python/reconstruction_system/
# record and save to .mkv file
python sensors/azure_kinect_recorder.py --align_depth_to_color --output scene$sceneid.mkv
# extract frames
python sensors/azure_kinect_mkv_reader.py --input scene$sceneid.mkv --output dataset/scene$sceneid
  3. Run reconstruction.
python run_system.py dataset/scene$sceneid/config.json --make --register --refine --integrate 
# back to main folder
cd ../../../../../
  4. Prepare the config file.
python src/tools/prep_own_data.py --scene_folder 3rdparty/Open3D-0.13.0/examples/python/reconstruction_system/dataset/scene$sceneid --ouput_config configs/Own/scene$sceneid.yaml
  5. Run NICE-SLAM.
python -W ignore run.py configs/Own/scene$sceneid.yaml

iMAP*

We also provide our re-implementation of iMAP (iMAP*) for use. If you use the code, please cite both the original iMAP paper and NICE-SLAM.

Usage

iMAP* shares most of its code with NICE-SLAM. To run iMAP*, simply use the *_imap.yaml config file and add the argument --imap on the command line. For example, to run iMAP* on Replica room0:

python -W ignore run.py configs/Replica/room0_imap.yaml --imap 

To use our interactive visualizer:

python visualizer.py configs/Replica/room0_imap.yaml --imap 

To evaluate ATE:

python src/tools/eval_ate.py configs/Replica/room0_imap.yaml --imap 

Differences between iMAP* and the original iMAP

Keyframe pose optimization during mapping

We do not optimize the poses of the selected keyframes in iMAP*, because optimizing them usually leads to worse performance. One possible reason is that iMAP selects keyframes globally, so many of them have no overlapping regions, especially as the scene gets larger, and overlap is a prerequisite for bundle adjustment (BA). For NICE-SLAM, we only select overlapping keyframes within a small window (local BA), which works well in all scenes. You can still turn on keyframe pose optimization during mapping for iMAP* by enabling BA in the config file.

Active sampling

We disable active sampling in iMAP*, because in our experiments it does not improve performance while adding computational overhead.

For image active sampling, the original iMAP uniformly samples 200 pixels over the entire image in each iteration. It then divides the image into an 8x8 grid and computes a probability distribution from the rendering losses. For a 1200x680 image (Replica), this means only about 3 pixels are sampled to estimate the distribution for each 150x85 grid patch, which is not much different from simple uniform sampling. Therefore, during mapping iMAP* uses the same pixel sampling strategy as NICE-SLAM: uniform sampling, with 4x more pixels than reported in the iMAP paper.
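
The arithmetic behind the "around 3 pixels per patch" figure can be checked directly. The snippet below uses only the numbers quoted in the paragraph above; the variable names are illustrative.

```python
# Numbers taken from the paragraph above.
width, height = 1200, 680   # Replica image resolution
grid = 8                    # the image is divided into an 8x8 grid
samples = 200               # uniformly sampled pixels per iteration

cell_w, cell_h = width // grid, height // grid
cells = grid * grid
samples_per_cell = samples / cells
pixels_per_cell = cell_w * cell_h

print(cell_w, cell_h)       # 150 85
print(samples_per_cell)     # 3.125
# Fraction of each grid patch that is covered by the samples.
print(samples_per_cell / pixels_per_cell)
```

With roughly 3 samples out of 12750 pixels per patch, the per-patch loss estimate is very coarse, which is the reason given above for falling back to uniform sampling.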

For keyframe active sampling, the original iMAP has to render depth and color images for all keyframes to obtain the loss distribution, which is expensive, and we again did not find it very helpful. Instead, as in NICE-SLAM, iMAP* randomly samples keyframes from the keyframe list. We also let iMAP* optimize for 4x more iterations than NICE-SLAM, but its performance is still inferior.

Keyframe selection

For fair comparison, we use the same keyframe selection method in iMAP* as in NICE-SLAM: add one keyframe to the keyframe list every 50 frames.
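
The fixed-interval rule above can be sketched in a few lines. This is an illustration rather than the repository's code: select_keyframes is a hypothetical name, and starting the count at frame 0 is an assumption.

```python
def select_keyframes(n_frames, every=50):
    """Indices of frames added to the keyframe list: one every `every` frames."""
    # Assumption: counting starts at frame 0, so frame 0 is the first keyframe.
    return [idx for idx in range(n_frames) if idx % every == 0]

print(select_keyframes(500))
# [0, 50, 100, 150, 200, 250, 300, 350, 400, 450]
```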

Evaluation

Average Trajectory Error

To evaluate the average trajectory error, run the command below with the corresponding config file:

python src/tools/eval_ate.py configs/Replica/room0.yaml
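
For intuition about what a trajectory error measures, the sketch below computes the root-mean-square translation error between an estimated and a ground truth trajectory, which is one common way to summarize it. It is not the logic of src/tools/eval_ate.py: the function name is hypothetical, and the sketch assumes both trajectories are already expressed in the same coordinate frame, so any alignment step is left out.

```python
import math

def ate_rmse(est_positions, gt_positions):
    """RMSE of per-frame translation error between two same-length trajectories.

    Assumption: both trajectories are already in the same coordinate frame;
    no alignment is performed here.
    """
    if len(est_positions) != len(gt_positions) or not est_positions:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Squared Euclidean distance between corresponding camera positions.
    squared = [
        sum((e - g) ** 2 for e, g in zip(est, gt))
        for est, gt in zip(est_positions, gt_positions)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Toy example: the estimate is offset by 0.1 along y in every frame.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
est = [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0), (2.0, 0.1, 0.0)]
print(ate_rmse(est, gt))
```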

Reconstruction Error

To evaluate the reconstruction error, first download the ground truth Replica meshes in which unseen regions have been culled.

bash scripts/download_cull_replica_mesh.sh

Then run the command below (same for NICE-SLAM and iMAP*). The 2D metric requires rendering 1000 depth images, which takes some time (~9 minutes). Use -2d to enable the 2D metric and -3d to enable the 3D metric.

# assign any output_folder and gt mesh you like, here is just an example
OUTPUT_FOLDER=output/Replica/room0
GT_MESH=cull_replica_mesh/room0.ply
python src/tools/eval_recon.py --rec_mesh $OUTPUT_FOLDER/mesh/final_mesh_eval_rec.ply --gt_mesh $GT_MESH -2d -3d

We also provide code to cull the mesh given camera poses. Here we take culling of ground truth mesh of Replica room0 as an example.

python src/tools/cull_mesh.py --input_mesh Datasets/Replica/room0_mesh.ply --traj Datasets/Replica/room0/traj.txt --output_mesh cull_replica_mesh/room0.ply
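
To illustrate the idea of culling by camera poses, the sketch below keeps only the points that project inside the image of at least one camera. It is a simplified illustration and not the implementation in src/tools/cull_mesh.py: the function names are hypothetical, poses are assumed to be given as world-to-camera rotation and translation, the camera is assumed to look along +z, and occlusion is ignored.

```python
def is_visible(point, R, t, fx, fy, cx, cy, width, height):
    """True if a world-space point projects inside the image of one camera.

    Assumptions: (R, t) map world to camera coordinates (x_cam = R @ x + t),
    the camera looks along +z, and occlusion is ignored.
    """
    x, y, z = (sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3))
    if z <= 0:
        return False  # the point is behind the camera
    # Pinhole projection to pixel coordinates.
    u = fx * x / z + cx
    v = fy * y / z + cy
    return 0 <= u < width and 0 <= v < height

def cull_points(points, cameras, intrinsics):
    """Keep only the points seen by at least one (R, t) pair in `cameras`."""
    return [
        p for p in points
        if any(is_visible(p, R, t, **intrinsics) for R, t in cameras)
    ]

# Toy example with a single camera at the origin (values are made up).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cameras = [(identity, (0.0, 0.0, 0.0))]
intrinsics = dict(fx=600.0, fy=600.0, cx=599.5, cy=339.5, width=1200, height=680)
points = [(0.0, 0.0, 2.0), (0.0, 0.0, -2.0), (5.0, 0.0, 1.0)]
print(cull_points(points, cameras, intrinsics))
```

Only the first point survives in this toy case: the second is behind the camera and the third projects outside the image. Culling a mesh would additionally drop the faces whose vertices were removed.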

For iMAP* evaluation

As discussed in many recent papers, e.g. UNISURF/VolSDF/NeuS, manually thresholding the volume density during marching cubes might be needed. Moreover, we found scaling differences, possibly for the reason discussed in NeuS. Therefore, ICP with scale is needed. You can use the ICP tool in CloudCompare with its default configuration and scaling enabled.

Acknowledgement

We adapted some code from several awesome repositories, including convolutional_occupancy_networks, nerf-pytorch, lietorch, and DIST-Renderer. Thanks for making the code publicly available. We also thank Edgar Sucar for allowing us to make the Replica Dataset available.

Citation

If you find our code or paper useful, please cite:

@inproceedings{Zhu2022CVPR,
  author    = {Zhu, Zihan and Peng, Songyou and Larsson, Viktor and Xu, Weiwei and Bao, Hujun and Cui, Zhaopeng and Oswald, Martin R. and Pollefeys, Marc},
  title     = {NICE-SLAM: Neural Implicit Scalable Encoding for SLAM},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2022}
}

Contact

Contact Zihan Zhu and Songyou Peng for questions, comments, and bug reports.