
Code of "OccNeRF: Self-Supervised Multi-Camera Occupancy Prediction with Neural Radiance Fields".

Primary language: Python. License: Apache License 2.0.

OccNeRF

Project Page | Paper | Data

OccNeRF: Self-Supervised Multi-Camera Occupancy Prediction with Neural Radiance Fields

Chubin Zhang*, Juncheng Yan*, Yi Wei*, Jiaxin Li, Li Liu, Yansong Tang, Yueqi Duan, Jiwen Lu

Updates:

  • 🔔 2023/12/17 Generated 2D semantic labels release.
  • 🔔 2023/12/15 Initial code and paper release.

🕹 Demos

The demos are fairly large, so please allow a moment for them to load. If they fail to load or look blurry, click the hyperlink of each demo for the full-resolution raw video.

📝 Introduction

In this paper, we propose OccNeRF, a method for self-supervised multi-camera occupancy prediction. Unlike bounded 3D occupancy labels, raw image supervision requires us to consider unbounded scenes. To address this, we parameterize the reconstructed occupancy fields and reorganize the sampling strategy. Neural rendering is adopted to convert occupancy fields to multi-camera depth maps, which are supervised by multi-frame photometric consistency. Moreover, for semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
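To give a feel for what parameterizing an unbounded scene can mean, here is a minimal sketch of a per-axis contraction: coordinates inside an inner range are scaled linearly, and coordinates outside it are squashed smoothly into the remaining part of (-1, 1). The function name `contract`, the parameters `inner_range` and `alpha`, and the constants `a` and `b` (derived here only from continuity of the value and slope at the boundary) are assumptions for illustration; please refer to the paper and the code for the exact parameterization.

```python
def contract(r, inner_range, alpha=0.5):
    """Map an unbounded coordinate r (e.g. metres) into the open interval (-1, 1).

    Illustrative only: inside [-inner_range, inner_range] the mapping is linear
    and fills [-alpha, alpha]; outside, it saturates smoothly towards +/-1.
    """
    if inner_range <= 0 or not 0.0 < alpha < 1.0:
        raise ValueError("inner_range must be > 0 and alpha must lie in (0, 1)")
    rn = r / inner_range  # normalized coordinate, |rn| <= 1 inside the inner range
    if abs(rn) <= 1.0:
        return alpha * rn
    # Constants chosen so that value and first derivative match at |rn| = 1.
    a = (1.0 - alpha) ** 2 / alpha
    b = (1.0 - 2.0 * alpha) / alpha
    sign = 1.0 if rn > 0 else -1.0
    return sign * (1.0 - a / (abs(rn) + b))


if __name__ == "__main__":
    # Hypothetical 40 m inner range; far-away points approach but never reach 1.
    for x in (0.0, 20.0, 40.0, 80.0, 1000.0):
        print(x, contract(x, inner_range=40.0))
```

With a mapping of this kind, a fixed-size voxel grid can cover both the nearby region at full resolution and the far region at progressively coarser resolution.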

💡 Method

Method Pipeline:

We first use a 2D backbone to extract multi-camera features, which are lifted to 3D space with interpolation to obtain volume features. The parameterized occupancy fields are reconstructed to describe unbounded scenes. To obtain the rendered depth and semantic maps, we perform volume rendering with our reorganized sampling strategy. The multi-frame depths are supervised by a photometric loss. For semantic prediction, we adopt a pretrained Grounded-SAM with prompt cleaning. The green arrow indicates supervision signals.
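As a rough illustration of how volume rendering turns a field into a depth value, the sketch below applies the standard NeRF-style quadrature to a single ray. It is written in plain Python for readability and is not taken from this repository; the function name `render_depth` and the choice to reuse the previous interval for the last sample are assumptions, and the sampling strategy used by OccNeRF is described in the paper.

```python
import math


def render_depth(densities, depths):
    """Expected depth along one ray from per-sample densities.

    densities: non-negative density at each sample point.
    depths:    increasing distances of the sample points from the camera.
    Returns (depth, weights), where weights are the per-sample rendering weights.
    """
    if len(densities) != len(depths) or len(depths) < 2:
        raise ValueError("need two lists of equal length with at least two samples")
    transmittance = 1.0  # probability that the ray has not been absorbed so far
    weights = []
    for i, sigma in enumerate(densities):
        # Interval to the next sample; the last sample reuses the previous interval.
        j = min(i, len(depths) - 2)
        delta = depths[j + 1] - depths[j]
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this interval
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    depth = sum(w * t for w, t in zip(weights, depths))
    return depth, weights


if __name__ == "__main__":
    depth, weights = render_depth([0.0, 0.5, 2.0, 4.0], [1.0, 2.0, 3.0, 4.0])
    print(depth, weights)
```

Because every step is differentiable with respect to the densities, a loss computed on the rendered depth (here, the photometric loss between frames) can supervise the 3D field without 3D labels.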

🔧 Installation

Clone this repo and install the dependencies:

git clone --recurse-submodules https://github.com/LinShan-Bin/OccNeRF.git
cd OccNeRF
conda create -n occnerf python=3.8
conda activate occnerf
conda install pytorch==1.9.1 torchvision==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r requirements.txt

Our code is tested with Python 3.8, PyTorch 1.9.1, and CUDA 11.3, and can be adapted to other versions of PyTorch and CUDA with minor modifications.

🏗 Dataset Preparation

  1. Download the nuScenes v1.0 full dataset from nuScenes and link the data folder to ./data/nuscenes/nuscenes/.

  2. Download the ground truth occupancy labels from Occ3d and extract gts.tar.gz to ./data/nuscenes/gts. Note that we only use the 3D occupancy labels for validation.

  3. Generate the ground truth depth maps for validation:

    python tools/export_gt_depth_nusc.py

  4. Download the generated 2D semantic labels from semantic_labels and extract the data to ./data/nuscenes/. We recommend that you use pigz to speed up the process.

  5. Download the pretrained weights of our model from Checkpoints and move them to ./ckpts/.

  6. (Optional) If you want to generate the 2D semantic labels yourself, please refer to the README.md in GroundedSAM_OccNeRF. The dataset index pickle file nuscenes_infos_train.pkl is from SurroundOcc and should be placed under ./data/nuscenes/.
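Step 1 asks you to link the downloaded nuScenes folder into the repository. If you prefer to do this from Python rather than with `ln -s`, a minimal sketch could look like the following. The helper name `link_nuscenes` and the example source path are hypothetical; only the target path ./data/nuscenes/nuscenes/ comes from the steps above.

```python
import os
from pathlib import Path


def link_nuscenes(source_dir, repo_root="."):
    """Create the symlink <repo_root>/data/nuscenes/nuscenes -> source_dir."""
    source = Path(source_dir).resolve()
    if not source.is_dir():
        raise FileNotFoundError(f"nuScenes folder not found: {source}")
    target = Path(repo_root) / "data" / "nuscenes" / "nuscenes"
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.is_symlink() or target.exists():
        raise FileExistsError(f"{target} already exists; remove it first")
    os.symlink(source, target, target_is_directory=True)
    return target


# Example (hypothetical download location):
# link_nuscenes("/path/to/your/nuscenes", repo_root=".")
```

The function refuses to overwrite an existing link or folder, so running it twice does not silently replace your data directory.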

The final folder structure should look like this:

OccNeRF/
├── ckpts/
│   ├── nusc-depth/
│   │   ├── encoder.pth
│   │   ├── depth.pth
│   ├── nusc-sem/
│   │   ├── encoder.pth
│   │   ├── depth.pth
├── data/
│   ├── nuscenes/
│   │   ├── nuscenes/
│   │   │   ├── maps/
│   │   │   ├── samples/
│   │   │   ├── sweeps/
│   │   │   ├── v1.0-trainval/
│   │   ├── gts/
│   │   ├── nuscenes_depth/
│   │   ├── nuscenes_semantic/
│   │   ├── nuscenes_infos_train.pkl
├── ...
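Before launching training or evaluation, it can help to check that the layout above is in place. The following is a small sketch that only tests whether each listed path exists; the names `EXPECTED` and `missing_paths` are made up for this example and the script is not part of the repository.

```python
from pathlib import Path

# Paths copied from the folder structure above, relative to the repository root.
EXPECTED = [
    "ckpts/nusc-depth/encoder.pth",
    "ckpts/nusc-depth/depth.pth",
    "ckpts/nusc-sem/encoder.pth",
    "ckpts/nusc-sem/depth.pth",
    "data/nuscenes/nuscenes/maps",
    "data/nuscenes/nuscenes/samples",
    "data/nuscenes/nuscenes/sweeps",
    "data/nuscenes/nuscenes/v1.0-trainval",
    "data/nuscenes/gts",
    "data/nuscenes/nuscenes_depth",
    "data/nuscenes/nuscenes_semantic",
    "data/nuscenes/nuscenes_infos_train.pkl",
]


def missing_paths(repo_root="."):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing paths:")
        for p in missing:
            print("  " + p)
    else:
        print("All expected paths are present.")
```

The check does not validate file contents, so a passing result only means the folders and files are where the code expects to find them.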

🚀 Quick Start

Training

Train OccNeRF without semantic supervision:

python -m torch.distributed.launch --nproc_per_node=8 run.py --config configs/nusc-depth.txt

Training the full model requires at least 80 GB of GPU memory. If you have less GPU memory (e.g., 40 GB), you can train with a single frame (set auxiliary_frame = False in the config file). See Section 4.4 in the paper for the ablation study. Evaluation can be done with 24 GB of GPU memory.
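If you switch between the full and the single-frame setting often, the config edit can be scripted. The sketch below assumes the config file consists of `key = value` lines, as the `auxiliary_frame = False` notation above suggests; the helper name `set_option` and the sample option `batch_size` are hypothetical, so check your actual config file before relying on it.

```python
import re


def set_option(config_text, key, value):
    """Return config_text with the line `key = ...` rewritten to `key = value`.

    Assumes one `key = value` pair per line. Raises KeyError if the key is not
    present, rather than guessing where a new option should go.
    """
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*=[ \t]*).*$", re.MULTILINE
    )
    new_text, count = pattern.subn(lambda m: m.group(1) + str(value), config_text)
    if count == 0:
        raise KeyError(f"option {key!r} not found in config")
    return new_text


if __name__ == "__main__":
    # Hypothetical config content; only auxiliary_frame is named in this README.
    sample = "batch_size = 1\nauxiliary_frame = True\n"
    print(set_option(sample, "auxiliary_frame", False))
```

To apply it to a real file, read configs/nusc-depth.txt into a string, pass it through `set_option`, and write the result to a copy of the config so the original stays untouched.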

Train OccNeRF with semantic supervision:

python -m torch.distributed.launch --nproc_per_node=8 run.py --config configs/nusc-sem.txt

Evaluation

Evaluate the depth estimation:

python -m torch.distributed.launch --nproc_per_node=8 run.py --config configs/nusc-depth.txt --eval_only --load_weights_folder ckpts/nusc-depth

Evaluate the occupancy prediction:

python -m torch.distributed.launch --nproc_per_node=8 run.py --config configs/nusc-sem.txt --eval_only --load_weights_folder ckpts/nusc-sem
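The training and evaluation commands above differ only in the config file and in the `--eval_only` and `--load_weights_folder` flags. A small wrapper such as the one below can assemble them, which is handy in job scripts. The function `build_command` and its `num_gpus` parameter are illustrative; the README only documents runs with `--nproc_per_node=8`, so treat other GPU counts as untested.

```python
import shlex


def build_command(config, num_gpus=8, eval_only=False, weights_folder=None,
                  script="run.py"):
    """Assemble the launch command as a list of arguments."""
    cmd = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}",
        script, "--config", config,
    ]
    if eval_only:
        cmd.append("--eval_only")
    if weights_folder:
        cmd += ["--load_weights_folder", weights_folder]
    return cmd


if __name__ == "__main__":
    # Reproduces the depth evaluation command from this README.
    cmd = build_command("configs/nusc-depth.txt", eval_only=True,
                        weights_folder="ckpts/nusc-depth")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Returning a list rather than a single string lets you pass the command to `subprocess.run` without going through a shell.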

Visualization

Visualize the depth estimation:

python tools/export_vis_data.py  # You can modify this file to choose scenes you want to visualize. Otherwise, all validation scenes will be visualized.
python -m torch.distributed.launch --nproc_per_node=8 run_vis.py --config configs/nusc-depth.txt --load_weights_folder ckpts/nusc-depth --log_dir your_log_dir
python gen_scene_video.py scene_folder_generated_by_the_above_command

🙏 Acknowledgement

Many thanks to these excellent projects:

📃 Bibtex

If this work is helpful for your research, please consider citing the following BibTeX entry.

@article{chubin2023occnerf, 
      title   = {OccNeRF: Self-Supervised Multi-Camera Occupancy Prediction with Neural Radiance Fields}, 
      author  = {Chubin Zhang and Juncheng Yan and Yi Wei and Jiaxin Li and Li Liu and Yansong Tang and Yueqi Duan and Jiwen Lu},
      journal = {arXiv preprint arXiv:2312.09243},
      year    = {2023}
}