SANeRF-HQ

SANeRF-HQ: Segment Anything for NeRF in High Quality [CVPR 2024].

This is the official implementation of SANeRF-HQ.

SANeRF-HQ: Segment Anything for NeRF in High Quality
Yichen Liu, Benran Hu, Yu-Wing Tai, Chi-Keung Tang
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

SANeRF-HQ Model Architecture

Set up

The code is based on torch-ngp.

First, install the required packages:

pip install -r requirements.txt

Then install HQ-SAM (see the HQ-SAM repo for details):

pip install segment-anything-hq

Optionally, you can also build the extensions:

# install all extension modules
bash scripts/install_ext.sh

# if you want to install manually, here is an example:
cd raymarching
python setup.py build_ext --inplace # build ext only, do not install (can only be used in the parent directory)
pip install . # install to python path (you still need the raymarching/ folder, since this only installs the built extension)
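
After installation, a quick check that the Python packages are importable can save a failed training run. The snippet below is a minimal sketch: the helper `missing_modules` is hypothetical and not part of this repo, and `segment_anything_hq` is assumed to be the import name exposed by the `segment-anything-hq` package.

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names that cannot be found on the Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Assumed import name for the segment-anything-hq package.
print(missing_modules(["segment_anything_hq"]))
```

An empty list means every listed module was found; add other module names from requirements.txt as needed.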

Dataset

We use datasets from Mip-NeRF 360, LERF, LLFF, 3D-FRONT, Panoptic Lifting, and Contrastive Lift. You can download each dataset from its website. We also provide one example here.

To switch between datasets, change the value of the --data_type flag during training.

Mip-NeRF 360: --data_type=colmap.
LERF: --data_type=colmap.

Note: For the LERF dataset, we did not obtain good NeRF reconstruction results with their camera poses (probably because of some hyperparameters). We therefore use the COLMAP pose estimation provided by this. Please follow their instructions to run COLMAP first if you would like to test LERF. The corresponding scripts are also included in this repo.
LLFF: --data_type=llff. We use the data provided by Mip-NeRF 360.
3D-FRONT: --data_type=3dfront. We use the data provided by Instance NeRF.
Panoptic Lifting / Contrastive Lift: --data_type=others.
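
The mapping above can be summarized as a small lookup table. This is only an illustrative sketch: the dictionary `DATA_TYPES` and the helper `data_type_flag` are hypothetical and do not exist in this repo; only the flag values are taken from the list above.

```python
# Hypothetical lookup table: dataset name -> value of --data_type (values from the list above).
DATA_TYPES = {
    "Mip-NeRF 360": "colmap",
    "LERF": "colmap",
    "LLFF": "llff",
    "3D-FRONT": "3dfront",
    "Panoptic Lifting": "others",
    "Contrastive Lift": "others",
}

def data_type_flag(dataset: str) -> str:
    """Return the command-line flag for a dataset name (raises KeyError if unknown)."""
    return f"--data_type={DATA_TYPES[dataset]}"

print(data_type_flag("LLFF"))
```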

You can download the evaluation masks we selected here. Some datasets have ground-truth segmentation (e.g. 3D-FRONT and Panoptic Lifting), so we use their annotations directly. For those without ground-truth segmentation (e.g. Mip-NeRF 360), we randomly select some views and use this to obtain masks. We then pass the masks through CascadePSP for refinement if necessary.

Training

We provide some sample scripts for using our code. For a detailed description of each argument, please refer to our code.

To train the RGB NeRF, run
```
bash scripts/train_rgb_nerf.sh
```
Then run the following script to obtain the feature container.
```
bash scripts/train_sam_nerf.sh
```
You can change the container type with the flag --feature_container.
With the feature container, you can decode the object mask for each image.
```
bash scripts/decode.sh
```
Decoding requires 3D points as input. To obtain them, you can either project 2D points onto 3D (a script is not provided, but you can find the corresponding code in test_step in nerf/train.py) or use the GUI to select points.
To use the GUI, add --gui, or run
```
bash scripts/gui.sh
```
Right-click to select a point, and check the 'use negative labels' checkbox to add points with negative labels. After selection, click 'save 3D points' to save the points to a JSON file.
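
If you prefer the non-GUI route of projecting 2D points onto 3D, the underlying geometry can be sketched as below. This is not the code from test_step in nerf/train.py; it is a generic pinhole back-projection under stated assumptions: the function `unproject` is hypothetical, the camera is assumed to look along +z in camera space, the depth is assumed to be a z-depth (for example, taken from a rendered NeRF depth map), and `c2w` is a 4x4 camera-to-world matrix. Check the repo's own camera convention before reusing it.

```python
def unproject(u, v, depth, fx, fy, cx, cy, c2w):
    """Lift pixel (u, v) with z-depth `depth` to a 3D world point.

    Assumes a pinhole camera looking along +z in camera space and a
    4x4 camera-to-world matrix `c2w` given as nested lists (row-major).
    """
    # Back-project the pixel into camera space.
    x_cam = (u - cx) / fx * depth
    y_cam = (v - cy) / fy * depth
    p_cam = (x_cam, y_cam, depth, 1.0)
    # Apply the camera-to-world transform (only the first three rows are needed).
    return [sum(c2w[r][c] * p_cam[c] for c in range(4)) for r in range(3)]

# Example: identity rotation, camera translated to (1, 2, 3).
c2w = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]
print(unproject(150, 50, 2.0, 100.0, 100.0, 50.0, 50.0, c2w))
```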
To train the object field, run
```
bash scripts/train_obj_nerf.sh
```
If the ray-pair RGB loss is slow or does not help in your case, simply set ray_pair_rgb_iter > iter.
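
The stages in this section run in a fixed order, which the sketch below makes explicit. It is a hypothetical driver, not a script shipped with this repo: `STAGES`, `build_commands`, and `run_pipeline` are illustrative names, and it assumes each script has already been edited for your scene and is launched from the repository root. The interactive scripts/gui.sh step is left out.

```python
import subprocess

# Stage order as described in this section; gui.sh is interactive and optional,
# so it is not included here.
STAGES = [
    "scripts/train_rgb_nerf.sh",  # 1. RGB NeRF
    "scripts/train_sam_nerf.sh",  # 2. feature container
    "scripts/decode.sh",          # 3. per-image object masks (needs 3D points)
    "scripts/train_obj_nerf.sh",  # 4. object field
]

def build_commands(stages=STAGES):
    """Return the shell command for each stage, in order."""
    return [["bash", script] for script in stages]

def run_pipeline(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing stage

run_pipeline(dry_run=True)
```

Calling `run_pipeline(dry_run=False)` would execute the stages one after another and stop as soon as one of them fails.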

Evaluation

To evaluate our results, run scripts/test_obj_nerf.sh. You can add --use_default_intrinsics in the test script to render masks with the default intrinsics. You can download the evaluation views here.

Other Results

In our paper, we demonstrate the potential of our pipeline to achieve various segmentation tasks. Here are some instructions about how we get those results.

Text-prompt Segmentation

We use Grounding-DINO to generate a bounding box from the text prompt, and then use that bounding box as the prompt for SAM to generate the mask.
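
One practical detail when chaining the two models is the box format. The sketch below assumes that the detector returns boxes as normalized (cx, cy, w, h) and that the SAM predictor expects a box prompt in absolute (x0, y0, x1, y1) pixels; verify both conventions against the versions you install. The helper `cxcywh_norm_to_xyxy` is hypothetical and not part of this repo.

```python
def cxcywh_norm_to_xyxy(box, width, height):
    """Convert a normalized (cx, cy, w, h) box to absolute (x0, y0, x1, y1) pixels."""
    cx, cy, w, h = box
    x0 = (cx - w / 2) * width
    y0 = (cy - h / 2) * height
    x1 = (cx + w / 2) * width
    y1 = (cy + h / 2) * height
    return [x0, y0, x1, y1]

# A centered box covering half of a 200x100 image in each dimension.
print(cxcywh_norm_to_xyxy((0.5, 0.5, 0.5, 0.5), width=200, height=100))
```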

Auto-segmentation and Dynamic Segmentation

We use DEVA to segment a sequence of images in a video.

For a static scene, you can first render a video from the NeRF. Use the 'save trajectory' function in the GUI to store a sequence of camera poses: click 'start track' to start recording the camera trajectory and click 'save trajectory' to store it. Then feed the rendered frames into DEVA to obtain automatic segmentation results. Finally, use the code to train the object field. Remember to change --n_inst in multi-instance cases.

Acknowledgement

SAM and HQ-SAM

@article{kirillov2023segany,
    title={Segment Anything},
    author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
    journal={arXiv:2304.02643},
    year={2023}
}

@inproceedings{sam_hq,
    title={Segment Anything in High Quality},
    author={Ke, Lei and Ye, Mingqiao and Danelljan, Martin and Liu, Yifan and Tai, Yu-Wing and Tang, Chi-Keung and Yu, Fisher},
    booktitle={NeurIPS},
    year={2023}
}

torch-ngp

@misc{torch-ngp,
    Author = {Jiaxiang Tang},
    Year = {2022},
    Note = {https://github.com/ashawkey/torch-ngp},
    Title = {Torch-ngp: a PyTorch implementation of instant-ngp}
}

CascadePSP

@inproceedings{cheng2020cascadepsp,
  title={{CascadePSP}: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement},
  author={Cheng, Ho Kei and Chung, Jihoon and Tai, Yu-Wing and Tang, Chi-Keung},
  booktitle={CVPR},
  year={2020}
}

OpenMMLab Playground: https://github.com/open-mmlab/playground

Citation

If you find this repo or our paper useful, please ⭐ this repository and consider citing 📝:

@article{liu2023sanerf,
  title={SANeRF-HQ: Segment Anything for NeRF in High Quality},
  author={Liu, Yichen and Hu, Benran and Tang, Chi-Keung and Tai, Yu-Wing},
  journal={arXiv preprint arXiv:2312.01531},
  year={2023}
}
