OmniSeg3D-GS: Gaussian-Splatting based OmniSeg3D (CVPR2024)

Project Page | Arxiv Paper

OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning
Haiyang Ying¹, Yixuan Yin¹, Jinzhi Zhang¹, Fan Wang², Tao Yu¹, Ruqi Huang¹, Lu Fang¹
¹Tsinghua University ²Alibaba Group

OmniSeg3D is a framework for multi-object, category-agnostic, and hierarchical segmentation in 3D. The original implementation is based on InstantNGP.

However, OmniSeg3D is not restricted to a specific 3D representation. In this repo, we present a Gaussian-Splatting based OmniSeg3D, which supports interactive 3D segmentation in real time. The segmented objects can be saved in .ply format for further visualization and manipulation.

Installation

We follow the original environment setup of 3D Gaussian Splatting (SIGGRAPH 2023).

conda create -n gaussian_grouping python=3.8 -y
conda activate gaussian_grouping 

conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
pip install plyfile==0.8.1
pip install tqdm scipy wandb opencv-python scikit-learn lpips

pip install submodules/diff-gaussian-rasterization
pip install submodules/simple-knn

Install SAM for 2D segmentation:

git clone https://github.com/facebookresearch/segment-anything.git
cd segment-anything
pip install -e .
mkdir sam_ckpt; cd sam_ckpt
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

Data Preparation

We support data prepared in COLMAP format. For more details, please refer to the guidance in our NeRF-based implementation of OmniSeg3D.
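
As a quick sanity check before training, the sketch below tests a scene folder against the layout that the original 3D Gaussian Splatting COLMAP loader reads: an image folder plus sparse/0 holding the cameras, images, and points3D files. The helper name check_colmap_scene and the exact file list are assumptions for illustration; the guidance in the NeRF-based OmniSeg3D repo remains the reference.

```python
from pathlib import Path


def check_colmap_scene(scene_dir, images="images"):
    """Return a list of missing items for a COLMAP-style scene folder.

    Hypothetical helper, not part of this repo. Assumed layout:
      <scene_dir>/<images>/             input images (e.g. images_4)
      <scene_dir>/sparse/0/cameras.*    .bin or .txt
      <scene_dir>/sparse/0/images.*     .bin or .txt
      <scene_dir>/sparse/0/points3D.*   .bin or .txt
    """
    scene = Path(scene_dir)
    missing = []
    if not (scene / images).is_dir():
        missing.append(images + "/")
    sparse = scene / "sparse" / "0"
    for stem in ("cameras", "images", "points3D"):
        # COLMAP can export each model file in binary or text form.
        found = any((sparse / (stem + ext)).is_file() for ext in (".bin", ".txt"))
        if not found:
            missing.append("sparse/0/" + stem + ".{bin,txt}")
    return missing
```

An empty list means the folder matches the assumed layout; pass images="images_4" to match the training script in this README.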

Hierarchical Representation Generation

Run the SAM model to generate the hierarchical representation files.

python run_sam.py --ckpt_path {SAM_CKPT_PATH} --file_path {IMAGE_FOLDER}

After running, you will get three folders: sam, masks, and patches.

sam: stores the hierarchical representation as ".npz" files
masks and patches: used for visualization or mask quality evaluation; not needed during training.
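
To confirm that the run produced what training expects, a small sketch like the following can count the files in each folder and list the arrays stored in one ".npz" file. The function names and the root argument are hypothetical: pass whichever directory run_sam.py wrote the three folders to. The array names inside the files are not documented here, so the sketch only lists them.

```python
import zipfile
from pathlib import Path


def summarize_sam_outputs(root):
    """Count files in the sam, masks, and patches folders under `root`.

    Returns a dict mapping folder name to file count, or None when the
    folder does not exist. `root` is an assumption about where the
    folders were written.
    """
    root = Path(root)
    summary = {}
    for name in ("sam", "masks", "patches"):
        folder = root / name
        if folder.is_dir():
            summary[name] = sum(1 for p in folder.iterdir() if p.is_file())
        else:
            summary[name] = None
    return summary


def list_npz_arrays(npz_path):
    """List the array names in an .npz file without loading the data.

    An .npz file is a zip archive holding one .npy member per array.
    """
    with zipfile.ZipFile(npz_path) as archive:
        return [m[:-4] for m in archive.namelist() if m.endswith(".npy")]
```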

Ideally, the masks folder should contain object-level masks and the patches folder should contain part-level masks. We mostly use SAM's default parameters, but you can tune them for custom datasets.

Training

We train our models on a single NVIDIA RTX 3090 Ti GPU (24 GB). Smaller scenes may require less memory. Typically, inference requires less than 8 GB of memory. We use a two-stage training strategy; see script/train_omni_360.sh for an example.

dataname=counter
gpu=1
data_path=root_path/to/the/data/folder/of/counter

# --- Training Gaussian (Color and Density) --- #
CUDA_VISIBLE_DEVICES=${gpu} python train.py \
     -s ${data_path} \
     --images images_4 \
     -r 1 -m output/360_${dataname}_omni_1/rgb \
     --config_file config/gaussian_dataset/train_rgb.json \
     --object_path sam \
     --ip 127.0.0.2

# --- Training Semantic Feature Field --- #
CUDA_VISIBLE_DEVICES=${gpu} python train.py \
     -s ${data_path} \
     --images images_4 \
     -r 1 \
     -m output/360_${dataname}_omni_1/sem_hi \
     --config_file config/gaussian_dataset/train_sem.json \
     --object_path sam \
     --start_checkpoint output/360_${dataname}_omni_1/rgb/chkpnt10000.pth \
     --ip 127.0.0.2

# --- Render Views for Visualization --- #
CUDA_VISIBLE_DEVICES=${gpu} python render_omni.py \
    -m output/360_${dataname}_omni_1/sem_hi \
    --num_classes 256 \
    --images images_4

After filling in your own settings, run the script from the root folder:

bash script/train_omni_360.sh
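
The dependency between the two stages is easy to miss in the shell script: the semantic stage starts from the checkpoint written by the RGB stage. The sketch below rebuilds the same three commands as argument lists, using only the flags shown above. The function build_two_stage_commands and its parameters are hypothetical, and the checkpoint iteration (10000) is taken from the script rather than from the config files.

```python
import shlex


def build_two_stage_commands(data_path, dataname, images="images_4", ckpt_iter=10000):
    """Return [rgb_cmd, sem_cmd, render_cmd] as argument lists.

    Hypothetical helper mirroring script/train_omni_360.sh; it only
    builds the commands and does not run them.
    """
    out = f"output/360_{dataname}_omni_1"
    common = ["python", "train.py", "-s", data_path, "--images", images, "-r", "1"]
    # Stage 1: train Gaussian color and density.
    rgb = common + [
        "-m", f"{out}/rgb",
        "--config_file", "config/gaussian_dataset/train_rgb.json",
        "--object_path", "sam",
        "--ip", "127.0.0.2",
    ]
    # Stage 2: train the semantic feature field from the stage-1 checkpoint.
    sem = common + [
        "-m", f"{out}/sem_hi",
        "--config_file", "config/gaussian_dataset/train_sem.json",
        "--object_path", "sam",
        "--start_checkpoint", f"{out}/rgb/chkpnt{ckpt_iter}.pth",
        "--ip", "127.0.0.2",
    ]
    # Stage 3: render views for visualization.
    render = [
        "python", "render_omni.py",
        "-m", f"{out}/sem_hi",
        "--num_classes", "256",
        "--images", images,
    ]
    return [rgb, sem, render]


def format_commands(commands):
    """Join each argument list into a shell-safe command string."""
    return [shlex.join(cmd) for cmd in commands]
```

Printing format_commands(build_two_stage_commands(data_path, "counter")) gives a dry run of the pipeline. Each command can also be passed to subprocess.run with CUDA_VISIBLE_DEVICES set in the environment.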

GUI Visualization and Segmentation

Modify the path of the trained point cloud. Then run render_omni_gui.py.

GUI options:

mode option: RGB, score map, and semantic map (you can visualize the consistent global semantic feature).
click mode: select object of interest
multi-click mode: select multiple points or objects
binary threshold: show binarized 2D images with the threshold
segment3d: segment the scene with the current threshold (the saved .ply file can be found in the root directory)
reload: reload the whole scene
file selector: load another scene (point cloud)
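
To make the binary threshold and segment3d options concrete, here is a minimal sketch of threshold-based selection. It assumes the score is the cosine similarity between the feature at the clicked point and the feature of each Gaussian. The actual scoring in render_omni_gui.py is not described in this README, and the names cosine_similarity and select_by_threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_by_threshold(query, features, threshold):
    """Return indices of features whose score against `query` is >= threshold.

    query:     feature vector at the clicked point (assumption)
    features:  one feature vector per Gaussian or pixel (assumption)
    threshold: the value set by the binary threshold control
    """
    return [
        i for i, feat in enumerate(features)
        if cosine_similarity(query, feat) >= threshold
    ]
```

Under this assumption, raising the threshold keeps only points whose features closely match the click, which tends toward part-level selections, while lowering it admits looser matches.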

Operations:

left drag: rotate
mid drag: pan
right click: choose points/objects
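
Once segment3d has written a .ply file, a quick way to check it before opening a viewer is to read its header. The sketch below is a standard-library parser that returns the storage format and the element counts. It assumes the file has a regular PLY header ending in end_header; read_ply_header is a hypothetical helper, and the property layout of the saved file is not inspected.

```python
def read_ply_header(path):
    """Parse a PLY header and return (format, {element_name: count}).

    Reads only the ASCII header and stops at `end_header`, so it works
    for both ascii and binary PLY files.
    """
    fmt = None
    elements = {}
    with open(path, "rb") as f:
        if f.readline().strip() != b"ply":
            raise ValueError("not a PLY file: " + str(path))
        for raw in f:
            parts = raw.decode("ascii", errors="replace").split()
            if not parts:
                continue
            if parts[0] == "end_header":
                break
            if parts[0] == "format":
                fmt = parts[1]
            elif parts[0] == "element":
                # Header line shape: element <name> <count>
                elements[parts[1]] = int(parts[2])
    return fmt, elements
```

For a segmented object, the vertex count reported here is the number of points kept by the current threshold.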

Acknowledgements

Thanks to the following projects for their valuable contributions:

Citation

If you find this project helpful for your research, please consider citing the report and giving a ⭐.

@article{ying2023omniseg3d,
  title={OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning},
  author={Ying, Haiyang and Yin, Yixuan and Zhang, Jinzhi and Wang, Fan and Yu, Tao and Huang, Ruqi and Fang, Lu},
  journal={arXiv preprint arXiv:2311.11666},
  year={2023}
}
