OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning
Haiyang Ying1, Yixuan Yin1, Jinzhi Zhang1, Fan Wang2, Tao Yu1, Ruqi Huang1, Lu Fang1
1Tsinghua University 2Alibaba Group
Towards segmenting everything in 3D all at once, we propose an omniversal 3D segmentation method (a), which takes as input multi-view, inconsistent, class-agnostic 2D segmentations, and then outputs a consistent 3D feature field via a hierarchical contrastive learning framework. This method supports hierarchical segmentation (b), multi-object selection (c), and holistic discretization (d) in an interactive manner.
replica_github2.1.mp4
For more demos, please visit our project page: OmniSeg3D.
- 2024/01/14: We release the original version of OmniSeg3D. Try and play with it now!
- 2024/03/26: We release OmniSeg3D-GS, an adaptation of OmniSeg3D to 3D Gaussian Splatting. Check it out now!
NOTE: Our project is implemented based on the ngp_pl project, and the requirements are the same as ngp_pl except for SAM and a customized CUDA extension.
- OS: Ubuntu 20.04
- NVIDIA GPU with compute capability >= 7.5 and memory > 8GB (tested with a single RTX 2080 Ti and RTX 3090), CUDA 11.3 (might work with older versions); a quick check is sketched below
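If you want to verify the GPU requirement before going further, the minimal snippet below prints the compute capability and memory of the first visible GPU. It assumes some existing PyTorch installation (e.g. from another environment) and is only a convenience check, not part of the official setup.

```python
# Quick GPU check (assumes a working PyTorch install; not part of the official setup).
import torch

assert torch.cuda.is_available(), "No CUDA-capable GPU visible to PyTorch"
major, minor = torch.cuda.get_device_capability(0)
mem_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f"Compute capability: {major}.{minor}, memory: {mem_gb:.1f} GB")
# OmniSeg3D expects compute capability >= 7.5 and more than 8 GB of memory.
```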
- Clone this repo:
git clone https://github.com/THU-luvision/OmniSeg3D.git
- Create a conda environment (Python >= 3.8) and activate it (installation via Anaconda is recommended):
conda create -n omniseg3d python=3.8
conda activate omniseg3d
- pytorch, pytorch-lightning=1.9.3, pytorch-scatter:
conda install pytorch==1.11.0 torchvision==0.12.0 cudatoolkit=11.3 -c pytorch
conda install pytorch-lightning=1.9.3
conda install pytorch-scatter -c pyg
- tinycudann: follow the official instructions (PyTorch extension). NOTE: If you install it on a server with locally installed CUDA, you need to specify the CUDA path as
cmake . -B build -DCMAKE_CUDA_COMPILER=/usr/local/cuda-11.3/bin/nvcc
instead of 'cmake . -B build'.
git clone --recursive https://github.com/nvlabs/tiny-cuda-nn
cd tiny-cuda-nn/bindings/torch
python setup.py install
- apex: follow the official instructions. NOTE: An error may occur due to a recent official commit; try
git checkout 2386a912164b0c5cfcd8be7a2b890fbac5607c82
to resolve the problem.
git clone https://github.com/NVIDIA/apex
cd apex
# if pip >= 23.1 (ref: https://pip.pypa.io/en/stable/news/#v23-1), which supports multiple `--config-settings` with the same key...
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
# otherwise
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./
- SAM for segmentation (a quick load check is sketched after this installation list):
git clone https://github.com/facebookresearch/segment-anything.git
cd segment-anything
pip install -e .
mkdir sam_ckpt; cd sam_ckpt
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
- Other Python requirements:
pip install -r requirements.txt
- CUDA extension: upgrade pip to >= 22.1 and run:
pip install models/csrc/
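As mentioned in the SAM step above, you can optionally confirm that the downloaded checkpoint loads before running the preprocessing. This is only a sanity check, not a required step; the path follows the download commands above, so run it from the segment-anything directory or adjust the path.

```python
# Optional sanity check that the downloaded SAM checkpoint loads (not required for training).
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_h"](checkpoint="sam_ckpt/sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)  # default parameters
print("SAM vit_h loaded with", sum(p.numel() for p in sam.parameters()), "parameters")
```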
Run the SAM model to get the hierarchical representation files.
python run_sam.py --ckpt_path {SAM_CKPT_PATH} --file_path {IMAGE_FOLDER}
After running, you will get three folders: sam, masks, and patches.
- sam: stores the hierarchical representation as ".npz" files.
- masks and patches: used for visualization or mask quality evaluation; not needed during training.
Ideally, masks should include object-level masks and patches should contain part-level masks. We basically use the default parameter settings for SAM, but you can tune the parameters for customized datasets.
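The exact arrays stored in the ".npz" files are defined by run_sam.py; if you want to inspect what was generated for a frame, a minimal sketch is below (the file name is a placeholder, point it at any file in your sam folder).

```python
# Inspect a hierarchical representation file produced by run_sam.py
# (the path below is a placeholder; use any .npz from your "sam" folder).
import numpy as np

data = np.load("data/replica_data/room_0/sam/frame_000000.npz")
for key in data.files:
    print(key, data[key].shape, data[key].dtype)
```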
We provide some data samples (replica_room_0, 360_counter, llff_flower); you can download them for model training.
NOTE: The "sam", "masks", and "patches" folders should be generated with run_sam.py.
data
├── 360_v2 # Link: https://jonbarron.info/mipnerf360/
│ └── [bicycle|bonsai|counter|garden|kitchen|room|stump]
│ ├── [sparse/0] (colmap results)
│ └── [images|images_2|images_4|images_8|sam|masks|patches]
│
├── nerf_llff_data # Link: https://drive.google.com/drive/folders/14boI-o5hGO9srnWaaogTU5_ji7wkX2S7
│ └── [fern|flower|fortress|horns|leaves|orchids|room|trex]
│ ├── [sparse/0] (colmap results)
│ └── [images|images_2|images_4|images_8]
│
└── replica_data # Link: https://github.com/ToniRV/NeRF-SLAM/blob/master/scripts/download_replica.bash
└── [office_0|room_0|...]
├── transforms_train.json
└── [rgb|depth(optional)|sam|masks|patches]
We recommend a two-stage training strategy for stable convergence: first train the color and density fields, and then train the semantic field.
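For intuition about what the semantic stage optimizes: the rendered per-pixel features are supervised by contrasting them against the inconsistent 2D segmentations (see the paper for the actual hierarchical formulation). The snippet below is a deliberately simplified, generic contrastive loss over sampled pixel features and their 2D segment IDs, not the loss implemented in this repository.

```python
# Simplified illustration of a per-image contrastive feature loss (NOT the official
# OmniSeg3D hierarchical loss): features of pixels that share a 2D segment are pulled
# together, features from different segments are pushed apart.
import torch
import torch.nn.functional as F

def contrastive_feature_loss(feat, seg_id, temperature=0.1):
    # feat:   (N, C) rendered features for N sampled pixels of one training image
    # seg_id: (N,)   segment ids of those pixels from the 2D segmentation
    feat = F.normalize(feat, dim=-1)
    sim = feat @ feat.t() / temperature                      # pairwise similarities
    eye = torch.eye(len(feat), dtype=torch.bool, device=feat.device)
    pos = (seg_id[:, None] == seg_id[None, :]) & ~eye        # positives: same segment, not self
    log_prob = F.log_softmax(sim.masked_fill(eye, float("-inf")), dim=-1)
    loss = -(log_prob * pos).sum(-1) / pos.sum(-1).clamp(min=1)
    return loss[pos.any(-1)].mean()                          # skip anchors without positives

# Usage sketch with random tensors standing in for rendered features and SAM segment ids:
print(contrastive_feature_loss(torch.randn(128, 16), torch.randint(0, 8, (128,))))
```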
- Before running: please specify the information in the config file (like run_replica.sh). More options can be found in opt.py, and you can adjust them in the config file.
# --- Edit the config file scripts/run_replica.sh
root_dir=/path/to/data/folder/of/the/scene
exp_name=experiment_name
dataset_name=dataset_type # "colmap" or "replica"; you can easily specify a new dataset type
- Stage 1: color and density field optimization
CUDA_VISIBLE_DEVICES=0 opt=train_rgb bash scripts/run_replica.sh
- Stage 2: semantic field optimization
CUDA_VISIBLE_DEVICES=0 opt=train_sem bash scripts/run_replica.sh
We provide a GUI (based on DearPyGui) for interactive segmentation.
- Stage 1: color and density field visualization
CUDA_VISIBLE_DEVICES=0 opt=show_rgb bash scripts/run_replica.sh
- Stage 2: semantic field visualization and segmentation
CUDA_VISIBLE_DEVICES=0 opt=show_sem bash scripts/run_replica.sh
Here are some functional instructions for interactive segmentation in the GUI:
- The viewpoint can be changed by dragging the mouse on the screen.
- Left click the `clickmode` button to start segmentation mode:
  - Single-click mode: right click the region of interest; the object or part will be highlighted, and the score map will show the similarity between the selected pixel and other rendered pixels.
  - Multi-click mode: choose the `multi-clickmode` button, then you can select multiple pixels on the screen by right clicking them.
  - Similarity threshold: drag the pin of `ScoreThres`, and the unselected regions will be darkened.
  - Binarization: left click the `binary threshold` button, and a binary mask will be applied to the RGB image via the chosen similarity threshold.
We provide a trained model for Replica room_0; you can use it for GUI visualization and interactive segmentation. This sample also reveals the output organization. It is recommended to put the unzipped "results" folder under the root_dir of OmniSeg3D for minimal code modification.
360_github2.1.mp4
Comparison with SA3D
360_counter_comp.1.mp4
- Release mesh-based implementation;
Thanks to the following projects for their valuable contributions:
If you find this project helpful for your research, please consider citing the report and giving a ⭐.
@article{ying2023omniseg3d,
title={OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning},
author={Ying, Haiyang and Yin, Yixuan and Zhang, Jinzhi and Wang, Fan and Yu, Tao and Huang, Ruqi and Fang, Lu},
journal={arXiv preprint arXiv:2311.11666},
year={2023}
}