
Official code of "Segment Any 3D Object with Language".

Primary language: Python · License: MIT

Segment Any 3D Object with Language

Seungjun Lee¹* · Yuyang Zhao²* · Gim Hee Lee²
¹Korea University · ²National University of Singapore
*equal contribution

arXiv 2024

PyTorch Lightning · Config: Hydra


SOLE is highly generalizable and can segment the corresponding instances given various language instructions, including but not limited to visual questions, attribute descriptions, and functional descriptions.


Table of Contents
  1. TODO
  2. Installation
  3. Data Preparation
  4. Weights
  5. Download data and weight
  6. Training and Testing
  7. Acknowledgement
  8. Citation

News:

  • [2024/04/20] Code is released 💡.
  • [2024/05/02] Pre-processed data and weights are released. You can now train and evaluate SOLE 👍🏻.

TODO

  • Release the code
  • Release the preprocessed data and weights
  • Release the evaluation code for the Replica dataset
  • Release the pre-processed data and precomputed features for the Replica dataset

Installation

Dependencies 📝

The main dependencies of the project are the following:

python: 3.10.9
cuda: 11.3

You can set up a conda environment as follows:

export TORCH_CUDA_ARCH_LIST="6.0 6.1 6.2 7.0 7.2 7.5 8.0 8.6"

conda env create -f environment.yml

conda activate sole

pip3 install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
pip3 install torch-scatter -f https://data.pyg.org/whl/torch-1.12.1+cu113.html
pip3 install 'git+https://github.com/facebookresearch/detectron2.git@710e7795d0eeadf9def0e7ef957eea13532e34cf' --no-deps

mkdir -p third_party
cd third_party

git clone https://github.com/NVIDIA/MinkowskiEngine.git
cd MinkowskiEngine
git checkout 02fc608bea4c0549b0a7b00ca1bf15dee4a0b228
python setup.py install --blas_include_dirs=${CONDA_PREFIX}/include --blas=openblas

cd ../pointnet2
python setup.py install

cd ../../
pip3 install pytorch-lightning==1.7.2
pip3 install open-clip-torch
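After installation, a quick check can catch version mismatches before training starts. The script below is a minimal sketch and is not part of the SOLE repository: check_packages, arch_supported, REQUIRED, and README_ARCH_LIST are hypothetical names, the import names (for example open_clip for open-clip-torch) are assumptions based on each package's usual module name, and the version pins and architecture list are copied from the commands above.

```python
import importlib
import importlib.util
import os

# Version pins copied from the install commands above; None means "any version".
# Keys are import names (assumed), which can differ from the pip package names.
REQUIRED = {
    "torch": "1.12.1",
    "torchvision": "0.13.1",
    "pytorch_lightning": "1.7.2",
    "torch_scatter": None,
    "detectron2": None,
    "MinkowskiEngine": None,
    "open_clip": None,
}

# Architectures exported before building the CUDA extensions.
README_ARCH_LIST = "6.0 6.1 6.2 7.0 7.2 7.5 8.0 8.6"


def check_packages(required=REQUIRED):
    """Return {module: problem} for every missing or mis-versioned module."""
    problems = {}
    for name, want in required.items():
        if importlib.util.find_spec(name) is None:
            problems[name] = "not installed"
            continue
        if want is None:
            continue
        got = str(getattr(importlib.import_module(name), "__version__", "unknown"))
        # torch/torchvision report e.g. "1.12.1+cu113"; compare the release part only.
        if got.split("+")[0] != want:
            problems[name] = f"found {got}, expected {want}"
    return problems


def arch_supported(capability, arch_list=None):
    """Check whether a (major, minor) compute capability appears in an arch list."""
    if arch_list is None:
        arch_list = os.environ.get("TORCH_CUDA_ARCH_LIST", README_ARCH_LIST)
    wanted = f"{capability[0]}.{capability[1]}"
    # Entries may be separated by spaces or semicolons and carry a "+PTX" suffix.
    entries = arch_list.replace(";", " ").split()
    return wanted in {entry.split("+")[0] for entry in entries}


if __name__ == "__main__":
    for name, problem in check_packages().items():
        print(f"{name}: {problem}")
    if importlib.util.find_spec("torch") is not None:
        import torch

        print("CUDA runtime:", torch.version.cuda)  # the commands above target 11.3
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability()
            print("compute capability", cap, "covered:", arch_supported(cap))
```

TORCH_CUDA_ARCH_LIST only matters when MinkowskiEngine and pointnet2 are compiled, so if your GPU's compute capability is not covered, re-export the variable with it included and rebuild those extensions.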

Data Preparation

We provide pre-processed 3D data and precomputed features for training and evaluation, listed below:

  • Pre-processed 3D data
  • Precomputed per-point CLIP features
  • Precomputed features of MCA and MEA

You can download the data above by following the Download data and weight section. We also describe the data configuration in detail here to help you understand our pre-processed data.
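This README does not spell out the internal layout of the .pt and .pickle files, so the sketch below makes no assumption about their keys. It loads a file (pickle for .pickle, torch.load for .pt) and summarizes nested dicts, lists, and tensor or array shapes. load_any and describe are hypothetical helpers, not part of the repository.

```python
import pickle
from pathlib import Path


def load_any(path):
    """Load a .pickle file with pickle or a .pt file with torch.load."""
    path = Path(path)
    if path.suffix == ".pickle":
        with path.open("rb") as f:
            return pickle.load(f)
    if path.suffix == ".pt":
        import torch  # imported lazily so .pickle files can be read without torch

        return torch.load(path, map_location="cpu")
    raise ValueError(f"unsupported file type: {path.suffix}")


def describe(obj, indent=0, max_items=5):
    """Return lines summarising the nesting, types, and shapes inside obj."""
    pad = "  " * indent
    lines = []
    if isinstance(obj, dict):
        lines.append(f"{pad}dict with {len(obj)} keys")
        for key in list(obj)[:max_items]:
            lines.append(f"{pad}  [{key!r}]")
            lines.extend(describe(obj[key], indent + 2, max_items))
    elif isinstance(obj, (list, tuple)):
        lines.append(f"{pad}{type(obj).__name__} of length {len(obj)}")
        for item in obj[:max_items]:
            lines.extend(describe(item, indent + 1, max_items))
    elif hasattr(obj, "shape") and hasattr(obj, "dtype"):
        # numpy arrays and torch tensors: report shape and dtype, not values
        lines.append(f"{pad}{type(obj).__name__} shape={tuple(obj.shape)} dtype={obj.dtype}")
    else:
        lines.append(f"{pad}{type(obj).__name__}: {obj!r}"[:120])
    return lines
```

For example, print("\n".join(describe(load_any("openvocab_supervision/scannet_mca/scene0000_00.pickle")))) would summarize one MCA file, assuming it was downloaded to that path. Only unpickle files from sources you trust, since pickle can execute code on load.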

Weights

For stable training, we employ a two-stage training process:

  1. Pretrain the backbone using only mask annotations.
  2. Train the mask decoder while the backbone is frozen. Mask annotations and three types of associations are used for training (see the original paper for details).
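Stage 2 follows the common freeze-the-backbone pattern. The PyTorch sketch below illustrates only that general pattern and is not SOLE's training code: the backbone and decoder are hypothetical nn.Sequential/nn.Linear stand-ins, and the optimizer, learning rate, and placeholder loss are assumptions.

```python
import torch
from torch import nn

# Hypothetical stand-ins: SOLE's real backbone is a sparse 3D network and its
# decoder predicts masks; these tiny modules only illustrate the pattern.
backbone = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 32))
decoder = nn.Linear(32, 8)

# Stage 1 (not shown) would train the backbone with mask annotations only and
# save a checkpoint that stage 2 loads before freezing.

# Stage 2: freeze the backbone so only the decoder is updated.
backbone.requires_grad_(False)
backbone.eval()  # also fixes normalization/dropout behaviour
optimizer = torch.optim.AdamW(decoder.parameters(), lr=1e-4)  # assumed settings

points = torch.randn(16, 3)
with torch.no_grad():
    features = backbone(points)  # no graph is built through the frozen backbone
loss = decoder(features).pow(2).mean()  # placeholder for the real mask/association losses
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Running the frozen backbone under torch.no_grad() saves memory, since no activations need to be kept for its backward pass.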

For training, we provide pretrained backbone weights for the ScanNet and ScanNet200 datasets.

For evaluation, we provide the official SOLE weights for the ScanNet and ScanNet200 datasets.

You can download all of the weights, for both the pretrained backbone and SOLE, by following the Download data and weight section.

Download data and weight

We provide a Python script that downloads all of the pre-processed data and weights mentioned above. Run the command below:

python download_data.py

Once the command finishes, the downloaded files are automatically placed in the corresponding paths. Refer to the file structure below.

├── backbone_checkpoint
│   ├── backbone_scannet.ckpt        <- Backbone weights for ScanNet
│   └── backbone_scannet200.ckpt     <- Backbone weights for ScanNet200
│
├── checkpoint
│   ├── scannet.ckpt        <- Official weights for ScanNet
│   └── scannet200.ckpt     <- Official weights for ScanNet200
│
├── data
│   └── preprocessed
│       ├── scannet                   <- Preprocessed ScanNet data
│       └── scannet200                <- Preprocessed ScanNet200 data
│
├── openvocab_supervision
│   ├── openseg
│   │   └── scannet                   <- Precomputed per-point CLIP features for ScanNet
│   │       ├── scene0000_00_0.pt
│   │       ├── scene0000_01_0.pt
│   │       └── ...
│   ├── scannet_mca                   <- Precomputed features of MCA for ScanNet
│   │   ├── scene0000_00.pickle
│   │   ├── scene0000_01.pickle
│   │   └── ...
│   ├── scannet_mea                   <- Precomputed features of MEA for ScanNet
│   │   ├── scene0000_00.pickle
│   │   ├── scene0000_01.pickle
│   │   └── ...
│   ├── scannet200_mca                <- Precomputed features of MCA for ScanNet200
│   │   ├── scene0000_00.pickle
│   │   ├── scene0000_01.pickle
│   │   └── ...
│   └── scannet200_mea                <- Precomputed features of MEA for ScanNet200
│       ├── scene0000_00.pickle
│       ├── scene0000_01.pickle
│       └── ...
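To confirm that download_data.py placed everything where the file structure above expects it, a small check such as the following can be run from the directory that contains these folders (assumed here to be the repository root). missing_paths, count_files, and EXPECTED_PATHS are hypothetical names; the paths themselves are copied from the tree above.

```python
from pathlib import Path

# Paths copied from the file structure above, relative to the assumed root.
EXPECTED_PATHS = [
    "backbone_checkpoint/backbone_scannet.ckpt",
    "backbone_checkpoint/backbone_scannet200.ckpt",
    "checkpoint/scannet.ckpt",
    "checkpoint/scannet200.ckpt",
    "data/preprocessed/scannet",
    "data/preprocessed/scannet200",
    "openvocab_supervision/openseg/scannet",
    "openvocab_supervision/scannet_mca",
    "openvocab_supervision/scannet_mea",
    "openvocab_supervision/scannet200_mca",
    "openvocab_supervision/scannet200_mea",
]


def missing_paths(root=".", expected=EXPECTED_PATHS):
    """Return the expected files/directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


def count_files(directory, pattern):
    """Count regular files matching a glob pattern such as '*.pickle'."""
    return sum(1 for p in Path(directory).glob(pattern) if p.is_file())


if __name__ == "__main__":
    print("missing:", missing_paths() or "none")
    print("ScanNet MCA files:", count_files("openvocab_supervision/scannet_mca", "*.pickle"))
```

Comparing the per-folder file counts across the MCA, MEA, and CLIP feature folders is a cheap way to spot a partially interrupted download.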

Once all of the files are downloaded, you are ready to train and evaluate the model. See the Training and Testing section for the commands to run SOLE.

Training and Testing

Train SOLE on the ScanNet dataset:

bash scripts/scannet/scannet_train.sh

Train SOLE on the ScanNet200 dataset:

bash scripts/scannet200/scannet200_train.sh

Evaluate SOLE on the ScanNet dataset:

bash scripts/scannet/scannet_val.sh

Evaluate SOLE on the ScanNet200 dataset:

bash scripts/scannet200/scannet200_val.sh

To use wandb during training, set the workspace field in the conf/config_base_instance_segmentation.yaml file to your wandb workspace name, then run the command below before the training/testing command:

wandb enabled

To turn off wandb, run the command below before the training/testing command:

wandb disabled
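As an alternative to toggling wandb globally, wandb also reads the WANDB_MODE environment variable (online, offline, or disabled), which can be set for a single run. The sketch below assumes the scripts above initialize wandb in the standard way so that this variable is honored; build_command and SCRIPTS are hypothetical names, and the script paths are copied from the commands above.

```python
import os
import subprocess

# Script paths copied from the training/evaluation commands above.
SCRIPTS = {
    ("scannet", "train"): "scripts/scannet/scannet_train.sh",
    ("scannet200", "train"): "scripts/scannet200/scannet200_train.sh",
    ("scannet", "val"): "scripts/scannet/scannet_val.sh",
    ("scannet200", "val"): "scripts/scannet200/scannet200_val.sh",
}


def build_command(dataset, stage, wandb_mode="disabled"):
    """Return (argv, env) for one script, with WANDB_MODE set for that run only."""
    if wandb_mode not in {"online", "offline", "disabled"}:
        raise ValueError(f"unknown wandb mode: {wandb_mode}")
    argv = ["bash", SCRIPTS[(dataset, stage)]]
    # Copy the current environment and override WANDB_MODE for the child process;
    # the parent shell and any stored wandb settings are left untouched.
    env = dict(os.environ, WANDB_MODE=wandb_mode)
    return argv, env


if __name__ == "__main__":
    argv, env = build_command("scannet", "val", wandb_mode="disabled")
    subprocess.run(argv, env=env, check=True)
```

The offline mode keeps run logs on disk so they can be uploaded later with wandb sync, which can be convenient on machines without network access.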

Acknowledgement

Our code is built on top of Mask3D. We sincerely thank the Mask3D team for their amazing work and well-structured code. Our work is also heavily inspired by the following works:

We express our gratitude for their exceptional contributions.

Citation

If you find our code or paper useful, please cite:

@article{lee2024segment,
  title   = {Segment Any 3D Object with Language},
  author  = {Lee, Seungjun and Zhao, Yuyang and Lee, Gim Hee},
  year    = {2024},
  journal = {arXiv preprint arXiv:2404.02157},
}