
RefMask3D: Language-Guided Transformer for 3D Referring Segmentation

📄 [arXiv]   📄 [PDF]

This repository contains code for ACM MM 2024 paper:

RefMask3D: Language-Guided Transformer for 3D Referring Segmentation
Shuting He, Henghui Ding
ACM MM 2024

Code structure

We adapt the codebase of Mask3D, which provides a highly modularized framework for 3D instance segmentation based on MinkowskiEngine.

RefMask3D
├── benchmark                     <- evaluation metric
├── conf                          <- hydra configuration files
├── datasets
│   ├── preprocessing             <- folder with preprocessing scripts
│   ├── semseg.py                 <- ScanRefer dataset loader
│   └── utils.py
├── models                        <- RefMask3D modules
├── trainer
│   ├── __init__.py
│   └── trainer.py                <- train loop
├── utils
├── data
│   └── processed                 <- folder for preprocessed ScanNet and ScanRefer
├── scripts                       <- train scripts
├── README.md
└── saved                         <- folder that stores models and logs
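
The conf folder holds the Hydra configuration files inherited from Mask3D. As a minimal sketch of how such an entry point composes them (the config name below follows the Mask3D convention and is an assumption; check conf/ for the actual top-level config):

# Minimal sketch of a Hydra entry point composing the YAML files under conf/.
# "config_base_instance_segmentation.yaml" follows the Mask3D convention and is an
# assumption; substitute the actual top-level config shipped in conf/.
import hydra
from omegaconf import DictConfig, OmegaConf

@hydra.main(config_path="conf", config_name="config_base_instance_segmentation.yaml")
def main(cfg: DictConfig):
    # Print the fully composed configuration (model, data, trainer settings).
    print(OmegaConf.to_yaml(cfg))

if __name__ == "__main__":
    main()

Individual settings can then be overridden from the command line in the usual Hydra way, e.g. appending general.gpus=8 style overrides to the launch command.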

Dependencies 📝

The main dependencies of the project are the following:

python: 3.10.9
cuda: 11.7
torch: 1.13.1

Following Mask3D, you can set up a conda environment as follows:

conda create -n refmask3d python=3.10
conda activate refmask3d
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
pip install "cython<3.0.0" && pip install --no-build-isolation pyyaml==6.0.1
pip install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl
pip install 'git+https://github.com/facebookresearch/detectron2.git@710e7795d0eeadf9def0e7ef957eea13532e34cf' --no-deps
pip install -r requirements.txt
pip install torch-scatter -f https://data.pyg.org/whl/torch-1.13.1+cu117.html

cd third_party

git clone --recursive "https://github.com/NVIDIA/MinkowskiEngine"
cd MinkowskiEngine
git checkout 02fc608bea4c0549b0a7b00ca1bf15dee4a0b228
python setup.py install --force_cuda --blas=openblas

cd ..
git clone https://github.com/ScanNet/ScanNet.git
cd ScanNet/Segmentator
git checkout 3e5726500896748521a6ceb81271b0f5b2c0e7d2
make

cd ../../pointnet2
python setup.py install

cd ../../
pip install pytorch-lightning==1.7.2
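
Before moving on to data preprocessing, a quick sanity check (not part of the original instructions) can confirm that the freshly built CUDA extensions import correctly:

# Quick sanity check: verify that the installed packages import and see CUDA.
import torch
import MinkowskiEngine as ME
import pytorch_lightning as pl

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("MinkowskiEngine:", ME.__version__)
print("pytorch-lightning:", pl.__version__)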

Data preprocessing 🔨

After installing the dependencies, you need to download the ScanNet and ScanRefer datasets. Then, preprocess the ScanNet dataset and download the Mask3D checkpoint:

mkdir -p checkpoints/scannet && cd checkpoints/scannet
wget https://omnomnom.vision.rwth-aachen.de/data/mask3d/checkpoints/scannet/scannet_val.ckpt

python -m datasets.preprocessing.scannet_preprocessing preprocess \
--data_dir="PATH_TO_RAW_SCANNET_DATASET" \
--save_dir="data/processed/" \
--git_repo="third_party/ScanNet/" \
--scannet200=False
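
As an optional check (an assumption, not part of the original instructions), you can confirm that the Mask3D checkpoint loads and that the preprocessed data landed in data/processed/:

# Optional check: make sure the checkpoint and preprocessed data are where the
# training scripts expect them. Paths follow the commands above.
import os
import torch

state = torch.load("checkpoints/scannet/scannet_val.ckpt", map_location="cpu")
print("checkpoint keys:", list(state.keys()))   # a Lightning checkpoint, e.g. 'state_dict', ...

print("processed contents:", os.listdir("data/processed"))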

Training and testing 🚆

Train and test RefMask3D on the ScanNet dataset:

sh scripts/refmask3d.sh

Note: We train on an A6000 machine (48 GB) using 8 cards with 4 samples per card, which takes about 18 hours. If you use different GPUs, you may need to adjust the batch size and learning rate.
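
One common heuristic (an assumption, not prescribed by the paper or this repository) is to scale the learning rate linearly with the effective batch size relative to the reference setup of 8 cards × 4 samples:

# Linear learning-rate scaling relative to the reference setup (8 GPUs x 4 samples/GPU).
# base_lr below is a placeholder value; use the learning rate set in scripts/refmask3d.sh.
def scaled_lr(base_lr, gpus, samples_per_gpu, ref_gpus=8, ref_samples_per_gpu=4):
    return base_lr * (gpus * samples_per_gpu) / (ref_gpus * ref_samples_per_gpu)

# e.g. 4 GPUs with 4 samples each -> half the effective batch size, half the learning rate
print(scaled_lr(base_lr=1e-4, gpus=4, samples_per_gpu=4))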

Trained checkpoints 💾

TBD ☁️ Google Drive

BibTeX 🙏

Please consider citing RefMask3D if it helps your research.

@inproceedings{RefMask3D,
  title={{RefMask3D}: Language-Guided Transformer for 3D Referring Segmentation},
  author={He, Shuting and Ding, Henghui},
  booktitle={ACM MM},
  year={2024}
}