[arXiv] | [PDF]
This repository contains the code for the ACM MM 2024 paper:
RefMask3D: Language-Guided Transformer for 3D Referring Segmentation
Shuting He, Henghui Ding
ACM MM 2024
We adapt the codebase of Mask3D, which provides a highly modularized framework for 3D instance segmentation based on MinkowskiEngine.
RefMask3D
├── benchmark                <- evaluation metric
├── conf                     <- hydra configuration files
├── datasets
│   ├── preprocessing        <- folder with preprocessing scripts
│   ├── semseg.py            <- ScanRefer dataset loader
│   └── utils.py
├── models                   <- RefMask3D modules
├── trainer
│   ├── __init__.py
│   └── trainer.py           <- train loop
├── utils
├── data
│   └── processed            <- folder for preprocessed ScanNet and ScanRefer
├── scripts                  <- train scripts
├── README.md
└── saved                    <- folder that stores models and logs
The main dependencies of the project are the following:
python: 3.10.9
cuda: 11.7
torch: 1.13.1
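Before setting up the environment, it helps to confirm that a matching CUDA toolkit is available on the system, since MinkowskiEngine is compiled from source below. A minimal check (assumes nvcc and the NVIDIA driver are already installed):
nvcc --version   # should report CUDA 11.7 to match the cu117 wheels used below
nvidia-smi       # driver must support CUDA 11.7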
You can set up a conda environment as follows (following Mask3D):
conda create -n refmask3d python=3.10
conda activate refmask3d
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
pip install "cython<3.0.0" && pip install --no-build-isolation pyyaml==6.0.1
pip install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl
pip install 'git+https://github.com/facebookresearch/detectron2.git@710e7795d0eeadf9def0e7ef957eea13532e34cf' --no-deps
pip install -r requirements.txt
pip install torch-scatter -f https://data.pyg.org/whl/torch-1.13.1+cu117.html
cd third_party
git clone --recursive "https://github.com/NVIDIA/MinkowskiEngine"
cd MinkowskiEngine
git checkout 02fc608bea4c0549b0a7b00ca1bf15dee4a0b228
python setup.py install --force_cuda --blas=openblas
cd ..
git clone https://github.com/ScanNet/ScanNet.git
cd ScanNet/Segmentator
git checkout 3e5726500896748521a6ceb81271b0f5b2c0e7d2
make
cd ../../pointnet2
python setup.py install
cd ../../
pip install pytorch-lightning==1.7.2
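After the installation finishes, an optional sanity check can catch build problems early. A minimal sketch (the import names below are the standard ones for these packages; verify against your environment):
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
python -c "import MinkowskiEngine as ME; print(ME.__version__)"
python -c "import detectron2, torch_scatter, pytorch_lightning; print('ok')"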
After installing the dependencies, you need to download the ScanNet and ScanRefer datasets. Then, preprocess the ScanNet data and download the Mask3D checkpoint:
mkdir -p checkpoints/scannet && cd checkpoints/scannet
wget https://omnomnom.vision.rwth-aachen.de/data/mask3d/checkpoints/scannet/scannet_val.ckpt
python -m datasets.preprocessing.scannet_preprocessing preprocess \
--data_dir="PATH_TO_RAW_SCANNET_DATASET" \
--save_dir="data/processed/" \
--git_repo="third_party/ScanNet/" \
--scannet200=False
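Once preprocessing finishes, you can quickly check that the checkpoint and the processed data are in place (the exact contents of data/processed/ depend on the preprocessing script, so treat the listing as a rough sanity check only):
ls -lh checkpoints/scannet/scannet_val.ckpt   # Mask3D checkpoint downloaded above
ls data/processed/ | head                     # preprocessed ScanNet output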
Train and test RefMask3D on the ScanNet dataset:
sh scripts/refmask3d.sh
Note: We train on an A6000 machine (48 GB per card) using 8 cards with 4 samples on each card, which takes about 18 hours. If you use different GPUs, you may need to change the batch size and learning rate; see the sketch below.
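For example, a rough sketch of running on fewer GPUs (whether this takes effect depends on how the trainer and the hydra configs in conf are set up; check scripts/refmask3d.sh for the actual launch command):
# Hypothetical example: restrict training to 2 GPUs.
# Reference setup: 8 GPUs x 4 samples = effective batch size 32.
CUDA_VISIBLE_DEVICES=0,1 sh scripts/refmask3d.sh
# With 2 GPUs x 4 samples (effective batch size 8), the linear scaling rule suggests
# roughly 8/32 = 1/4 of the original learning rate; adjust this in the configs under conf.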
TBD (Google Drive)
Please consider citing RefMask3D if it helps your research.
@inproceedings{RefMask3D,
title={{RefMask3D}: Language-Guided Transformer for 3D Referring Segmentation},
author={He, Shuting and Ding, Henghui},
booktitle={ACM MM},
year={2024}
}