This is the official PyTorch implementation of the IEEE RA-L paper SupeRGB-D: Zero-Shot Instance Segmentation in Cluttered Indoor Environments.
In this work, we explore zero-shot instance segmentation (ZSIS) from RGB-D data to identify unseen objects in a manner that is agnostic to semantic categories. To enable this study, we introduce a zero-shot split of the Tabletop Objects Dataset, TOD-Z, and present a method that uses annotated objects to learn the "objectness" of pixels and generalize to unseen object categories in cluttered indoor environments.
Our method, SupeRGB-D, groups pixels into small patches (superpixels) based on geometric cues and learns to merge these patches through deep agglomerative clustering. An overview of our method is illustrated here:
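To make the merging idea concrete, here is a minimal sketch of greedy agglomerative merging over a superpixel adjacency graph. In SupeRGB-D the merge decisions come from a learned network. Here `merge_score` is a hypothetical stand-in: a toy depth-similarity score over made-up per-superpixel depths. The function name, the `threshold` parameter and the "merge the best-scoring adjacent pair first" rule are assumptions for illustration, not the repository's code.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Set, Tuple


def agglomerative_merge(
    adjacency: Dict[int, Set[int]],
    merge_score: Callable[[FrozenSet[int], FrozenSet[int]], float],
    threshold: float = 0.5,
) -> List[FrozenSet[int]]:
    """Greedily merge adjacent segments while the best pair scores >= threshold.

    Assumes `adjacency` is symmetric: b in adjacency[a] iff a in adjacency[b].
    """
    # Every segment starts as a single superpixel.
    segments: Dict[int, FrozenSet[int]] = {sp: frozenset([sp]) for sp in adjacency}
    neighbors: Dict[int, Set[int]] = {sp: set(nbrs) for sp, nbrs in adjacency.items()}
    next_id = max(adjacency, default=-1) + 1

    while True:
        # Find the highest-scoring pair of adjacent segments.
        best: Optional[Tuple[float, int, int]] = None
        for a, nbrs in neighbors.items():
            for b in nbrs:
                if a < b:
                    score = merge_score(segments[a], segments[b])
                    if best is None or score > best[0]:
                        best = (score, a, b)
        if best is None or best[0] < threshold:
            break

        # Replace segments a and b by a new merged segment and rewire neighbors.
        _, a, b = best
        merged = next_id
        next_id += 1
        segments[merged] = segments.pop(a) | segments.pop(b)
        merged_nbrs = (neighbors.pop(a) | neighbors.pop(b)) - {a, b}
        for n in merged_nbrs:
            neighbors[n] -= {a, b}
            neighbors[n].add(merged)
        neighbors[merged] = merged_nbrs

    return sorted(segments.values(), key=min)


# Hypothetical example: four superpixels in a chain with made-up mean depths (m).
depth = {0: 1.00, 1: 1.02, 2: 1.50, 3: 1.51}
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}


def toy_score(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Toy stand-in for a learned merge probability: closer depths score higher."""
    da = sum(depth[i] for i in a) / len(a)
    db = sum(depth[i] for i in b) / len(b)
    return 1.0 - 10.0 * abs(da - db)


result = agglomerative_merge(adj, toy_score)
print(result)  # [frozenset({0, 1}), frozenset({2, 3})]
```

Merging only adjacent segments keeps each candidate set small and means every segment stays spatially connected. Recomputing scores on merged segments is what makes the procedure agglomerative rather than a single pairwise thresholding pass.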
- Create the Python environment from env.yml:
git clone https://github.com/evinpinar/supergb-d.git
cd supergb-d
conda env create --file env.yml
conda activate supergbd
- Download the TOD dataset from the original repository and the TOD-Z ids from here. Preprocess the data to extract superpixels and training features, and to generate the ground truth. Adjust the data paths to match your local setup.
python data/preprocess_data_full.py # set the number of threads according to your CPU
# alternatively, run only data/process.py for single-threaded processing.
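The preprocessing step above generates ground truth, but its exact output format is not documented here. As a hypothetical illustration of what supervision for a patch-merging network can look like, the sketch below assigns each superpixel its majority ground-truth instance and labels every pair of adjacent superpixels 1 (same instance, should merge) or 0. The helper names, the 4-connectivity rule and the majority vote are assumptions, not taken from `data/preprocess_data_full.py`. Label maps are plain nested lists so the example stays dependency-free.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Grid = List[List[int]]


def superpixel_adjacency(sp: Grid) -> Dict[int, Set[int]]:
    """4-connected adjacency between superpixel ids in a label map."""
    adj: Dict[int, Set[int]] = {}
    h, w = len(sp), len(sp[0])
    for y in range(h):
        for x in range(w):
            a = sp[y][x]
            adj.setdefault(a, set())  # keep isolated superpixels too
            # Checking right and down neighbors covers every 4-connected edge once.
            for dy, dx in ((0, 1), (1, 0)):
                ny, nx = y + dy, x + dx
                if ny < h and nx < w and sp[ny][nx] != a:
                    b = sp[ny][nx]
                    adj[a].add(b)
                    adj.setdefault(b, set()).add(a)
    return adj


def majority_instance(sp: Grid, inst: Grid) -> Dict[int, int]:
    """Map each superpixel to the instance id covering most of its pixels."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for sp_row, inst_row in zip(sp, inst):
        for s, i in zip(sp_row, inst_row):
            counts[s][i] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def merge_targets(sp: Grid, inst: Grid) -> Dict[Tuple[int, int], int]:
    """Binary merge label for each adjacent superpixel pair (a < b)."""
    adj = superpixel_adjacency(sp)
    owner = majority_instance(sp, inst)
    return {
        (a, b): int(owner[a] == owner[b])
        for a, nbrs in adj.items()
        for b in nbrs
        if a < b
    }


# Hypothetical 4x4 example: four superpixels, two object instances (0 and 5).
sp = [[0, 0, 1, 1],
      [0, 0, 1, 1],
      [2, 2, 3, 3],
      [2, 2, 3, 3]]
inst = [[0, 0, 5, 5],
        [0, 0, 5, 5],
        [0, 0, 5, 5],
        [0, 0, 5, 5]]
targets = merge_targets(sp, inst)
print(targets)  # {(0, 1): 0, (0, 2): 1, (1, 3): 1, (2, 3): 0}
```

In this sketch every instance id, including a possible background label, is treated alike. A real pipeline would likely handle background separately.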
- Train the merger network.
python src/model_train.py --cfg configs/run_local.yaml
- Test the trained model.
python src/model_eval.py --cfg configs/run_local.yaml
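This README does not say which metrics `src/model_eval.py` reports. The acknowledgement mentions Davis-2017, whose benchmark measures region similarity with the Jaccard index (IoU). Assuming overlap-based scores of that kind, here is a minimal sketch of mask IoU on per-pixel instance label maps, with 0 assumed to be background. For each ground-truth object it takes the best-overlapping prediction. This is a simplification, not a one-to-one (e.g. Hungarian) assignment, and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]


def masks_from_labels(labels: List[List[int]], background: int = 0) -> Dict[int, Set[Pixel]]:
    """Split an instance label map into one pixel set per non-background id."""
    masks: Dict[int, Set[Pixel]] = {}
    for y, row in enumerate(labels):
        for x, v in enumerate(row):
            if v != background:
                masks.setdefault(v, set()).add((y, x))
    return masks


def iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Intersection over union of two pixel sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def mean_best_iou(pred: List[List[int]], gt: List[List[int]], background: int = 0) -> float:
    """Average, over ground-truth objects, of the IoU with the best prediction."""
    pred_masks = masks_from_labels(pred, background)
    gt_masks = masks_from_labels(gt, background)
    if not gt_masks:
        raise ValueError("ground truth contains no objects")
    scores = [
        max((iou(g, p) for p in pred_masks.values()), default=0.0)
        for g in gt_masks.values()
    ]
    return sum(scores) / len(scores)


# Hypothetical 2x4 example: object 1 is partly missed, object 2 is exact.
gt = [[1, 1, 0, 2],
      [1, 1, 0, 2]]
pred = [[3, 3, 0, 4],
        [3, 0, 0, 4]]
print(mean_best_iou(pred, gt))  # 0.875, i.e. (0.75 + 1.0) / 2
```

Instance ids in the prediction need not match the ground-truth ids. Only the overlap between masks matters, which suits category-agnostic segmentation.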
An example checkpoint is provided here. It uses 128 superpixels and was trained without DINO features.
If you find this code helpful, please consider citing:
@ARTICLE{ornek23,
author={{\"O}rnek, Evin P{\i}nar and Krishnan, Aravindhan K and Gayaka, Shreekant and Kuo, Cheng-Hao and Sen, Arnie and Navab, Nassir and Tombari, Federico},
journal={IEEE Robotics and Automation Letters},
title={SupeRGB-D: Zero-Shot Instance Segmentation in Cluttered Indoor Environments},
year={2023},
volume={8},
number={6},
pages={3709-3716},
doi={10.1109/LRA.2023.3271527}}
Parts of this repository are based on code from UOIS-Net and Davis-2017. We thank the authors for making their code available.