MVTrans: Multi-view Perception to See Transparent Objects (ICRA 2023)

This repo contains the official implementation of the paper "MVTrans: Multi-view Perception to See Transparent Objects".

Introduction

Transparent object perception is a crucial skill for applications such as robot manipulation in household and laboratory settings. Existing methods use RGB-D or stereo inputs to handle a subset of perception tasks, including depth and pose estimation. However, transparent object perception remains an open problem. In this paper, we forgo the unreliable depth map from RGB-D sensors and extend the stereo-based method. Our proposed method, MVTrans, is an end-to-end multi-view architecture with multiple perception capabilities, including depth estimation, segmentation, and pose estimation. Additionally, we establish a novel procedural photo-realistic dataset generation pipeline and create a large-scale transparent object detection dataset, Syn-TODD, which is suitable for training networks with all three modalities: RGB-D, stereo, and multi-view RGB.

Installation

Set up a conda environment, install the required packages, and download the repo:

conda create -y --prefix ./env python=3.8
./env/bin/python -m pip install -r requirements.txt
git clone https://github.com/ac-rad/MVTrans.git

Weights & Biases (wandb) is used to log and visualize training results. Please follow the wandb instructions to set it up. To log results to the cloud, insert your wandb login key in net_train_multiview.py. Otherwise, to log results locally, run the following command and access the results at localhost:

wandb offline
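
As an alternative to editing the script or switching modes by hand, the choice between cloud and local logging can be expressed through wandb's documented environment variables, WANDB_MODE and WANDB_API_KEY. The following is a minimal sketch, not part of this repo: the helper name wandb_environment is hypothetical, and whether net_train_multiview.py honors these variables depends on how it calls wandb internally.

```python
import os
from typing import Dict, Optional


def wandb_environment(offline: bool, api_key: Optional[str] = None) -> Dict[str, str]:
    """Return a copy of the current environment with wandb settings applied.

    Hypothetical helper for illustration; it is not part of the MVTrans repo.
    """
    env = dict(os.environ)
    if offline:
        # WANDB_MODE=offline keeps run data on disk instead of uploading it.
        env["WANDB_MODE"] = "offline"
        env.pop("WANDB_API_KEY", None)
    else:
        if not api_key:
            raise ValueError("an API key is required for cloud logging")
        env["WANDB_MODE"] = "online"
        # WANDB_API_KEY lets wandb authenticate without a hard-coded key.
        env["WANDB_API_KEY"] = api_key
    return env


env = wandb_environment(offline=True)
print(env["WANDB_MODE"])
```

The returned dictionary can be passed as the env argument of subprocess.run when launching training, which keeps the login key out of version-controlled files.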

Dataset

Our synthetic transparent object detection dataset (Syn-TODD) can be downloaded here.

Training

To train MVTrans from scratch, modify the data path and output directory in the configuration files under config/, then run the following command, replacing {NUM_OF_VIEW} with the number of input views:

./runner.sh net_train_multiview.py @config/net_config_blender_multiview_{NUM_OF_VIEW}_train.txt
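
The @ prefix on the configuration file resembles the convention supported by Python's argparse, where fromfile_prefix_chars="@" expands a file into command-line arguments, one per line. The sketch below illustrates that mechanism under the assumption that the training script parses its configuration this way. The option names --data_path and --output_dir are hypothetical, so check the files under config/ for the real ones.

```python
import argparse
import tempfile
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # fromfile_prefix_chars="@" makes argparse replace "@file" with the
    # arguments listed in that file, reading one argument per line.
    parser = argparse.ArgumentParser(fromfile_prefix_chars="@")
    # Hypothetical option names, used only for illustration.
    parser.add_argument("--data_path", type=str, required=True)
    parser.add_argument("--output_dir", type=str, required=True)
    return parser


def parse_config_text(config_text: str) -> argparse.Namespace:
    """Write config_text to a temporary file and parse it via the @ syntax."""
    with tempfile.TemporaryDirectory() as tmp:
        config_path = Path(tmp) / "example_config.txt"
        config_path.write_text(config_text)
        return build_parser().parse_args([f"@{config_path}"])


example_config = "--data_path=/path/to/syn-todd\n--output_dir=/path/to/output\n"
args = parse_config_text(example_config)
print(args.data_path, args.output_dir)
```

With this convention, changing the data path or output directory means editing a single line of the text file rather than retyping a long command.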

Evaluation

To run the evaluation, modify the data path and output directory in the configuration files under config/, and then run:

./runner.sh net_train_multiview.py @config/net_config_blender_multiview_{NUM_OF_VIEW}_eval.txt
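
The training and evaluation commands differ only in the view count and the train/eval suffix of the configuration file name. A small helper can build either one, as in the sketch below. The function build_command is hypothetical and not part of the repo, and the view count 3 is only an example value, so check config/ for the files that actually exist.

```python
from typing import List

# File-name pattern taken from the commands in this README.
CONFIG_TEMPLATE = "config/net_config_blender_multiview_{num_views}_{mode}.txt"


def build_command(num_views: int, mode: str) -> List[str]:
    """Build the runner.sh argument list for training or evaluation."""
    if mode not in ("train", "eval"):
        raise ValueError(f"mode must be 'train' or 'eval', got {mode!r}")
    if num_views < 1:
        raise ValueError("num_views must be a positive integer")
    config = CONFIG_TEMPLATE.format(num_views=num_views, mode=mode)
    # The leading "@" marks the argument as a configuration file.
    return ["./runner.sh", "net_train_multiview.py", f"@{config}"]


print(" ".join(build_command(3, "eval")))
```

The returned list can be passed directly to subprocess.run from the repository root.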

Inference

To run inference, launch Jupyter Notebook and run inference.ipynb.

Citation

Please cite our paper:

@misc{wang2023mvtrans,
      title={MVTrans: Multi-View Perception of Transparent Objects}, 
      author={Yi Ru Wang and Yuchi Zhao and Haoping Xu and Saggi Eppel and Alan Aspuru-Guzik and Florian Shkurti and Animesh Garg},
      year={2023},
      eprint={2302.11683},
      archivePrefix={arXiv},
      primaryClass={cs.RO}
}

Reference

Our MVTrans architecture builds on SimNet and ESTDepth.