VOSVS: Video Object Segmentation-based Visual Servo Control

Primary Language: Python

Contact: Brent Griffin (griffb at umich dot edu)

Paper

Video Object Segmentation-based Visual Servo Control and Object Depth Estimation on a Mobile Robot
Brent Griffin, Victoria Florence, and Jason J. Corso
IEEE Winter Conference on Applications of Computer Vision (WACV), 2020

Please cite our paper if you find it useful for your research.

@inproceedings{GrFlCoWACV20,
  author = {Griffin, Brent and Florence, Victoria and Corso, Jason J.},
  booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
  title = {Video Object Segmentation-based Visual Servo Control and Object Depth Estimation on a Mobile Robot},
  year = {2020}
}

Code

Source code for our video object segmentation-based framework is located in the /robot_exp folder.

Source code for annotating data and training OSVOS for segmentation is located in the /OSVOS_train folder.

Benchmark

VOSVS Visual Servo Control and Depth Estimation Benchmark.

| Object Set | Support Height (m) | YCB Object | ClickBot | VOSVS |
|---|---|---|---|---|
| Tool | 0.25 | Power Drill | \ | X |
| Tool | 0.125 | Marker | \ | \ |
| Tool | 0.0 | Padlock | X | \ |
| Tool | 0.25 | Wood | X | \ |
| Tool | 0.125 | Spring Clamp | \ | \ |
| Tool | 0.0 | Screwdriver | X | \ |
| Food | 0.25 | Chips Can | X | X |
| Food | 0.125 | Potted Meat | X | X |
| Food | 0.0 | Plastic Banana | X | X |
| Food | 0.25 | Box of Sugar | X | X |
| Food | 0.125 | Tuna | X | \ |
| Food | 0.0 | Gelatin | X | X |
| Kitchen | 0.25 | Mug | X | X |
| Kitchen | 0.125 | Softscrub | \ | |
| Kitchen | 0.0 | Skillet with Lid | \ | |
| Kitchen | 0.25 | Plate | X | X |
| Kitchen | 0.125 | Spatula | \ | |
| Kitchen | 0.0 | Knife | X | \ |
| Shape | 0.25 | Baseball | X | \ |
| Shape | 0.125 | Plastic Chain | X | \ |
| Shape | 0.0 | Washer | \ | \ |
| Shape | 0.25 | Stacking Cup | X | X |
| Shape | 0.125 | Dice | \ | |
| Shape | 0.0 | Foam Brick | X | X |
| Success Rate (% VS / % DE) | | | 100 / 67 | 83 / 42 |

The VOSVS Benchmark uses a single consecutive set of mobile robot trials with a single RGB camera. Visual servo control (VS) is a success ( \ ) if the robot moves within reach of an object for depth estimation (DE); DE, in turn, is a success ( X ) if the robot's gripper then closes on the object without collision. Please see our paper for more details. YCB Dataset objects are available here. The bins we use to vary support height were originally purchased here.
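The success-rate row follows directly from the per-trial marks. As a sanity check, the tallies can be reproduced with a short script; the outcome lists below are transcribed from the table ("vs" = reached the object only, "de" = reached it and grasped without collision, None = failure):

```python
# Per-method trial outcomes transcribed row by row from the 24-trial table.
clickbot = ["vs", "vs", "de", "de", "vs", "de",
            "de", "de", "de", "de", "de", "de",
            "de", "vs", "vs", "de", "vs", "de",
            "de", "de", "vs", "de", "vs", "de"]
vosvs = ["de", "vs", "vs", "vs", "vs", "vs",
         "de", "de", "de", "de", "vs", "de",
         "de", None, None, "de", None, "vs",
         "vs", "vs", "vs", "de", None, "de"]

def success_rates(trials):
    """Return (%VS, %DE) rounded to whole percent."""
    vs = sum(t in ("vs", "de") for t in trials)  # moved within reach
    de = sum(t == "de" for t in trials)          # grasped without collision
    n = len(trials)
    return round(100 * vs / n), round(100 * de / n)

print(success_rates(clickbot))  # (100, 67)
print(success_rates(vosvs))     # (83, 42)
```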

If your technique's paper and results are public but missing from this benchmark, let us know and we'll add it.

Method

WACV 2020 Oral Presentation: https://youtu.be/_SaMQjLxpZ8


HSR Segmenting Objects at Various Heights. HSR's grasp camera faces downward (left) and collects only RGB data for objects in the scene (top right). However, using active perception and video object segmentation (bottom right), HSR can locate and grasp a variety of objects in real time.
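As a rough sketch of the segmentation-based servo idea (not the exact controller from the paper; the `servo_step` helper, gain, and frame conventions are all illustrative assumptions), an image-based scheme can drive the segmentation mask's centroid toward the image center with a proportional law:

```python
import numpy as np

def servo_step(mask, image_shape, gain=0.5):
    """One proportional visual-servo update from a binary segmentation mask.

    Returns a (dx, dy) image-plane correction scaled by `gain`; mapping it
    to robot base/arm velocities depends on the camera mounting (assumed
    here to be a downward-facing grasp camera).
    """
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None  # object not segmented; caller should keep searching
    cx, cy = xs.mean(), ys.mean()        # mask centroid in pixels
    h, w = image_shape
    ex, ey = cx - w / 2.0, cy - h / 2.0  # error from image center
    return gain * ex, gain * ey          # proportional command

# Toy example: a 10x10 mask with the object in the upper-left quadrant.
mask = np.zeros((10, 10), dtype=bool)
mask[2:4, 2:4] = True
print(servo_step(mask, mask.shape))  # (-1.25, -1.25)
```

Iterating this step until the error is near zero centers the object under the grasp camera before the depth-estimation phase.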

Depth Estimation of Sugar Box. Data are collected and processed in real time during the initial approach to the sugar box in the video demonstration.
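A simplified, hypothetical version of monocular depth from camera motion (a pinhole-camera triangulation sketch, not the paper's full estimator): if the camera translates a known baseline b perpendicular to its optical axis while tracking the mask centroid, depth follows from Z = f·b/d, where d is the centroid disparity in pixels:

```python
def depth_from_motion(f_px, baseline_m, centroid_x0, centroid_x1):
    """Triangulate object depth from two mask-centroid observations.

    f_px: focal length in pixels; baseline_m: known lateral camera
    translation between frames (e.g., from odometry). Assumes a pinhole
    camera and a static object.
    """
    disparity = abs(centroid_x1 - centroid_x0)
    if disparity == 0:
        raise ValueError("no parallax; increase the baseline")
    return f_px * baseline_m / disparity  # depth in meters

# Example: 600 px focal length, 5 cm sideways motion, 30 px centroid shift.
print(depth_from_motion(600.0, 0.05, 320.0, 350.0))  # 1.0 (meters)
```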

Use

This code is available for non-commercial research purposes only.

Misc.

Robot Fine Motor Skills using VOSVS: https://youtu.be/4L6Q8sAjiCI