EgoHOS


Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications
European Conference on Computer Vision (ECCV), 2022
Lingzhi Zhang*, Shenghao Zhou*, Simon Stent, Jianbo Shi (* indicates equal contribution)

Our main goal is to provide a tool for better hand-object segmentation on in-the-wild egocentric videos.

Prerequisites

Linux
Python 3
NVIDIA GPU + CUDA CuDNN

Table of Contents:

Setup - clone the repo and install dependencies
Datasets - download our egocentric hand-object segmentation datasets
Checkpoints - download the checkpoints for all our models
Inference on Images - quick usage on images
Inference on Videos - quick usage on videos
Other Resources - other resources used in our paper

Setup

Clone this repo:

git clone https://github.com/owenzlz/EgoHOS

Install dependencies:

pip install -r requirements.txt
pip install -U openmim
mim install mmcv-full==1.6.0
cd mmsegmentation
pip install -v -e .

For more information, please refer to MMSegmentation: https://mmsegmentation.readthedocs.io/en/latest/

Datasets

Download our dataset using the following command:

bash download_datasets.sh

After downloading, the dataset is structured as follows:

- [egohos dataset root]
    |- train
        |- image
        |- label
        |- contact
    |- val 
        |- image
        |- label
        |- contact
    |- test_indomain
        |- image
        |- label
        |- contact
    |- test_outdomain
        |- image
        |- label
        |- contact
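The layout above can be walked with a short helper. The following is a minimal sketch, not part of this repo: the function name `list_samples` is hypothetical, and it assumes that the image, label, and contact files of one sample share the same file stem, which you should verify against the downloaded data.

```python
from pathlib import Path

SPLITS = ("train", "val", "test_indomain", "test_outdomain")
ANNOTATION_DIRS = ("label", "contact")


def list_samples(root, split):
    """Return (image, label, contact) path triples for one split.

    Assumption: the three files of a sample share the same file stem.
    """
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    split_dir = Path(root) / split
    # Index annotation files by stem so that file extensions may differ.
    index = {
        name: {p.stem: p for p in (split_dir / name).iterdir() if p.is_file()}
        for name in ANNOTATION_DIRS
    }
    samples = []
    for image in sorted((split_dir / "image").iterdir()):
        stem = image.stem
        # Skip images that lack either annotation instead of failing.
        if image.is_file() and all(stem in index[n] for n in ANNOTATION_DIRS):
            samples.append((image, index["label"][stem], index["contact"][stem]))
    return samples
```

For example, `list_samples("path/to/egohos", "val")` would return one triple per validation image that has both a label and a contact file.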

In each label image, the category IDs are listed below. In the contact labels, pixels with value 1 indicate the dense contact region.

0 -> background
1 -> left hand
2 -> right hand
3 -> 1st order interacting object by left hand
4 -> 1st order interacting object by right hand
5 -> 1st order interacting object by both hands
6 -> 2nd order interacting object by left hand
7 -> 2nd order interacting object by right hand
8 -> 2nd order interacting object by both hands
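The ID table above can be expressed as a lookup for inspecting a label map. This is a minimal sketch using only the standard library: the helper names are hypothetical, and the label map is represented as a plain 2D list of category IDs, assuming the label images store these IDs directly as pixel values.

```python
from collections import Counter

# Category IDs as listed in this README.
CATEGORY_NAMES = {
    0: "background",
    1: "left hand",
    2: "right hand",
    3: "1st order interacting object by left hand",
    4: "1st order interacting object by right hand",
    5: "1st order interacting object by both hands",
    6: "2nd order interacting object by left hand",
    7: "2nd order interacting object by right hand",
    8: "2nd order interacting object by both hands",
}
HAND_IDS = {1, 2}


def category_pixel_counts(label_rows):
    """Count pixels per category name in a 2D grid of category IDs.

    Raises KeyError if the grid contains an ID outside 0..8.
    """
    counts = Counter(pixel for row in label_rows for pixel in row)
    return {CATEGORY_NAMES[i]: n for i, n in sorted(counts.items())}


def hand_mask(label_rows):
    """Return a binary grid with 1 where a pixel belongs to either hand."""
    return [[1 if pixel in HAND_IDS else 0 for pixel in row] for row in label_rows]
```

In practice the grid would come from an image library of your choice; the functions only need nested sequences of integers.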

Checkpoints

Download checkpoints and config files:

bash download_checkpoints.sh

Inference on Images

Let's first download a few test images for running the demo:

bash download_testimages.sh

Depending on your application, use one of the commands below to generate the segmentation predictions. Modify the image directory paths in the bash files if needed. The underlying segmentation model uses a Swin-L backbone with a UPerNet head.

By default, the bash commands run on the images in "./testimages/images" and save the results in the "./testimages" folder. To test on your own images, either put them into the "./testimages/images" folder or change the directories in the bash files.
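If you prefer to copy your own images into the default input folder rather than edit the bash files, something like the following could work. It is a sketch only: `stage_images` is a hypothetical helper, and the list of accepted extensions is an assumption, not something the prediction scripts are known to require.

```python
import shutil
from pathlib import Path


def stage_images(src_dir, dst_dir="./testimages/images",
                 exts=(".jpg", ".jpeg", ".png")):
    """Copy image files from src_dir into dst_dir and return the new paths.

    The extension filter is an assumption; adjust it to your data.
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(src_dir).iterdir()):
        # Only copy regular files with an accepted (case-insensitive) suffix.
        if src.is_file() and src.suffix.lower() in exts:
            copied.append(Path(shutil.copy2(src, dst / src.name)))
    return copied
```

Run it from the directory that contains "testimages" so the default relative path resolves correctly, or pass `dst_dir` explicitly.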

Predict two hands, contact boundary, and interacting objects (1st order) sequentially.

cd mmsegmentation # if you are not in this directory
bash pred_all_obj1.sh

Predict two hands, contact boundary, and interacting objects (1st and 2nd orders) sequentially.

cd mmsegmentation # if you are not in this directory
bash pred_all_obj2.sh

If you want to predict only the hand or contact segmentation, or to use each module separately, see the commands below.

Predict only the left and right hands.

cd mmsegmentation # if you are not in this directory
bash pred_twohands.sh

Predict the dense contact boundary.

cd mmsegmentation # if you are not in this directory
bash pred_cb.sh

Predict the (1st order) interacting objects.

cd mmsegmentation # if you are not in this directory
bash pred_obj1.sh

Predict the (1st and 2nd order) interacting objects.

cd mmsegmentation # if you are not in this directory
bash pred_obj2.sh

Inference on Videos

Let's first download a few test videos for running the demo:

bash download_testvideos.sh

Predict hands and (1st order) interacting objects.

cd mmsegmentation # if you are not in this directory
bash pred_obj1_video.sh

Predict hands and (1st and 2nd orders) interacting objects.

cd mmsegmentation # if you are not in this directory
bash pred_obj2_video.sh

Other Resources

We used other resources for the application section, e.g. mesh reconstruction. Please refer to the links below:

Image Inpainting - LaMa: https://github.com/saic-mdal/lama
Video Inpainting - Flow-edge Guided Video Completion: https://github.com/vt-vl-lab/FGVC
Mesh Reconstruction of Hand-Object Interaction: https://github.com/hassony2/homan
Video Recognition - SlowFast Network: https://github.com/epic-kitchens/epic-kitchens-slowfast

If you wish to generate higher-quality masks, consider using a mask refinement model, e.g. CascadePSP: https://github.com/hkchengrex/CascadePSP

Citation

If you use this code for your research, please cite our paper:

@inproceedings{zhang2022fine,
  title={Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications},
  author={Zhang, Lingzhi and Zhou, Shenghao and Stent, Simon and Shi, Jianbo},
  booktitle={European Conference on Computer Vision},
  pages={127--145},
  year={2022},
  organization={Springer}
}
