
[CVPR 2022] Joint hand motion and interaction hotspots prediction from egocentric videos


HOI-Forecast

Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos (CVPR 2022)

Given observation frames of the past, we predict future hand trajectories (green and red lines) and object interaction hotspots (heatmaps) in egocentric view. We generate training data automatically and use it to train an Object-Centric Transformer (OCT) model for prediction.

Installation

  • Clone this repository:
    git clone https://github.com/stevenlsw/hoi-forecast
    cd hoi-forecast
  • Python 3.6 Environment:
    conda env create -f environment.yaml
    conda activate fhoi

Quick training data generation

The official Epic-Kitchens dataset follows the same layout as assets/EPIC-KITCHENS. The RGB frames needed for the demo are already included in assets/EPIC-KITCHENS/P01/rgb_frames/P01_01.

  • Download the Epic-Kitchens 55 Dataset annotations and save them in the assets folder.

  • Download the hand-object detections below:

    link=https://data.bris.ac.uk/datasets/3l8eci2oqgst92n14w2yqi5ytu/hand-objects/P01/P01_01.pkl
    wget -P assets/EPIC-KITCHENS/P01/hand-objects $link
  • Run python demo_gen.py. The results (png and pkl files) are stored in figs, where you can inspect the visualization.

  • For more generated training labels, download them from Google Drive and run python example.py.
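To see what the data generation step produced, the files in figs can be listed and the pickles opened. The following is a minimal sketch, assuming the .pkl files are ordinary Python pickles holding built-in containers; the helper name `summarize_outputs` is hypothetical and not part of the repository.

```python
import pickle
from pathlib import Path


def summarize_outputs(out_dir="figs"):
    """Return a {filename: description} summary of the files in out_dir.

    Hypothetical helper, not part of the repository. Assumes the .pkl
    files are ordinary pickles; only unpickle files you trust.
    """
    summary = {}
    for path in sorted(Path(out_dir).iterdir()):
        if path.suffix == ".pkl":
            with path.open("rb") as f:
                obj = pickle.load(f)
            if isinstance(obj, dict):
                # Show the top-level keys so the label layout is visible.
                summary[path.name] = "dict with keys " + str(sorted(map(str, obj)))
            else:
                summary[path.name] = type(obj).__name__
        elif path.suffix == ".png":
            summary[path.name] = "image"
    return summary


if __name__ == "__main__":
    if Path("figs").is_dir():
        for name, description in summarize_outputs("figs").items():
            print(name, "->", description)
```

If a pickle stores custom classes rather than built-in containers, it has to be loaded from within the repository so that those classes can be imported.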

Evaluation on EK100

We manually collected the hand trajectories and interaction hotspots for evaluation and pre-extracted the input video features.

  • Download the processed files (collected labels, pre-extracted features, and dataset partitions; 600 MB) and unzip them. You will get a structure like:

    hoi-forecast
    |-- data 
    |   |-- ek100
    |   |   |-- ek100_eval_labels.pkl
    |   |   |-- video_info.json
    |   |   |-- labels
    |   |   |   |-- label_303.pkl
    |   |   |   |-- ...
    |   |   |-- feats
    |   |   |   |-- data.lmdb (RGB)
    |-- common
    |   |-- epic-kitchens-55-annotations
    |   |-- epic-kitchens-100-annotations
    |   |-- rulstm
    
  • Download the pretrained models on EK100; the path to the stored model is referred to as $resume.

  • Install PyTorch and dependencies with the following command:

    pip install -r requirements.txt
  • Evaluate future hand trajectory

    python traineval.py --evaluate --ek_version=ek100 --resume={path to the model} --traj_only
  • Evaluate future interaction hotspots

    python traineval.py --evaluate --ek_version=ek100 --resume={path to the model}
  • Results should look like:

    Hand Trajectory  |  Interaction Hotspots
    ADE ↓    FDE ↓   |  SIM ↑   AUC-J ↑   NSS ↑
    0.12     0.11    |  0.19    0.69      0.72
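The two trajectory columns above are displacement errors: ADE is commonly defined as the mean Euclidean distance between predicted and ground-truth points over all future time steps, and FDE as the distance at the final time step only. The sketch below illustrates those standard definitions for a single 2D trajectory. It is not the repository's evaluation code, which may normalize coordinates or aggregate over hands and samples differently; the function name `displacement_errors` is hypothetical.

```python
import math


def displacement_errors(pred, target):
    """Return (ADE, FDE) for two 2D trajectories of equal length.

    pred and target are sequences of (x, y) points, one per future
    time step. Illustrative only; the repository's evaluation may
    normalize coordinates or aggregate differently.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("trajectories must be non-empty and equally long")
    dists = [math.hypot(p[0] - t[0], p[1] - t[1]) for p, t in zip(pred, target)]
    ade = sum(dists) / len(dists)  # mean error over all time steps
    fde = dists[-1]                # error at the last time step only
    return ade, fde


if __name__ == "__main__":
    pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    target = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    print(displacement_errors(pred, target))  # (1.0, 2.0)
```

Lower values are better for both metrics, matching the ↓ markers in the table.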

Training

  • Extract per-frame features of the training set, similar to RULSTM, and store them in data/ek100/feats/ek100.lmdb. Each key-value pair looks like:

    fname = 'P01/rgb_frames/P01_01/frame_0000000720.jpg'
    env[fname.encode()] = result_dict # extracted feature results
  • Start training

    python traineval.py --ek_version=ek100
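When writing features for many frames, the database keys need to follow the frame path layout from the feature extraction step consistently. A small sketch of a key builder is given below. The helper name `frame_key` is hypothetical, and the 10-digit zero padding is inferred from the single example key 'frame_0000000720.jpg', so it should be checked against the actual frame filenames.

```python
def frame_key(participant, video, frame_index):
    """Build a database key in the layout of the example key.

    Hypothetical helper, not part of the repository. The 10-digit zero
    padding is inferred from 'frame_0000000720.jpg'.
    """
    fname = "{}/rgb_frames/{}/frame_{:010d}.jpg".format(
        participant, video, frame_index
    )
    return fname.encode()  # keys are stored as bytes


if __name__ == "__main__":
    print(frame_key("P01", "P01_01", 720))
    # b'P01/rgb_frames/P01_01/frame_0000000720.jpg'
```

How the feature value itself is serialized is not specified here; following the RULSTM feature extraction code is the safest reference for that part.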
    

Citation

@inproceedings{liu2022joint,
  title={Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos},
  author={Liu, Shaowei and Tripathi, Subarna and Majumdar, Somdeb and Wang, Xiaolong},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}

Acknowledgements

We thank: