Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos (CVPR 2022)
Given observation frames of the past, we predict future hand trajectories (green and red lines) and object interaction hotspots (heatmaps) in egocentric view. We generate training data automatically and use it to train an Object-Centric Transformer (OCT) model for prediction.
- Clone this repository:
git clone https://github.com/stevenlsw/hoi-forecast
cd hoi-forecast
- Python 3.6 Environment:
conda env create -f environment.yaml
conda activate fhoi
The official Epic-Kitchens dataset follows the same layout as assets/EPIC-KITCHENS. The RGB frames needed for the demo have been pre-downloaded to assets/EPIC-KITCHENS/P01/rgb_frames/P01_01.
- Download the Epic-Kitchens 55 dataset annotations and save them in the assets folder.
- Download the hand-object detections with the commands below:
link=https://data.bris.ac.uk/datasets/3l8eci2oqgst92n14w2yqi5ytu/hand-objects/P01/P01_01.pkl
wget -P assets/EPIC-KITCHENS/P01/hand-objects $link
- Run python demo_gen.py; the results [png, pkl] are stored in figs, where you can view the visualization.
- For more generated training labels, please visit Google Drive and run python example.py.
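The demo writes pickle files alongside the PNG visualizations. A minimal sketch for checking what each pickle contains is shown below. The internal structure of these files is not documented here, so the helper (a hypothetical name, `summarize_pickles`) only reports the top-level type and, for dictionaries, the keys.

```python
import pickle
from pathlib import Path


def summarize_pickles(folder):
    """Map each .pkl file name in `folder` to a short description of its top-level object."""
    summary = {}
    for path in sorted(Path(folder).glob("*.pkl")):
        with path.open("rb") as f:
            obj = pickle.load(f)
        if isinstance(obj, dict):
            # Only the keys are listed; the values are left untouched.
            summary[path.name] = "dict with keys {}".format(sorted(map(str, obj)))
        else:
            summary[path.name] = type(obj).__name__
    return summary
```

Calling `summarize_pickles("figs")` after running the demo should list one entry per generated pickle file.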
We manually collect the hand trajectories and interaction hotspots for evaluation, and we pre-extract the features of the input videos.
- Download the processed files (including collected labels, pre-extracted features, and dataset partitions, 600 MB) and unzip them. You will get a structure like:
hoi-forecast
|-- data
|   |-- ek100
|   |   |-- ek100_eval_labels.pkl
|   |   |-- video_info.json
|   |   |-- labels
|   |   |   |-- label_303.pkl
|   |   |   |-- ...
|   |   |-- feats
|   |   |   |-- data.lmdb (RGB)
|-- common
|   |-- epic-kitchens-55-annotations
|   |-- epic-kitchens-100-annotations
|   |-- rulstm
- Download the pretrained models on EK100; the stored model path is referred to as $resume.
- Install PyTorch and dependencies with the following command:
pip install -r requirements.txt
- Evaluate future hand trajectory:
python traineval.py --evaluate --ek_version=ek100 --resume={path to the model} --traj_only
- Evaluate future interaction hotspots:
python traineval.py --evaluate --ek_version=ek100 --resume={path to the model}
- Results should look like:
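| Task | Metric | Value |
|---|---|---|
| Hand Trajectory | ADE ↓ | 0.12 |
| Hand Trajectory | FDE ↓ | 0.11 |
| Interaction Hotspots | SIM ↑ | 0.19 |
| Interaction Hotspots | AUC-J ↑ | 0.69 |
| Interaction Hotspots | NSS ↑ | 0.72 |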
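For readers unfamiliar with the trajectory metrics, ADE (average displacement error) and FDE (final displacement error) are commonly defined as the mean Euclidean distance over all predicted steps and the distance at the last step, respectively. The sketch below illustrates those standard definitions for a single 2D trajectory. It is not taken from traineval.py, and the repository's evaluation may normalize coordinates or aggregate over samples differently; `displacement_errors` is a hypothetical helper name.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def displacement_errors(pred: Sequence[Point], gt: Sequence[Point]) -> Tuple[float, float]:
    """Return (ADE, FDE) for one predicted trajectory against its ground truth."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Per-step Euclidean distance between predicted and ground-truth points.
    dists = [math.hypot(p[0] - g[0], p[1] - g[1]) for p, g in zip(pred, gt)]
    ade = sum(dists) / len(dists)  # average over all future steps
    fde = dists[-1]                # error at the final step only
    return ade, fde


# Two-step example: the first step matches, the last step is off by 5 units.
ade, fde = displacement_errors([(0, 0), (3, 0)], [(0, 0), (0, 4)])
print(ade, fde)  # 2.5 5.0
```

Lower values are better for both metrics, which is what the ↓ arrows in the results indicate.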
- Extract per-frame features of the training set, similar to RULSTM, and store them in data/ek100/feats/ek100.lmdb. The key-value pairs look like:
fname = 'P01/rgb_frames/P01_01/frame_0000000720.jpg'
env[fname.encode()] = result_dict # extracted feature results
- Start training:
python traineval.py --ek_version=ek100
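The feature store described above is keyed by the encoded frame path. As a minimal sketch of that key layout, the helper below builds the key from a participant ID, video ID, and frame number, following the example path in this README. A plain dict stands in for the LMDB environment, and both the `pickle` serialization and the contents of `result_dict` are assumptions made for illustration; the actual extraction code may store values differently.

```python
import pickle


def frame_key(participant: str, video: str, frame: int) -> bytes:
    """Build the byte key for one RGB frame, e.g. P01/rgb_frames/P01_01/frame_0000000720.jpg."""
    # Frame numbers are zero-padded to 10 digits, as in the README's example path.
    fname = "{}/rgb_frames/{}/frame_{:010d}.jpg".format(participant, video, frame)
    return fname.encode()


# A dict stands in for the LMDB environment in this sketch.
store = {}
result_dict = {"feat": [0.0, 1.0]}  # hypothetical feature payload
store[frame_key("P01", "P01_01", 720)] = pickle.dumps(result_dict)
```

With a real LMDB environment, the same key would be written inside a write transaction rather than assigned to a dict.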
@inproceedings{liu2022joint,
title={Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos},
author={Liu, Shaowei and Tripathi, Subarna and Majumdar, Somdeb and Wang, Xiaolong},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2022}
}
We thank:
- epic-kitchens for hand-object detections on Epic-Kitchens dataset
- rulstm for feature extraction and action anticipation
- epic-kitchens-dataset-pytorch for epic-kitchens dataloader
- Miao Liu for help with prior work