An unofficial PyTorch implementation of the paper "End-to-End Human-Gaze-Target Detection with Transformers".
We provide a pip requirements file to install all the dependencies. We recommend using a conda environment.
# Clone project and submodules
git clone --recursive https://github.com/francescotonini/human-gaze-target-detection-transformer.git
cd human-gaze-target-detection-transformer
# Create conda environment
conda create -n human-gaze-target-detection-transformer python=3.9
conda activate human-gaze-target-detection-transformer
# Install requirements
pip install -r requirements.txt
(optional) Set up wandb
cp .env.example .env
# Add token to .env
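A minimal `.env` might look like the following. wandb reads its token from the `WANDB_API_KEY` environment variable; check `.env.example` for the exact variable names this repository expects, as they may differ.

```shell
# wandb authentication token (keep this file out of version control)
WANDB_API_KEY=your-token-here
```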
The code expects that the datasets are placed under the data/ folder.
You can change this by modifying the data_dir
parameter in the configuration files.
cat <<EOT >> configs/local/default.yaml
# @package _global_
paths:
data_dir: "{PATH TO DATASETS}"
EOT
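Before launching training, you may want to sanity-check that the datasets are actually where `data_dir` points. The sketch below is a hypothetical helper (not part of this repository); the subfolder names are assumptions, so adjust them to match your local layout.

```python
from pathlib import Path

# Hypothetical helper: verify that the expected dataset folders exist
# under data_dir before launching training. The subfolder names below
# are assumptions; adjust them to match your local layout.
EXPECTED_DATASETS = ("gazefollow", "videoattentiontarget")


def missing_datasets(data_dir, expected=EXPECTED_DATASETS):
    """Return the expected dataset folders not found under data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_datasets("data")
    if missing:
        print(f"Missing dataset folders: {missing}")
    else:
        print("All expected dataset folders found.")
```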
The implementation requires face annotations ("auxiliary faces", i.e. those not annotated in GazeFollow or VideoAttentionTarget). Therefore, you need to run the following scripts to extract them.
# GazeFollow
python scripts/gazefollow_get_aux_faces.py --dataset_path /path/to/gazefollow --subset train
python scripts/gazefollow_get_aux_faces.py --dataset_path /path/to/gazefollow --subset test
# VideoAttentionTarget
python scripts/videoattentiontarget_get_aux_faces.py --dataset_path /path/to/videoattentiontarget --subset train
python scripts/videoattentiontarget_get_aux_faces.py --dataset_path /path/to/videoattentiontarget --subset test
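If you prefer to run all four extraction jobs in one go, a small wrapper like the one below works. It is an illustrative convenience script, not part of the repository; replace the placeholder dataset paths with your own.

```python
import subprocess

# Map each dataset to its extraction script (from the repository) and
# its local path (placeholders; replace with your own locations).
SCRIPTS = {
    "gazefollow": ("scripts/gazefollow_get_aux_faces.py", "/path/to/gazefollow"),
    "videoattentiontarget": (
        "scripts/videoattentiontarget_get_aux_faces.py",
        "/path/to/videoattentiontarget",
    ),
}


def build_commands():
    """Return one extraction command per (dataset, subset) pair."""
    cmds = []
    for script, dataset_path in SCRIPTS.values():
        for subset in ("train", "test"):
            cmds.append(
                ["python", script, "--dataset_path", dataset_path, "--subset", subset]
            )
    return cmds


if __name__ == "__main__":
    for cmd in build_commands():
        subprocess.run(cmd, check=True)
```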
We provide configurations to train on GazeFollow and VideoAttentionTarget (see configs/experiment/).
# GazeFollow
python src/train.py experiment=hgttr_gazefollow
# VideoAttentionTarget
python src/train.py experiment=hgttr_videoattentiontarget model.net_pretraining={URL TO GAZEFOLLOW PRETRAINING}
The same configuration files are also used to evaluate the model.
# GazeFollow
python src/eval.py experiment=hgttr_gazefollow ckpt_path={PATH TO CHECKPOINT}
# VideoAttentionTarget
python src/eval.py experiment=hgttr_videoattentiontarget ckpt_path={PATH TO CHECKPOINT}
This code is based on PyTorch Lightning, Hydra, and the official DETR implementation.
If you use this implementation in your research, please cite us:
@inproceedings{tonini2023object,
  title={Object-aware Gaze Target Detection},
  author={Tonini, Francesco and Dall'Asen, Nicola and Beyan, Cigdem and Ricci, Elisa},
  booktitle={The IEEE International Conference on Computer Vision (ICCV)},
  month={Oct},
  year={2023}
}
and the original paper:
@inproceedings{tu2022end,
  title={End-to-end Human-Gaze-Target Detection with Transformers},
  author={Tu, Danyang and Min, Xiongkuo and Duan, Huiyu and Guo, Guodong and Zhai, Guangtao and Shen, Wei},
  booktitle={2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month={June},
  year={2022}
}