Action Recognition on the EPIC Kitchens Challenge

This repository contains the code used in the paper "Seeing and Hearing Egocentric Actions: How Much Can We Learn?"

If you use this code or its database, please consider citing:

    Author = {Alejandro Cartas and Jordi Luque and
              Petia Radeva and Carlos Segura and Mariella Dimiccoli},
    Title = {Seeing and Hearing Egocentric Actions: How Much Can We Learn?},
    Booktitle = {The IEEE International Conference on Computer Vision (ICCV) Workshops},
    Month = {Oct},
    Year = {2019}

    Author = {Alejandro Cartas and Jordi Luque and
              Petia Radeva and Carlos Segura and Mariella Dimiccoli},
    Title = {How Much Does Audio Matter to Recognize Egocentric Object Interactions?},
    Year = {2019},
    Eprint = {arXiv:1906.00634},


  1. Installation
  2. Preprocessing


  1. Clone this repository

    git clone --recursive
  2. Create the Conda environment:

    conda create -n epic_torch python=3.6 anaconda
    conda install -n epic_torch -c anaconda pip
    conda install -n epic_torch scikit-learn
    conda install -n epic_torch -c conda-forge addict easydict jq librosa 
    conda install -n epic_torch pytorch torchvision cudatoolkit=9.0 -c pytorch
    conda activate epic_torch
    pip install gulpio telegram-send
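Once the environment is active, a quick check that the packages can be found may save time later. The following is a minimal sketch, not part of the repository: the helper `missing_modules` is hypothetical, and the import names in `REQUIRED` are assumptions about how each installed package is imported (for example, `scikit-learn` as `sklearn` and `telegram-send` as `telegram_send`).

```python
import importlib.util

# Assumed import names for the packages installed above (not taken from the repository).
REQUIRED = [
    "sklearn",
    "addict",
    "easydict",
    "librosa",
    "torch",
    "torchvision",
    "gulpio",
    "telegram_send",
]


def missing_modules(names):
    """Return the module names that the current interpreter cannot locate."""
    # find_spec looks a top-level module up without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Run it with the `epic_torch` environment activated; any name it reports can be installed with the corresponding command above.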


We trained our model on the EPIC Kitchens Challenge dataset:

  1. Download the EPIC Kitchens Challenge dataset at

  2. Before the dataset can be used, the RGB and optical flow frames must be gulped (packed with GulpIO) and the spectrograms extracted from the videos. Please follow the preprocessing steps.
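To illustrate what spectrogram extraction involves, here is a minimal sketch of a log-magnitude spectrogram computed with NumPy. It is not the repository's preprocessing code: the function `log_spectrogram`, the 10 ms Hamming window, the 5 ms hop, and the synthetic test tone are all assumptions chosen for illustration, and the real pipeline may use different parameters or a library such as librosa.

```python
import numpy as np


def log_spectrogram(signal, sample_rate, window_s=0.010, hop_s=0.005, eps=1e-10):
    """Return a (frames, frequency_bins) log-magnitude spectrogram of a mono signal.

    The window and hop durations are illustrative defaults, not the paper's settings.
    """
    win_len = int(round(window_s * sample_rate))
    hop_len = int(round(hop_s * sample_rate))
    if len(signal) < win_len:
        raise ValueError("signal is shorter than one analysis window")

    window = np.hamming(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop_len

    # Slice the signal into overlapping frames and apply the window to each one.
    frames = np.stack(
        [signal[i * hop_len : i * hop_len + win_len] * window for i in range(n_frames)]
    )

    # Magnitude of the one-sided FFT of every frame, then log compression.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + eps)


if __name__ == "__main__":
    sample_rate = 16000
    t = np.arange(sample_rate) / sample_rate
    tone = np.sin(2 * np.pi * 1000 * t)  # one second of a synthetic 1 kHz tone
    spec = log_spectrogram(tone, sample_rate)
    print(spec.shape)
```

Each row of the result describes one short time slice of the audio and each column one frequency band, which is the image-like layout an audio network can take as input.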


Audio Network

python conf/audio_args.json

Video Temporal Segments Network

python conf/tsn_rgb_args.json
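The commands above take their settings from JSON files under `conf/`. As a rough sketch of how a script could read such a file into an object with attribute access, consider the following. The helper `parse_config` and every key in the sample (`arch`, `epochs`, `optimizer`, `lr`) are hypothetical; the actual contents of `conf/audio_args.json` and `conf/tsn_rgb_args.json` are not shown here and may differ.

```python
import json
from types import SimpleNamespace


def parse_config(text):
    """Parse JSON text into nested namespaces so settings read as cfg.section.key."""
    # object_hook is called for every JSON object, innermost first.
    return json.loads(text, object_hook=lambda d: SimpleNamespace(**d))


if __name__ == "__main__":
    # Hypothetical configuration, for illustration only.
    sample = '{"arch": "resnet18", "epochs": 10, "optimizer": {"lr": 0.001}}'
    cfg = parse_config(sample)
    print(cfg.arch, cfg.epochs, cfg.optimizer.lr)
```

To read a real file, pass its contents to the helper, for example `parse_config(open(path).read())`.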

