Action Recognition on Epic Kitchens challenge

This repository contains the code used in the paper "Seeing and Hearing Egocentric Actions: How Much Can We Learn?".

If you use this code or its data, please consider citing:

@InProceedings{Cartas_2019_ICCV,
    Author    = {Alejandro Cartas and Jordi Luque and Petia Radeva and Carlos Segura and Mariella Dimiccoli},
    Title     = {Seeing and Hearing Egocentric Actions: How Much Can We Learn?},
    Booktitle = {The IEEE International Conference on Computer Vision (ICCV) Workshops},
    Month     = {Oct},
    Year      = {2019}
}

@misc{1906.00634,
    Author = {Alejandro Cartas and Jordi Luque and Petia Radeva and Carlos Segura and Mariella Dimiccoli},
    Title  = {How Much Does Audio Matter to Recognize Egocentric Object Interactions?},
    Year   = {2019},
    Eprint = {arXiv:1906.00634}
}

Contents

  1. Installation
  2. Preprocessing
  3. Training
  4. Testing

Installation

  1. Clone this repository:

    git clone --recursive https://github.com/gorayni/seeing_and_hearing
  2. Create the Conda environment:

    conda create -n epic_torch python=3.6 anaconda
    conda install -n epic_torch -c anaconda pip
    conda install -n epic_torch scikit-learn
    conda install -n epic_torch -c conda-forge addict easydict jq librosa 
    conda install -n epic_torch pytorch torchvision cudatoolkit=9.0 -c pytorch
    conda activate epic_torch
    pip install gulpio telegram-send
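
Once the environment is active, a quick sanity check (not part of the original setup steps) confirms that PyTorch was installed and can see the GPU:

    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"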

Preprocessing

We trained our model on the EPIC Kitchens Challenge dataset:

  1. Download the EPIC Kitchens Challenge dataset at https://epic-kitchens.github.io/2018.

  2. In order to use the dataset, the RGB and optical flow frames need to be gulped and the spectrograms extracted from the videos. Please follow the preprocessing steps; a rough sketch of the spectrogram step is shown below.
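
As a rough illustration of the spectrogram step, the snippet below computes a log-mel spectrogram from an audio track with librosa (installed above). The file name, sample rate, and STFT parameters are placeholders, not the values used by the repository's preprocessing scripts:

    import librosa
    import numpy as np

    # Load the audio track extracted from a video (path and sample rate are placeholders).
    y, sr = librosa.load('P01_01.wav', sr=24000)

    # Compute a mel-scaled power spectrogram and convert it to decibels.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024, hop_length=256, n_mels=128)
    log_mel = librosa.power_to_db(mel, ref=np.max)

    # Save the spectrogram for later training.
    np.save('P01_01_spectrogram.npy', log_mel)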

Training

Audio Network

python audio_main.py conf/audio_args.json
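
Both training scripts read their hyperparameters from a JSON file. A minimal sketch of how such a file can be loaded with easydict (installed above) follows; the actual keys are defined by conf/audio_args.json in this repository:

    import json
    from easydict import EasyDict

    # Load the JSON configuration and expose its keys as attributes (args.batch_size, etc.).
    with open('conf/audio_args.json') as f:
        args = EasyDict(json.load(f))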

Video Temporal Segments Network

python tsn_main.py conf/tsn_rgb_args.json
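
A Temporal Segments Network samples a fixed number of snippets spread evenly across each video and averages their predictions. The sketch below illustrates only the segment-sampling idea; the segment count and frame numbers are illustrative, not the repository's settings:

    import random

    def sample_snippet_indices(num_frames, num_segments=3):
        """Pick one random frame index per equal-length segment (TSN-style sampling)."""
        # Assumes num_frames >= num_segments.
        segment_length = num_frames // num_segments
        return [i * segment_length + random.randrange(segment_length)
                for i in range(num_segments)]

    # Example: three snippet indices drawn from a 90-frame clip.
    print(sample_snippet_indices(90))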

Testing

Audio Network

python test_epic_challenge_audio.py conf/audio_args.json

Video Temporal Segments Network

python tsn_test_main.py conf/tsn_rgb_args.json
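
The paper studies combining the audio and visual streams; one common way to do this at test time is score-level late fusion, i.e. a weighted average of the per-class scores of both networks. The sketch below is a generic illustration of that idea, not the exact fusion used in this repository:

    import numpy as np

    def late_fusion(audio_scores, video_scores, weight=0.5):
        """Weighted average of per-class softmax scores from both networks."""
        return weight * audio_scores + (1 - weight) * video_scores

    # Example with dummy 3-class scores from each network.
    audio = np.array([0.2, 0.5, 0.3])
    video = np.array([0.1, 0.7, 0.2])
    print(late_fusion(audio, video).argmax())  # predicted class index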