maysonma/VAANet

[AAAI 2020] Official implementation of VAANet for Emotion Recognition

An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos

This is the official implementation of the paper "An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos".

Citation

If you use this code, please cite the following:

@inproceedings{Zhao2020AnEV,
  title={An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos},
  author={Sicheng Zhao and Yunsheng Ma and Yang Gu and Jufeng Yang and Tengfei Xing and Pengfei Xu and Runbo Hu and Hua Chai and Kurt Keutzer},
  booktitle={AAAI},
  year={2020}
}

Requirements

PyTorch (ver. 0.4+ required)
FFmpeg
Python 3

Preparation

VideoEmotion-8

Download the videos here.
Convert the mp4 videos to jpg frames using /tools/video2jpg.py
Add the n_frames information using /tools/n_frames.py
Generate the annotation file in JSON format using /tools/ve8_json.py
Convert the mp4 videos to mp3 audio files using /tools/video2mp3.py
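
For orientation, the two conversion steps above can be sketched with plain FFmpeg calls. This is a minimal illustration only, not the code in /tools: the helper names `jpg_command`, `mp3_command`, and `convert`, and the `image_%05d.jpg` frame naming pattern, are assumptions made for this example. Use the repository scripts for the actual preparation.

```python
import subprocess
from pathlib import Path
from typing import List


def jpg_command(video: Path, out_dir: Path) -> List[str]:
    """Build an ffmpeg command that writes the frames of `video` as JPEG files."""
    # Assumed naming pattern; the repository's script may use a different one.
    return ["ffmpeg", "-i", str(video), str(out_dir / "image_%05d.jpg")]


def mp3_command(video: Path, out_file: Path) -> List[str]:
    """Build an ffmpeg command that extracts the audio track of `video`."""
    # -vn drops the video stream; the .mp3 extension selects the output format.
    return ["ffmpeg", "-i", str(video), "-vn", str(out_file)]


def convert(video: Path, frames_dir: Path, audio_file: Path) -> None:
    """Run both conversions for a single video (requires ffmpeg on PATH)."""
    frames_dir.mkdir(parents=True, exist_ok=True)
    audio_file.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(jpg_command(video, frames_dir), check=True)
    subprocess.run(mp3_command(video, audio_file), check=True)
```

Keeping the command construction separate from `subprocess.run` makes it easy to print and inspect a command before running it over the whole dataset.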

Running the code

Assume the structure of the data directories is as follows:

~/
  VideoEmotion8--imgs
    .../ (directories of class names)
      .../ (directories of video names)
        .../ (jpg files)
  VideoEmotion8--mp3
    .../ (directories of class names)
      .../ (mp3 files)
  results
  ve8_01.json
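
Before training, it may help to check that the prepared data matches the layout above. The following is a minimal sketch, not part of the repository: `summarize_layout` is a hypothetical helper, and it assumes frames use the `.jpg` extension and audio files use the `.mp3` extension exactly as shown in the tree.

```python
from pathlib import Path
from typing import Dict


def summarize_layout(root: Path) -> Dict[str, int]:
    """Count classes, videos, frames, and audio files under `root`."""
    imgs = root / "VideoEmotion8--imgs"
    mp3s = root / "VideoEmotion8--mp3"
    # First level: class directories; second level: one directory per video.
    class_dirs = [c for c in imgs.iterdir() if c.is_dir()]
    video_dirs = [v for c in class_dirs for v in c.iterdir() if v.is_dir()]
    return {
        "classes": len(class_dirs),
        "videos": len(video_dirs),
        "frames": sum(len(list(v.glob("*.jpg"))) for v in video_dirs),
        # Audio files sit directly inside each class directory.
        "mp3s": len(list(mp3s.glob("*/*.mp3"))),
    }
```

For example, `print(summarize_layout(Path.home()))` reports the counts for the layout rooted at `~/`. If every video has an audio track, the `videos` and `mp3s` counts should agree; a mismatch suggests that one of the conversion steps was skipped for some files.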

Confirm all options in ~/opts.py.

python main.py