
Simple Baseline For Audio-Visual Scene-Aware Dialog

This repository is the implementation of "A Simple Baseline for Audio-Visual Scene-Aware Dialog".

The code is based on Hori's naive baseline. We thank the AVSD team for the dataset and for sharing their implementation code.

Required packages

  • python 2.7
  • pytorch 0.4.1
  • numpy
  • six
  • java 1.8.0 (for coco-evaluation tools)

Data

We use the official AVSD v0.1 train set. For validation and evaluation we use the prototype val and test sets. See the DSTC7 AVSD challenge for more details. Please cite AVSD if you use their dataset.

Download the AVSD annotations from this link, and extract them to `data/`

Download the CHARADES audio-video features from this link, and extract them to `data/charades_features`
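After both downloads, the code expects the layout sketched below. The directory names come from the instructions above; the archive file names and `tar` flags are assumptions, since the actual links are not shown here.

```shell
# Create the directory layout the code expects, then extract the two
# downloaded archives into it (archive names below are hypothetical).
mkdir -p data/charades_features
# tar -xzf <annotations-archive> -C data/                 # hypothetical name
# tar -xzf <features-archive> -C data/charades_features/  # hypothetical name
ls data   # should list charades_features once extraction is done
```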

Run

The run script has four stages:

  • stage 1 - preparation of dependent packages
  • stage 2 - training
  • stage 3 - generation of sentences on test-set
  • stage 4 - evaluation of generated sentences

Use `$ ./run --stage X` to run the desired stage.
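The stages build on one another (training before generation, generation before evaluation), so a full run goes through them in order. A minimal sketch, printing each command rather than executing it since stage 2 is a long training run:

```shell
# Print the command for each of the four stages in order; drop 'echo'
# to actually execute them (stage 1 must finish before stage 2, etc.).
for stage in 1 2 3 4; do
  echo "./run --stage ${stage}"
done
```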

You can follow this link to download a pretrained model.