
Simple Baseline For Audio-Visual Scene-Aware Dialog

This repository is the implementation of "A Simple Baseline for Audio-Visual Scene-Aware Dialog".

The code is based on Hori's naive baseline. We thank the AVSD team for the dataset and for sharing their implementation code.

Required packages

  • python 2.7
  • pytorch 0.4.1
  • numpy
  • six
  • java 1.8.0 (for coco-evaluation tools)

Data

We use the official AVSD v0.1 train set. For validation and evaluation we use the prototype val and test sets. See the DSTC7 AVSD challenge for more details. Please cite AVSD if you use their dataset.

Download the AVSD annotations from this link, and extract them to `data/`

Download the CHARADES audio-video features from this link, and extract them to `data/charades_features`
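After both downloads, the code expects the layout sketched below. The directory names come from the instructions above; the archive file names and `tar` flags are assumptions, since the actual links are not shown here.

```shell
# Create the directory layout the code expects, then extract the two
# downloaded archives into it (archive names below are hypothetical).
mkdir -p data/charades_features
# tar -xzf <annotations-archive> -C data/                 # hypothetical name
# tar -xzf <features-archive> -C data/charades_features/  # hypothetical name
ls data   # should list charades_features once extraction is done
```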

Run

The run script has four stages:

  • stage 1 - preparation of dependent packages
  • stage 2 - training
  • stage 3 - generation of sentences on test-set
  • stage 4 - evaluation of generated sentences

Use `$ ./run --stage X` to run the desired stage.
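The stages build on one another (training before generation, generation before evaluation), so a full run goes through them in order. A minimal sketch, printing each command rather than executing it since stage 2 is a long training run:

```shell
# Print the command for each of the four stages in order; drop 'echo'
# to actually execute them (stage 1 must finish before stage 2, etc.).
for stage in 1 2 3 4; do
  echo "./run --stage ${stage}"
done
```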

You can follow this link to download a pretrained model.