The official GitHub repository for the paper "On Modality Bias in the TVQA Dataset".
Our framework is built on and adapted from the official TVQA repository, which provides access to the original dataset, the official website, the submission leaderboard, and related projects such as TVQA+.
Using the inclusion-exclusion measure (IEM) introduced in our paper, we propose subsets of TVQA that respond to different mixtures of modalities and features.
The essence of our framework can be used for any video-QA dataset with appropriate features. You'll have to adapt at least the dataloader and model classes to fit your new dataset. They function almost identically to the baseline TVQA classes, with added functionality. You may find it helpful to replicate our TVQA experiments first:
- `git clone https://github.com/Jumperkables/tvqa_modality_bias`
- `pip install -r requirements.txt`
- Now assemble the dataset needed to run our experiments:
- Install the PyTorch block fusion package and place it in this directory. You will need to edit the imports in the `model/tvqa_abc_bert_nofc.py` file to accommodate this fusion package for bilinear pooling (see the sketch after this list).
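As a rough illustration (not our exact wiring), the fusion module can be used as in the sketch below; it assumes the BLOCK package (`block.bootstrap.pytorch`) ends up importable as `block`, and the dimensions are made up:

```python
import torch
from block import fusions  # assumes the BLOCK package is importable as `block`

# Bilinear-style fusion of two modality vectors into a joint representation.
# All dimensions here are illustrative, not the ones used in our models.
fusion = fusions.Block([512, 512], 1024)        # input dims, output dim
visual, textual = torch.randn(8, 512), torch.randn(8, 512)
joint = fusion([visual, textual])               # -> (8, 1024)
```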
Clone the TVQA GitHub repository and follow steps 1, 2 and 3 of its data-extraction instructions. This gives you the processed json files for the training and validation sets, which contain the questions, answers and subtitles. ImageNet features are supplied in an h5 file. This file is large and loading it fully into memory requires a significant amount of RAM, but you can avoid this by not using the "core" h5py driver, so that reads are lazy.
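For example, lazy h5py reads look like the minimal sketch below (the feature file name is an assumption; use whatever the TVQA instructions give you):

```python
import h5py

# Default driver: datasets are read lazily from disk, keeping RAM usage low.
imagenet_feats = h5py.File("tvqa_imagenet_pool5_hq.h5", "r")

# driver="core" would instead load the entire file into memory up front:
# imagenet_feats = h5py.File("tvqa_imagenet_pool5_hq.h5", "r", driver="core")

clip_key = list(imagenet_feats.keys())[0]   # a TVQA clip name
frame_feats = imagenet_feats[clip_key][:]   # only this dataset is read from disk
```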
Visual concepts are contained in the `det_visual_concepts_hq.pickle` file.
There are at most 20 regional features per frame, each 2048-d, making these far too large for us to share. The original TVQA repository neither supplies regional features nor supports them in the dataloader. We have implemented the regional features used in our paper under the name `regional_topk` (not `regional`).
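For intuition, selecting the top-k regions per frame amounts to something like the sketch below; the function name, the confidence-based criterion and the shapes are illustrative assumptions, and the actual `regional_topk` handling lives in our dataloader:

```python
import numpy as np

def topk_regions(region_feats, region_scores, k=20):
    """Keep the k highest-scoring regional features for a single frame.

    region_feats:  (n_regions, 2048) Faster R-CNN region features
    region_scores: (n_regions,) detection confidences
    """
    keep = np.argsort(region_scores)[::-1][:k]
    return region_feats[keep]

# Example with random data: 36 candidate regions reduced to the top 20.
top_feats = topk_regions(np.random.randn(36, 2048), np.random.rand(36))
```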
You will need to follow the instructions here to apply for the raw TVQA video frames, and then extract the regional features yourself.
Specifically, follow the instructions from here. Once you have set up that repository, add our `tools/generate_h5.py` script to its `bottom-up-attention/tools/` directory. Adapt the file to your raw video frame location and run it to extract an h5 file for the entire dataset of frames (in our scripts we call this regional feature file `100.h5`). It will take a while, but our generation script should help a lot and shows the exact structure our dataloader expects from the h5 file.
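At its core, the generation step writes one feature dataset per clip into an h5 file. Everything in the sketch below (the clip-name keys, the stub standing in for the Faster R-CNN extraction loop, the shapes) is an illustrative assumption, so defer to `tools/generate_h5.py` for the structure the dataloader actually expects:

```python
import h5py
import numpy as np

def iter_clip_features(raw_frame_dir):
    """Stand-in for the Faster R-CNN extraction loop in tools/generate_h5.py:
    yields (clip_name, features) with features shaped (n_frames, n_regions, 2048)."""
    yield "example_clip_00", np.zeros((5, 20, 2048), dtype=np.float32)

# Hypothetical layout: one dataset per clip, keyed by clip name.
with h5py.File("100.h5", "w") as regional_h5:
    for clip_name, feats in iter_clip_features("path/to/raw_frames"):
        regional_h5.create_dataset(clip_name, data=feats)
```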
See our `example_data_directory` as a guideline.
Scripts to run our experiments once the data has been collected: edit the relevant dataset and import paths in the `main`, `config`, `utils` and `tvqa_dataset` files to suit your repository structure, then run these scripts.
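The gist of those edits is pointing every feature path at your own copies of the data; the variable names below are purely illustrative (the real option names live in `config.py` and `tvqa_dataset.py`):

```python
# Illustrative path wiring only; the real option names live in config.py / tvqa_dataset.py.
DATA_ROOT = "/path/to/your/data"
IMAGENET_H5 = f"{DATA_ROOT}/tvqa_imagenet_pool5_hq.h5"      # ImageNet frame features (name assumed)
VCPT_PICKLE = f"{DATA_ROOT}/det_visual_concepts_hq.pickle"  # visual concepts
REGIONAL_H5 = f"{DATA_ROOT}/100.h5"                         # regional_topk features
```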
Some tools used in our experiments for visualisation and convenience.
Published at BMVC 2020
@inproceedings{mbintvqa,
title={On Modality Bias in the TVQA Dataset},
author={Winterbottom, T. and Xiao, S. and McLean, A. and Al Moubayed, N.},
booktitle={Proceedings of the British Machine Vision Conference ({BMVC})},
year={2020}
}
Feel free to contact me at thomas.i.winterbottom@durham.ac.uk if you have any criticisms you'd like me to hear out, or would like any help.