The official GitHub repository for *Trying Bilinear Pooling in Video-QA*. This repository uses code adapted from the TVQA and Heterogeneous Memory Enhanced VQA repositories.
git clone https://github.com/Jumperkables/kable_management/tree/master/trying_blp.git
Our paper surveys 4 datasets across 2 models. In total you'll need to prepare the following 5 main subdirectories and manage 4 datasets. The TVQA model scripts all use one Python 3 virtual environment, and all the others (including HME-TVQA) share a Python 2 virtual environment.
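For reference, setting up the two environments might look something like the sketch below; the environment names and locations are placeholders of our own choosing, not something the repository depends on.

```bash
# Python 3 environment for the TVQA model scripts
# (names/paths below are placeholders - use whatever you prefer)
python3 -m venv ~/venvs/blp_py3
source ~/venvs/blp_py3/bin/activate
# ...install the TVQA dependencies by following that repository's setup instructions...
deactivate

# Python 2 environment shared by everything else (HME, HME-TVQA, C3D extraction)
virtualenv --python=python2.7 ~/venvs/blp_py2
source ~/venvs/blp_py2/bin/activate
# ...install the Python 2 dependencies described in the sections below...
deactivate
```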
We have merged our TVQA code for this project with another of our projects, On Modality Bias in the TVQA Dataset. Follow the setup instructions for that repository, including data collection, and leave it in this position under the name `tvqa`.
To reproduce the experiments in this paper you will not need to extract regional features. Feel free to skip that rather complicated step.
Our experiments include using TVQA on the HME model. You will need to extract motion vectors from the TVQA raw frames. HME-TVQA will share a dataset directory with TVQA. Our C3D feature extraction code is adapted from this repository.
- Apply for access to the raw TVQA frames as detailed in the TVQA subsection.
- Create another virtual environment (Python 2, not my choice I'm afraid) and install the extraction requirements:
  `pip install -r hme_tvqa/pytorch_c3d_extraction/requirements.txt`
- Collect the `c3d.pickle` pretrained model from the above repository and put it in the `hme_tvqa/pytorch_c3d_extraction` directory.
- Extraction is done by `pytorch_c3d_extraction/feature_extractor_frm.py` and can be run via `c3d.sh`. Set appropriate values for `--OUTPUT_DIR`, `--VIDEO_DIR` and `--OUTPUT_NAME` (an example invocation is sketched after this list).
- The extracted h5 file is in a slightly incorrect format; use `pytorch_c3d_extraction/fix_h5.py` (edit `old` and `new`) to fix the h5 file.
- In your TVQA data directory, add a new subdirectory `motion_features`, and place `fixed_tvqa_c3d_fc6_features.h5` (or whatever you named the fixed h5 file) into it.
- Since HME is implemented in Python 2, convert `det_visual_concepts_hq.pickle`, `word2idx.pickle`, `idx2word.pickle` and `vocab_embedding.pickle` to a Python 2 compatible format. We have implemented a tool in `tvqa/tools/pickle3topickle2.py` that should do this for you.
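For concreteness, here is a minimal sketch of the extraction and fix-up steps above. The directory paths and output name are placeholders, and we are assuming `--OUTPUT_DIR`, `--VIDEO_DIR` and `--OUTPUT_NAME` are passed as command-line flags (as the `c3d.sh` wrapper suggests); check that script for the exact call.

```bash
source ~/venvs/blp_py2/bin/activate   # the Python 2 environment from earlier

# Placeholder paths: point --VIDEO_DIR at the raw TVQA frames and --OUTPUT_DIR
# at the directory the extracted h5 file should be written to.
python hme_tvqa/pytorch_c3d_extraction/feature_extractor_frm.py \
    --VIDEO_DIR /path/to/tvqa/raw_frames \
    --OUTPUT_DIR /path/to/tvqa/motion_features \
    --OUTPUT_NAME tvqa_c3d_fc6_features.h5

# Fix the h5 layout (edit the `old` and `new` paths inside the script first),
# then move/rename the result into your TVQA motion_features directory.
python hme_tvqa/pytorch_c3d_extraction/fix_h5.py
```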
- Collect EgoVQA from here.
- Collect these two datasets from this repository.
This directory contains the adapted models used for all HME-based experiments.
As previously mentioned, the TVQA model scripts use a Python 3 virtual environment, and the other scripts (including HME-TVQA) all share a different Python 2 virtual environment. Example scripts to run the experiments in our paper, with the same hyperparameters, can be found in the `scripts` directory.
When collected, your datasets should look something like this.
```
@inproceedings{tryingblp,
    title={Trying Bilinear Pooling in Video-QA},
    author={Winterbottom, T. and Xiao, S. and McLean, A. and Al-Moubayed, N.},
    booktitle={arXiv},
    year={2020}
}
```
Feel free to contact us @ thomas.i.winterbottom@durham.ac.uk