PyTorch implementation of the ACM MM 2020 paper: Cascade Reasoning Network for Text-based Visual Question Answering
Clone this repository and build it with the following commands.
# activate your own conda environment
# [Alternative]
# conda env create -f CRN.yaml
# conda activate CRN_env
git clone https://github.com/guanghuixu/CRN_tvqa.git
cd CRN_tvqa
python setup.py build develop
Datasets | Object Features | OCR Features |
---|---|---|
TextVQA | Open Images | TextVQA Rosetta-en OCRs |
ST-VQA | ST-VQA Objects | ST-VQA Rosetta-en OCRs |
OCR-VQA | OCR-VQA Objects | OCR-VQA Rosetta-en OCRs |
cd ~/CRN_tvqa
# Download dataset annotations
wget https://github.com/guanghuixu/CRN_tvqa/releases/download/data/data.tar.xz
tar xf data.tar.xz
cd data
# Download detectron weights
wget http://dl.fbaipublicfiles.com/pythia/data/detectron_weights.tar.gz
tar xf detectron_weights.tar.gz
# Now download the required features; the feature links are taken from the table above [provided by M4C]
cd crn_textvqa
wget https://dl.fbaipublicfiles.com/pythia/features/open_images.tar.gz
tar xf open_images.tar.gz
wget https://dl.fbaipublicfiles.com/pythia/m4c/data/m4c_textvqa_ocr_en_frcn_features.tar.gz
tar xf m4c_textvqa_ocr_en_frcn_features.tar.gz
cd ../..
# Calculate the edge features for each of the [train, val, test] splits (the command below processes the train split)
bash scripts/process_dataset.sh crn_textvqa data/crn_textvqa/imdb/imdb_train_ocr_en.npy
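The comment above mentions the train, val, and test splits, while the command shown covers only the train split. The following is a minimal sketch of how the per-split commands could be generated. It assumes that the val and test annotation files follow the same naming pattern as `imdb_train_ocr_en.npy`, which is not confirmed here; the helper name `edge_feature_commands` is hypothetical. Splits whose file is not found are skipped rather than guessed.

```python
from pathlib import Path


def edge_feature_commands(dataset="crn_textvqa", data_root="data",
                          splits=("train", "val", "test")):
    """Return one process_dataset.sh command per split whose imdb file exists."""
    commands = []
    for split in splits:
        # Assumption: val/test files are named like the train file,
        # i.e. imdb_<split>_ocr_en.npy. Check your data/ folder to confirm.
        imdb = Path(data_root) / dataset / "imdb" / f"imdb_{split}_ocr_en.npy"
        if imdb.is_file():
            commands.append(
                ["bash", "scripts/process_dataset.sh", dataset, imdb.as_posix()]
            )
        else:
            print(f"skipping {split}: {imdb.as_posix()} not found")
    return commands


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them.
    for cmd in edge_feature_commands():
        print(" ".join(cmd))
```

Run it from the repository root so that the relative `data/` and `scripts/` paths resolve. Printing the commands instead of executing them keeps the sketch safe to try before the downloads have finished.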
The training and evaluation commands can be found in ./scripts. The config files can be found in ./configs.
- to train the model on the TextVQA training set:
# bash scripts/<train.sh> <GPU_ids> <save_dir>
bash scripts/train_textvqa.sh 0,1 textvqa_debug
(Note: replace textvqa with another dataset name, and use the corresponding config file, to train on other datasets and configurations.)
- to evaluate the pretrained model on the TextVQA validation/test set:
# bash scripts/<val.sh> <GPU_ids> <save_dir> <checkpoint> <run_type>
bash scripts/val_textvqa.sh 0,1 textvqa_debug save/textvqa_debug/crn_textvqa_crn/best.ckpt val
bash scripts/val_textvqa.sh 0,1 textvqa_debug save/textvqa_debug/crn_textvqa_crn/best.ckpt inference
(Note: set <run_type> to inference instead of val to generate the EvalAI prediction files for the test set.)
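The evaluation script takes four positional arguments, and the last one selects between validation and test-set prediction. As a rough sketch, the command line can be assembled and checked before launching. The helper `eval_command` is hypothetical and not part of this repository, and the default checkpoint path is only copied from the example above (`save/<save_dir>/crn_textvqa_crn/best.ckpt`); it may differ for other datasets or configs.

```python
import shlex

# The two run types described in the note above.
VALID_RUN_TYPES = ("val", "inference")


def eval_command(gpu_ids, save_dir, run_type, checkpoint=None,
                 script="scripts/val_textvqa.sh"):
    """Build the evaluation command line as a single shell-safe string."""
    if run_type not in VALID_RUN_TYPES:
        raise ValueError(
            f"run_type must be one of {VALID_RUN_TYPES}, got {run_type!r}"
        )
    if checkpoint is None:
        # Assumption: same layout as the example checkpoint path above.
        checkpoint = f"save/{save_dir}/crn_textvqa_crn/best.ckpt"
    args = ["bash", script, gpu_ids, save_dir, checkpoint, run_type]
    return " ".join(shlex.quote(arg) for arg in args)


# Command that would generate the EvalAI prediction files for the test set.
print(eval_command("0,1", "textvqa_debug", "inference"))
```

Rejecting an unknown run type up front avoids starting a long GPU job with a mistyped argument.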
If you use our code in your research, please cite our paper:
@inproceedings{liu2020crn,
title={Cascade Reasoning Network for Text-based Visual Question Answering},
author={Fen Liu and Guanghui Xu and Qi Wu and Qing Du and Wei Jia and Mingkui Tan},
booktitle={Proceedings of the 28th ACM International Conference on Multimedia},
year={2020}
}