We apply ∂4 to a visual question answering problem; the model is learned jointly, end-to-end, on the CLEVR dataset. See here for more details.
Requirements:
- ∂4 interpreter
- Python 3
- PyTorch 1.10.0
- TensorFlow 0.11.0 (install via the wheel below)

pip3 install https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.11.0-cp35-cp35m-linux_x86_64.whl
Generate the CLEVR dataset from here.
Use this template for generating questions.
Save the rendered images and the generated CLEVR_questions.json to the vqa/data directory.
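If the target directory does not exist yet, create it first (a one-liner, assuming the repository root as the working directory):

```shell
# Create the data directory the rendered images and question JSON go into.
mkdir -p vqa/data
```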
We patch ∂4's extensible_dsm.py at line 275 (in create_alg_op_matrix), changing the array dtype to float32: ret = np.zeros([size, size, size], dtype=np.float32).
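The effect of the patch can be seen with a small NumPy sketch (the `size` value here is illustrative, not the one ∂4 actually uses):

```python
import numpy as np

size = 8  # illustrative; in ∂4 this is set by the machine configuration

# Default allocation vs the patched float32 allocation:
ret64 = np.zeros([size, size, size])                    # dtype defaults to float64
ret32 = np.zeros([size, size, size], dtype=np.float32)  # patched version

# float32 halves the memory footprint of the operator tensor
assert ret32.nbytes == ret64.nbytes // 2
print(ret32.dtype, ret32.nbytes)  # float32 2048
```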
python3 scripts/extract_features.py \
--input_image_dir data/images \
--output_h5_file data/train_features.h5
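Once extraction finishes, the HDF5 file can be inspected with h5py. This is a sketch of the layout we expect (the `features` dataset name follows the usual clevr-iep convention and is an assumption; the shape below is illustrative — it writes and reads a small demo file rather than the real one):

```python
import h5py
import numpy as np

# Write a tiny demo file with the assumed layout of train_features.h5.
with h5py.File("demo_features.h5", "w") as f:
    f.create_dataset("features",
                     data=np.zeros((2, 1024, 14, 14), dtype=np.float32))

# Read it back the same way you would inspect data/train_features.h5.
with h5py.File("demo_features.h5", "r") as f:
    shape, dtype = f["features"].shape, f["features"].dtype
print(shape, dtype)  # (2, 1024, 14, 14) float32
```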
Use this vocab.json for the vocabulary.
python3 scripts/preprocess_questions.py \
--input_questions_json data/CLEVR_questions.json \
--input_vocab_json data/vocab.json \
--output_h5_file data/train_questions.h5
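A minimal sketch of the structure vocab.json is assumed to have (the key names follow the clevr-iep convention and are an assumption here; it builds a small demo vocabulary rather than reading the real file):

```python
import json

# Demo vocabulary with the assumed clevr-iep key layout.
vocab = {
    "question_token_to_idx": {"<NULL>": 0, "<START>": 1, "<END>": 2, "what": 3},
    "answer_token_to_idx": {"<UNK>": 0, "yes": 1, "no": 2},
}
with open("vocab_demo.json", "w") as f:
    json.dump(vocab, f)

# Reload and map a tokenized question to indices, as preprocessing would.
with open("vocab_demo.json") as f:
    v = json.load(f)
print([v["question_token_to_idx"].get(t, 0) for t in ["<START>", "what", "<END>"]])  # [1, 3, 2]
```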
python3 scripts/trainer.py