Multimodal Graph Network (MGN): Code repo, examples from the paper

Multimodal Graph Networks (MGN)

Associated supporting code for the Multimodal Graph Networks paper (https://arxiv.org/).

Introduction

This repo holds the supplementary code and datasets for the MGN paper. For CLEVR dataset generation, please refer to the original CLEVR repo. For CLOSURE templates, please refer to the CLOSURE repo and paper.

Setup

  1. Clone this repo and its submodules.
  2. Create a conda environment (or virtualenv) with Python 3.7+ for this project:
$ conda create --name mgn

Then install the required packages from requirements.txt:

$ pip install -r requirements.txt

Prerequisites

The CLEVR Parser library uses the spacy framework as its NLP backend.

-- Spacy --

The default backend uses spacy for language parsing and pretrained language models (LMs) for embeddings.

Please see spacy's documentation for installation instructions.

Spacy language models (LMs) can be downloaded by following the instructions in the spacy documentation. N.b. the spacy-transformers package (homepage, github) can be used to download state-of-the-art transformer-based LMs (BERT, XLNet, RoBERTa), including the popular HuggingFace implementations.

The very basic installation entails:

$ pip install spacy
$ python -m spacy download en_core_web_sm 

Once installed, validate the available LMs using: python -m spacy info and python -m spacy validate.
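
The same check can be made from Python before launching any experiment. The sketch below is not part of this repo: `check_environment` is a hypothetical helper that uses only the standard library to confirm the interpreter version and that the `spacy` package and the `en_core_web_sm` model (which pip installs as a regular Python package) are importable.

```python
import importlib.util
import sys


def check_environment(packages=("spacy", "en_core_web_sm"), min_python=(3, 7)):
    """Hypothetical helper (not part of this repo): report which prerequisites are present."""
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'MISSING'}")
```

This only checks that the packages can be found; `python -m spacy validate` remains the way to confirm that an installed model is compatible with the installed spacy version.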

Dataset Generation

Please follow the instructions in the CLEVR Dataset Generation repo. You can clone a local copy under ./vendors within the project using:

git submodule update --init --recursive

To replicate the experiments with captions, use the same scripts from the aforementioned repo with the 'caption generation' templates, which are included in the data/templates directory.

A demo data directory for illustration can be obtained by running:

. data/download-demo-data.sh

The resulting structure of the data folder should look like: [figure: data-mgn-demo]

The CLOSURE templates (after downloading) are under data/CLOSURE_v1.0. Additional templates are under data/templates.
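
A quick way to confirm the layout after downloading is to check for the two directories named above. This is a minimal sketch, assuming it is run from the project root; `missing_data_dirs` is a hypothetical helper, not something this repo provides.

```python
from pathlib import Path


def missing_data_dirs(data_root="data", expected=("CLOSURE_v1.0", "templates")):
    """Hypothetical helper: list the expected sub-directories absent under data_root."""
    root = Path(data_root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs()
    print("data layout ok" if not missing else f"missing under data/: {missing}")
```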

Running Experiments

  • Preprocess the questions/captions to generate the .h5 file (e.g. clevr_train_questions_25k.h5)

  • Train: Pretrain on 25K questions, then use the pre-trained model for fine-tuning (using REINFORCE)

    • Pretrain:
    $ python ${ROOT}/mgn/tools/run_train.py \
                    --checkpoint_every 50   \
                    --num_iters 100 \
                    --run_dir ../data/outputs/model_pretrain_clevr_25kpg \
                    --clevr_train_question_path ../data/${PATH_TO_PREPROCESSED_QUESTIONS}/clevr_train_questions_25000/clevr_train_questions_25k.h5 \
                    --gembd_vec_dim 96
    
    • Fine-Tune:
    $ python ${ROOT}/mgn/tools/run_train.py \
                    --reinforce 1 \
                    --learning_rate 1e-5 \
                    --checkpoint_every 50   \
                    --num_iters 100 \
                    --run_dir ../data/outputs/model_reinforce_clevr_25kpg \
                    --load_checkpoint_path ../data/outputs/model_pretrain_clevr_25kpg/checkpoint_best.pt \
                    --clevr_train_question_path ../data/${PATH_TO_PREPROCESSED_QUESTIONS}/clevr_train_questions_25000/clevr_train_questions_25k.h5 \
                    --gembd_vec_dim 96 
    
    
  • Test:

    $ python ${ROOT}/mgn/tools/run_test.py \
                    --run_dir ../data/results \
                    --clevr_val_scene_path ../data/${PATH_TO_SCENES}/clevr_val_scenes_parsed.json \
                    --clevr_val_question_path ../data/${PATH_TO_PREPROCESSED_QUESTIONS}/clevr_val_questions.h5 \
                    --clevr_vocab_path ../data/${PATH_TO_VOCAB}/clevr_vocab.json \
                    --load_checkpoint_path ../data/outputs/model_reinforce_clevr_25kpg/checkpoint_best.pt \
                    --max_val_samples 1024 \
                    --is_baseline_model 0
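
The pretrain and fine-tune steps above differ only in a few flags, so the two invocations can be assembled from one function. The following is a sketch, not part of this repo: `build_train_command` is a hypothetical helper, it uses only the flags shown in the commands above, and the `QUESTIONS` path is a placeholder to replace with your own preprocessed .h5 file.

```python
import shlex


def build_train_command(root, run_dir, question_path, reinforce=False,
                        learning_rate=None, load_checkpoint_path=None,
                        checkpoint_every=50, num_iters=100, gembd_vec_dim=96):
    """Hypothetical helper: build the run_train.py argument list from the flags above."""
    cmd = ["python", f"{root}/mgn/tools/run_train.py"]
    if reinforce:
        cmd += ["--reinforce", "1"]
    if learning_rate is not None:
        cmd += ["--learning_rate", str(learning_rate)]
    cmd += ["--checkpoint_every", str(checkpoint_every),
            "--num_iters", str(num_iters),
            "--run_dir", run_dir]
    if load_checkpoint_path is not None:
        cmd += ["--load_checkpoint_path", load_checkpoint_path]
    cmd += ["--clevr_train_question_path", question_path,
            "--gembd_vec_dim", str(gembd_vec_dim)]
    return cmd


# Placeholder paths: substitute your own locations.
QUESTIONS = "path/to/clevr_train_questions_25k.h5"
PRETRAIN_DIR = "../data/outputs/model_pretrain_clevr_25kpg"

pretrain = build_train_command(".", PRETRAIN_DIR, QUESTIONS)
finetune = build_train_command(
    ".", "../data/outputs/model_reinforce_clevr_25kpg", QUESTIONS,
    reinforce=True, learning_rate=1e-5,
    load_checkpoint_path=f"{PRETRAIN_DIR}/checkpoint_best.pt")

# Print the commands in a shell-safe form for inspection.
for cmd in (pretrain, finetune):
    print(" ".join(shlex.quote(part) for part in cmd))
```

To execute the steps in order, each list can be passed to `subprocess.run(cmd, check=True)`; the fine-tune command expects the checkpoint_best.pt file from the pretrain run directory to exist first.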