GRCF: Geometrical Relation-aware Multi-modal Network with Confidence Fusion for Text-based Image Captioning

Introduction:

PyTorch implementation of GRCF.

Pretrained GRCF model:

We release the following pretrained GRCF model for the TextCaps dataset:

description	download link	validation set	test set
GRCF best	Baidu Netdisk code: `ampz`	`BLEU-4` 25.7, `CIDEr` 106.9	`BLEU-4` 21.0, `CIDEr` 96.6

Installation:

Our implementation is based on the mmf framework and built upon M4C-Captioner. Please refer to mmf's documentation for more details on installation requirements.

Dataset:

(1) The original TextCaps dataset is available at https://textvqa.org/textcaps/dataset/. Please download it from there and extract it under the dataname directory.

(2) We use CNMT's imdb files to build our model:

imdb_train.npy
imdb_val_filtered_by_image_id.npy
imdb_test_filtered_by_image_id.npy
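
To confirm that a downloaded imdb file is readable before training, a minimal sketch such as the following may help. It assumes the file is a pickled NumPy object array of per-sample dicts, possibly preceded by a header entry without an `image_id` key, as is common for mmf imdb files. The helper name `summarize_imdb` is hypothetical and not part of this repository.

```python
import numpy as np


def summarize_imdb(path):
    """Return (number of samples, sorted keys of the first sample)."""
    # imdb files store Python dicts, so pickle loading must be allowed.
    entries = list(np.load(path, allow_pickle=True))
    # Assumption: an optional leading header entry lacks the "image_id" key.
    if entries and "image_id" not in entries[0]:
        entries = entries[1:]
    keys = sorted(entries[0].keys()) if entries else []
    return len(entries), keys
```

Calling `summarize_imdb("imdb_train.npy")` on each of the three files gives a quick view of how many samples were loaded and which fields each sample carries.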

(3) Finally, the data directory (/home/username/.cache/torch/mmf/data/datasets/) should be structured as follows:

textcaps

defaults

detectron

extras

dataname

m4c_textvqa_ocr_en_frcn_features

open_images

detectron_fix_100

imdb

imdb_train.npy

imdb_val_filtered_by_image_id.npy

imdb_test_filtered_by_image_id.npy
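
A quick way to verify the layout before training is to look for the three imdb files anywhere below the data directory. The sketch below only checks file names and does not assume a particular nesting of the folders listed above; `missing_files` is a hypothetical helper, not part of mmf.

```python
from pathlib import Path

IMDB_FILES = (
    "imdb_train.npy",
    "imdb_val_filtered_by_image_id.npy",
    "imdb_test_filtered_by_image_id.npy",
)


def missing_files(data_root, names=IMDB_FILES):
    """Return the names that do not appear anywhere below data_root."""
    root = Path(data_root).expanduser()
    # Collect the names of all .npy files found in the directory tree.
    found = {p.name for p in root.rglob("*.npy")}
    return [name for name in names if name not in found]
```

For example, `missing_files("~/.cache/torch/mmf/data/datasets/")` returns an empty list when all three files are in place.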

Running the code:

To train the GRCF model on the TextCaps training set and obtain best.model:

CUDA_VISIBLE_DEVICES=0,1 mmf_run datasets=cnmtdata \
    model=grcf \
    config=projects/grcf/configs/grcf_defaults.yaml \
    env.save_dir=./save/grcf/defaults \
    run_type=train_val

To generate prediction JSON files on the validation set with best.model:

CUDA_VISIBLE_DEVICES=1 mmf_predict datasets=cnmtdata \
   model=grcf \
   config=projects/grcf/configs/grcf_defaults.yaml \
   env.save_dir=./save/grcf/defaults \
   run_type=val \
   checkpoint.resume_file=./save/grcf/defaults/best.model

To generate prediction JSON files on the test set with best.model:

CUDA_VISIBLE_DEVICES=1 mmf_predict datasets=cnmtdata \
   model=grcf \
   config=projects/grcf/configs/grcf_defaults.yaml \
   env.save_dir=./save/grcf/defaults \
   run_type=test \
   checkpoint.resume_file=./save/grcf/defaults/best.model
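
The training and prediction commands above differ only in the executable, the `run_type`, and whether a checkpoint is passed. As a convenience, they could be assembled by one small helper. `build_command` below is a hypothetical wrapper, not part of mmf or this repository, and it only reproduces the options shown above.

```python
CONFIG = "projects/grcf/configs/grcf_defaults.yaml"
SAVE_DIR = "./save/grcf/defaults"


def build_command(run_type, save_dir=SAVE_DIR):
    """Build the mmf argument list for run_type train_val, val, or test."""
    if run_type not in ("train_val", "val", "test"):
        raise ValueError("unsupported run_type: " + run_type)
    training = run_type == "train_val"
    args = [
        "mmf_run" if training else "mmf_predict",
        "datasets=cnmtdata",
        "model=grcf",
        "config=" + CONFIG,
        "env.save_dir=" + save_dir,
        "run_type=" + run_type,
    ]
    if not training:
        # Prediction runs load the checkpoint written during training.
        args.append("checkpoint.resume_file=" + save_dir + "/best.model")
    return args


# Print the test-set prediction command as a single line.
print(" ".join(build_command("test")))
```

The returned list can be passed to `subprocess.run`, with `CUDA_VISIBLE_DEVICES` set in the environment to select GPUs as in the commands above.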

To evaluate the prediction JSON file for the TextCaps validation set:

python /home/zhangsm/Python_project/mmf/projects/m4c_captioner/scripts/textcaps_eval.py \
    --set val \
    --annotation_file /home/zhangsm/.cache/torch/mmf/data/datasets/textcaps/defaults/annotations/imdb_val.npy \
    --pred_file   json_file
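
Before running the evaluation script, it can be useful to confirm that the prediction file parses. The sketch below assumes the file is a JSON list of objects with `image_id` and `caption` fields, which is the usual M4C-Captioner prediction layout but is not stated in this README. `check_predictions` is a hypothetical helper.

```python
import json


def check_predictions(pred_file):
    """Return the number of predictions, raising ValueError on a bad entry."""
    with open(pred_file, encoding="utf-8") as f:
        preds = json.load(f)
    if not isinstance(preds, list):
        raise ValueError("expected a JSON list of predictions")
    for i, pred in enumerate(preds):
        # Assumed fields; adjust if the prediction file uses other keys.
        if not isinstance(pred, dict) or not {"image_id", "caption"} <= set(pred):
            raise ValueError("entry %d lacks image_id or caption" % i)
    return len(preds)
```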

You can submit the prediction JSON file for the TextCaps test set to the EvalAI server to obtain the test-set results.

Notes:

python=3.7.0

pytorch=1.6.0

huggingface-hub=0.2.1
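
To compare an existing environment against the versions listed above, a minimal sketch such as the following could be used. It assumes the packages expose a `__version__` attribute and that `huggingface-hub` is imported as `huggingface_hub`. The function names are hypothetical.

```python
import sys

EXPECTED = {"python": "3.7.0", "torch": "1.6.0", "huggingface_hub": "0.2.1"}


def installed_versions():
    """Collect the installed versions; None marks a missing package."""
    found = {"python": "%d.%d.%d" % sys.version_info[:3]}
    for name in ("torch", "huggingface_hub"):
        try:
            module = __import__(name)
        except ImportError:
            found[name] = None
        else:
            found[name] = getattr(module, "__version__", None)
    return found


def mismatches(expected, found):
    """Return {name: (expected, found)} for versions that differ."""
    diff = {}
    for name, want in expected.items():
        have = found.get(name)
        # Ignore local build tags such as "+cu101" when comparing.
        base = have.split("+")[0] if have else None
        if base != want:
            diff[name] = (want, have)
    return diff
```

Running `mismatches(EXPECTED, installed_versions())` returns an empty dict when the environment matches the listed versions.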

The paths of some important files are as follows:

file name	path	description
grcf_defaults.yaml	projects/grcf/configs/
defaults.yaml	mmf/configs/datasets/cnmtdata/
grcf.py	mmf/models/
builder.py, dataset.py	mmf/datasets/builders/cnmtdata/
losses.py, metrics.py	mmf/modules/
