
GRCF: Geometrical Relation-aware Multi-modal Network with Confidence Fusion for Text-based Image Captioning

Introduction:

PyTorch implementation of GRCF.

Pretrained GRCF model:

We release the following pretrained GRCF model for the TextCaps dataset:

| Description | Download link | Validation set | Test set |
|---|---|---|---|
| GRCF best | Baidu Netdisk (code: ampz) | BLEU-4 25.7, CIDEr 106.9 | BLEU-4 21.0, CIDEr 96.6 |

Installation:

Our implementation is based on the MMF framework and built upon M4C-Captioner. Please refer to MMF's documentation for details on the installation requirements.

Dataset:

(1) The original TextCaps dataset is available from https://textvqa.org/textcaps/dataset/. Please download it from the links below and extract it under the dataname directory:

(2) We use CNMT's imdb files to build our model:

  • imdb_train.npy

  • imdb_val_filtered_by_image_id.npy

  • imdb_test_filtered_by_image_id.npy
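Before training, it can help to confirm that an imdb file loads correctly. The snippet below is a minimal sketch, not part of this repository: `load_imdb` is a hypothetical helper, and it assumes the files follow the usual MMF-style layout, in which the array holds Python dicts and the first entry is a metadata header rather than a sample.

```python
import numpy as np


def load_imdb(path):
    """Load an imdb .npy file and split the header from the samples.

    Hypothetical helper. Assumes an MMF-style imdb: an object array of
    dicts whose first entry is metadata, followed by one dict per sample.
    """
    # allow_pickle=True is needed because the array stores Python dicts.
    # Only load files from a source you trust, since unpickling can run code.
    entries = np.load(path, allow_pickle=True)
    header, samples = entries[0], list(entries[1:])
    return header, samples


# Example usage (the path is a placeholder):
# header, samples = load_imdb("imdb/imdb_train.npy")
# print(header)
# print(len(samples), sorted(samples[0].keys()))
```

Printing the keys of the first sample is a quick way to see which fields are available without relying on any particular schema.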

(3) Finally, the data directory (/home/username/.cache/torch/mmf/data/datasets/) should look like this:

textcaps

defaults

detectron

extras

dataname

m4c_textvqa_ocr_en_frcn_features

open_images

detectron_fix_100

imdb

imdb_train.npy

imdb_val_filtered_by_image_id.npy

imdb_test_filtered_by_image_id.npy
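A quick way to check the layout is to search the data directory for the names listed above. The following is a sketch under stated assumptions: `find_missing` and `EXPECTED_NAMES` are hypothetical names, and the check deliberately ignores nesting, so it only reports whether each entry exists somewhere under the root.

```python
import os

# Names copied from the directory listing above (a subset, for illustration).
EXPECTED_NAMES = [
    "textcaps",
    "m4c_textvqa_ocr_en_frcn_features",
    "open_images",
    "detectron_fix_100",
    "imdb_train.npy",
    "imdb_val_filtered_by_image_id.npy",
    "imdb_test_filtered_by_image_id.npy",
]


def find_missing(root, names=EXPECTED_NAMES):
    """Return the names that do not appear anywhere under root."""
    root = os.path.expanduser(root)
    present = set()
    # os.walk yields nothing for a missing root, so every name is reported.
    # Symlinked directories are not followed with the default settings.
    for _dirpath, dirnames, filenames in os.walk(root):
        present.update(dirnames)
        present.update(filenames)
    return [name for name in names if name not in present]


# Example usage:
# print(find_missing("~/.cache/torch/mmf/data/datasets/"))
```

Walking a directory that holds many feature files can take a while, so this is best run once after the data has been extracted.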

Running the code:

  • To train the GRCF model on the TextCaps training set and obtain best.model:
CUDA_VISIBLE_DEVICES=0,1 mmf_run datasets=cnmtdata \
    model=grcf \
    config=projects/grcf/configs/grcf_defaults.yaml \
    env.save_dir=./save/grcf/defaults \
    run_type=train_val   
  • To generate prediction JSON files on the validation set with best.model:
CUDA_VISIBLE_DEVICES=1 mmf_predict datasets=cnmtdata \
   model=grcf \
   config=projects/grcf/configs/grcf_defaults.yaml \
   env.save_dir=./save/grcf/defaults \
   run_type=val \
   checkpoint.resume_file=./save/grcf/defaults/best.model
  • To generate prediction JSON files on the test set with best.model:
CUDA_VISIBLE_DEVICES=1 mmf_predict datasets=cnmtdata \
   model=grcf \
   config=projects/grcf/configs/grcf_defaults.yaml \
   env.save_dir=./save/grcf/defaults \
   run_type=test \
   checkpoint.resume_file=./save/grcf/defaults/best.model
  • To evaluate the prediction JSON file for the TextCaps validation set (replace json_file with the path to your prediction file, and adjust the absolute paths to match your setup):
python /home/zhangsm/Python_project/mmf/projects/m4c_captioner/scripts/textcaps_eval.py \
    --set val \
    --annotation_file /home/zhangsm/.cache/torch/mmf/data/datasets/textcaps/defaults/annotations/imdb_val.npy \
    --pred_file json_file
  • You can submit the JSON file of the TextCaps test set to the EvalAI server for the result.
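The training and prediction commands above share most of their arguments, so they can be assembled from one small helper when scripting experiments. This is a sketch only: `build_mmf_command` is a hypothetical function that is not part of MMF or this repository, and it simply reproduces the overrides shown in the commands above.

```python
def build_mmf_command(tool, run_type, save_dir="./save/grcf/defaults",
                      resume_file=None):
    """Build the argument list for mmf_run or mmf_predict.

    Hypothetical helper: the overrides mirror the commands listed above.
    """
    args = [
        tool,
        "datasets=cnmtdata",
        "model=grcf",
        "config=projects/grcf/configs/grcf_defaults.yaml",
        "env.save_dir={}".format(save_dir),
        "run_type={}".format(run_type),
    ]
    # Prediction runs resume from a trained checkpoint; training does not.
    if resume_file is not None:
        args.append("checkpoint.resume_file={}".format(resume_file))
    return args


# Example usage (requires MMF to be installed; GPU ids are placeholders):
# import os, subprocess
# cmd = build_mmf_command("mmf_predict", "val",
#                         resume_file="./save/grcf/defaults/best.model")
# env = dict(os.environ, CUDA_VISIBLE_DEVICES="1")
# subprocess.run(cmd, check=True, env=env)
```

Passing the arguments as a list avoids shell quoting issues, and setting CUDA_VISIBLE_DEVICES through `env` has the same effect as the prefix used in the shell commands.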

Notes:

python=3.7.0

pytorch=1.6.0

huggingface-hub=0.2.1
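To compare the current environment with the versions listed above, a small check like the following may be useful. It is a sketch: `PINS`, `installed_versions`, and `version_mismatches` are hypothetical names, and it assumes the packages are importable as `torch` and `huggingface_hub` and expose a `__version__` attribute.

```python
import platform
from importlib import import_module

# Versions copied from the list above, keyed by import name.
PINS = {"python": "3.7.0", "torch": "1.6.0", "huggingface_hub": "0.2.1"}


def installed_versions(modules=("torch", "huggingface_hub")):
    """Collect the running Python version and the given module versions."""
    found = {"python": platform.python_version()}
    for name in modules:
        try:
            version = getattr(import_module(name), "__version__", None)
        except ImportError:
            version = None  # the package is not installed
        # Drop local build tags such as "+cu101" before comparing.
        found[name] = version.split("+")[0] if version else None
    return found


def version_mismatches(expected, installed):
    """Return {name: (expected, found)} for every version that differs."""
    return {
        name: (want, installed.get(name))
        for name, want in expected.items()
        if installed.get(name) != want
    }


# Example usage:
# print(version_mismatches(PINS, installed_versions()))
```

An empty dictionary means the environment matches the listed versions exactly. Other versions may also work, but these are the ones given here.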

Some important files' paths are as follows:

| File name | Path | Description |
|---|---|---|
| grcf_defaults.yaml | projects/grcf/configs/ | |
| defaults.yaml | mmf/configs/datasets/cnmtdata/ | |
| grcf.py | mmf/models/ | |
| builder.py, dataset.py | mmf/datasets/builders/cnmtdata/ | |
| losses.py, metrics.py | mmf/modules/ | |