GRCF: Geometrical Relation-aware Multi-modal Network with Confidence Fusion for Text-based Image Captioning
PyTorch implementation of GRCF.
We release the following pretrained GRCF model for the TextCaps dataset:
description | download link | validation set | test set |
---|---|---|---|
GRCF best | Baidu Netdisk (code: ampz) | BLEU-4 25.7, CIDEr 106.9 | BLEU-4 21.0, CIDEr 96.6 |
Our implementation is based on the MMF framework and built upon M4C-Captioner. Please refer to MMF's documentation for details on installation requirements.
(1) The original TextCaps dataset is available at https://textvqa.org/textcaps/dataset/. Please download the files from the links below and extract them under the dataname directory:
(2) We build our model with the imdb files provided by CNMT:
- imdb_train.npy
- imdb_val_filtered_by_image_id.npy
- imdb_test_filtered_by_image_id.npy
(3) Finally, the data directory (/home/username/.cache/torch/mmf/data/datasets/) should have the following structure:
- textcaps
  - defaults
  - detectron
  - extras
- dataname
  - m4c_textvqa_ocr_en_frcn_features
  - open_images
    - detectron_fix_100
  - imdb
    - imdb_train.npy
    - imdb_val_filtered_by_image_id.npy
    - imdb_test_filtered_by_image_id.npy
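Before training, it can help to confirm that the data root matches the layout above. The script below is a minimal sketch and not part of the GRCF code: the helper find_missing and the EXPECTED_PATHS list are hypothetical, the nesting of the dataname entries is our reading of the listing above, and for textcaps only textcaps/defaults is checked (the evaluation command's annotation file lives under it).

```python
from pathlib import Path
from typing import List

# Hypothetical list of paths relative to the MMF data root, read from the
# directory listing in this README (only a subset of textcaps is checked).
EXPECTED_PATHS = [
    "textcaps/defaults",
    "dataname/m4c_textvqa_ocr_en_frcn_features",
    "dataname/open_images/detectron_fix_100",
    "dataname/imdb/imdb_train.npy",
    "dataname/imdb/imdb_val_filtered_by_image_id.npy",
    "dataname/imdb/imdb_test_filtered_by_image_id.npy",
]


def find_missing(data_root: str) -> List[str]:
    """Return the expected relative paths that do not exist under data_root."""
    root = Path(data_root).expanduser()
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    missing = find_missing("~/.cache/torch/mmf/data/datasets")
    if missing:
        print("Missing under the data root:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected paths are present.")
```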
- To train the GRCF model on the TextCaps training set and obtain best.model:
CUDA_VISIBLE_DEVICES=0,1 mmf_run datasets=cnmtdata \
model=grcf \
config=projects/grcf/configs/grcf_defaults.yaml \
env.save_dir=./save/grcf/defaults \
run_type=train_val
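The training command is an mmf_run call followed by key=value overrides (datasets, model, config, env.save_dir, run_type). If you prefer to launch it from Python, a minimal sketch could look like this; build_mmf_command and run_training are hypothetical helpers, not part of GRCF or MMF, and with the default dry_run=True the script only prints the command.

```python
import os
import shlex
import subprocess
from typing import Dict, List, Optional

# Overrides copied from the training command above.
TRAIN_OVERRIDES = {
    "datasets": "cnmtdata",
    "model": "grcf",
    "config": "projects/grcf/configs/grcf_defaults.yaml",
    "env.save_dir": "./save/grcf/defaults",
    "run_type": "train_val",
}


def build_mmf_command(entry_point: str, overrides: Dict[str, str]) -> List[str]:
    """Turn a dict of overrides into an argv list: [entry_point, 'k=v', ...]."""
    return [entry_point] + ["{}={}".format(key, value) for key, value in overrides.items()]


def run_training(gpus: str = "0,1", dry_run: bool = True) -> Optional[int]:
    """Print the training command; launch it only when dry_run is False."""
    argv = build_mmf_command("mmf_run", TRAIN_OVERRIDES)
    print("CUDA_VISIBLE_DEVICES={} {}".format(gpus, " ".join(shlex.quote(a) for a in argv)))
    if dry_run:
        return None
    # Restrict the visible GPUs for the child process only.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpus)
    return subprocess.run(argv, env=env, check=False).returncode


if __name__ == "__main__":
    run_training()
```

Passing an argv list (rather than one shell string) to subprocess.run avoids shell quoting issues with paths.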
- To generate prediction JSON files on the validation set using best.model:
CUDA_VISIBLE_DEVICES=1 mmf_predict datasets=cnmtdata \
model=grcf \
config=projects/grcf/configs/grcf_defaults.yaml \
env.save_dir=./save/grcf/defaults \
run_type=val \
checkpoint.resume_file=./save/grcf/defaults/best.model
- To generate prediction JSON files on the test set using best.model:
CUDA_VISIBLE_DEVICES=1 mmf_predict datasets=cnmtdata \
model=grcf \
config=projects/grcf/configs/grcf_defaults.yaml \
env.save_dir=./save/grcf/defaults \
run_type=test \
checkpoint.resume_file=./save/grcf/defaults/best.model
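The validation and test commands differ only in run_type. As a hypothetical sketch (build_predict_command is not part of GRCF or MMF), the shared arguments can be generated once per split:

```python
import shlex
from typing import List

SAVE_DIR = "./save/grcf/defaults"
CONFIG = "projects/grcf/configs/grcf_defaults.yaml"


def build_predict_command(split: str, save_dir: str = SAVE_DIR) -> List[str]:
    """Return the mmf_predict argv for split 'val' or 'test'."""
    if split not in ("val", "test"):
        raise ValueError("split must be 'val' or 'test', got {!r}".format(split))
    return [
        "mmf_predict",
        "datasets=cnmtdata",
        "model=grcf",
        "config=" + CONFIG,
        "env.save_dir=" + save_dir,
        "run_type=" + split,
        # Predictions are generated from the best checkpoint saved during training.
        "checkpoint.resume_file=" + save_dir + "/best.model",
    ]


if __name__ == "__main__":
    for split in ("val", "test"):
        cmd = " ".join(shlex.quote(arg) for arg in build_predict_command(split))
        print("CUDA_VISIBLE_DEVICES=1 " + cmd)
```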
- To evaluate the prediction JSON file for the TextCaps validation set (replace json_file with its path):
python projects/m4c_captioner/scripts/textcaps_eval.py \
--set val \
--annotation_file /home/username/.cache/torch/mmf/data/datasets/textcaps/defaults/annotations/imdb_val.npy \
--pred_file json_file
- To get results on the test set, submit the prediction JSON file for the TextCaps test set to the EvalAI evaluation server.
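Before running textcaps_eval.py or submitting to EvalAI, a quick sanity check of the prediction file can catch empty or duplicated captions. This is a sketch under an assumption: that the file is a JSON list of objects with image_id and caption keys, which is our understanding of MMF's TextCaps prediction format. check_predictions is a hypothetical helper.

```python
import json
import sys
from typing import List


def check_predictions(pred_file: str) -> List[str]:
    """Return a list of problems found in a caption prediction JSON file.

    Assumes the file holds a list of {"image_id": ..., "caption": ...} objects.
    """
    with open(pred_file) as f:
        preds = json.load(f)
    if not isinstance(preds, list):
        return ["top-level JSON value should be a list"]
    problems = []
    seen = set()
    for i, entry in enumerate(preds):
        if not isinstance(entry, dict) or "image_id" not in entry or "caption" not in entry:
            problems.append("entry {}: missing 'image_id' or 'caption'".format(i))
            continue
        caption = entry["caption"]
        if not isinstance(caption, str) or not caption.strip():
            problems.append("entry {}: empty or non-string caption".format(i))
        # str() keeps the id hashable whatever JSON type it was stored as.
        image_id = str(entry["image_id"])
        if image_id in seen:
            problems.append("entry {}: duplicate image_id {}".format(i, image_id))
        seen.add(image_id)
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    issues = check_predictions(sys.argv[1])
    print("\n".join(issues) if issues else "No problems found.")
```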
Environment:
- python=3.7.0
- pytorch=1.6.0
- huggingface-hub=0.2.1
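To compare a local installation against these versions, a sketch like the following could be used. It assumes PyTorch is installed under the distribution name torch (a local build tag such as +cu101 is ignored), and it uses importlib.metadata on Python 3.8+ or the importlib-metadata backport on Python 3.7. installed_version and version_mismatches are hypothetical helpers.

```python
import sys
from typing import Callable, Dict, List, Optional

try:  # Python 3.8+
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # Python 3.7 needs the importlib-metadata backport
    from importlib_metadata import PackageNotFoundError, version

# Versions listed above; PyTorch's distribution name is assumed to be "torch".
EXPECTED_PYTHON = "3.7.0"
EXPECTED_PACKAGES = {"torch": "1.6.0", "huggingface-hub": "0.2.1"}


def installed_version(name: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def version_mismatches(
    expected: Dict[str, str] = EXPECTED_PACKAGES,
    get_version: Callable[[str], Optional[str]] = installed_version,
) -> List[str]:
    """Describe the interpreter and every package that differ from `expected`."""
    problems = []
    python = "{}.{}.{}".format(*sys.version_info[:3])
    if python != EXPECTED_PYTHON:
        problems.append("python: expected {}, found {}".format(EXPECTED_PYTHON, python))
    for name, wanted in expected.items():
        found = get_version(name)
        if found is None:
            problems.append("{}: not installed (expected {})".format(name, wanted))
        # Drop local build tags such as "+cu101" before comparing.
        elif found.split("+")[0] != wanted:
            problems.append("{}: expected {}, found {}".format(name, wanted, found))
    return problems


if __name__ == "__main__":
    print("\n".join(version_mismatches()) or "Environment matches.")
```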
The paths of some important files are listed below:
file name | path | description |
---|---|---|
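grcf_defaults.yaml | projects/grcf/configs/ | GRCF experiment config (passed via config=) |
defaults.yaml | mmf/configs/datasets/cnmtdata/ | cnmtdata dataset config |
grcf.py | mmf/models/ | GRCF model definition |
builder.py, dataset.py | mmf/datasets/builders/cnmtdata/ | cnmtdata dataset builder and dataset |
losses.py, metrics.py | mmf/modules/ | loss functions and evaluation metrics |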