This repository is an unofficial inference codebase for the paper Dual-Level Collaborative Transformer for Image Captioning (AAAI 2021).
In this repo, we combine the repos image-captioning-DLCT and grid-feats-vqa to make image caption model inference easier.
Given an image path, you can get the image caption sentence with only two function calls (`initialize_model_states` and `inference`)!
The steps of this inference repo are shown as follows:
- grid feature extraction from the original_script
- region feature extraction from the original_script
- alignment graph extraction from the original_script
- image caption model inference with all of the features above from the original_script
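The four steps above can be sketched end to end as follows. Every function here is an illustrative stub, not the repo's actual API; only the data flow between the stages mirrors the list above.

```python
# Hypothetical sketch of the pipeline stages; all functions are stubs
# standing in for the original extraction and inference scripts.
def extract_grid_features(image_path):
    # grid-feats-vqa produces a uniform grid of CNN features for the image
    return {"grid": [[0.1, 0.2]]}

def extract_region_features(image_path):
    # region features come from detected object boxes
    return {"regions": [[0.3, 0.4]], "boxes": [[0, 0, 10, 10]]}

def build_alignment_graph(grid_feats, region_feats):
    # the alignment graph links each region to the grid cells it overlaps
    return {"edges": [(0, 0)]}

def caption_model(grid_feats, region_feats, graph):
    # the DLCT captioning model consumes all three inputs and decodes a sentence
    return "a caption sentence"

def run_pipeline(image_path):
    grid_feats = extract_grid_features(image_path)
    region_feats = extract_region_features(image_path)
    graph = build_alignment_graph(grid_feats, region_feats)
    return caption_model(grid_feats, region_feats, graph)
```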
- Download the feature extraction pretrained model X-101 from grid-feats-vqa
- Download the image caption pretrained model (access code: jcj6) from image-captioning-DLCT.
- Install Python dependencies

  ```shell
  pip3 install -r requirements.txt
  ```
- Install the spaCy model `en_core_web_sm`

  ```shell
  python3 -m spacy download en_core_web_sm
  ```
- Install Detectron2 following grid-feats-vqa

  ```shell
  python3 -m pip install 'git+https://github.com/facebookresearch/detectron2.git@ffff8ac'
  ```
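After installing, a quick sanity check can confirm the dependencies resolve before running inference. The helper below is hypothetical (not part of the repo) and only uses the standard library:

```python
import importlib.util

def missing_deps(names):
    """Return the module names that cannot be imported in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# For this repo you would typically check (module names assumed):
# missing_deps(["spacy", "detectron2", "torch"])
```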
```python
from image_caption_lib import initialize_model_states, inference

path_to_extract_model_weight = "..."
path_to_caption_model_weight = "..."
path_to_inference_image = "..."

# Load the feature extraction and captioning models once
model_states = initialize_model_states(
    path_to_extract_model_weight,
    path_to_caption_model_weight,
)

# Run the full pipeline on a single image
caption_text = inference(model_states, path_to_inference_image)
```
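Because model loading happens once in the first call, the same model states can be reused to caption many images. A minimal sketch of that pattern, with the repo's two functions stubbed so the example is self-contained:

```python
# Stubs mimicking the repo's two-call API; the real implementations live in
# image_caption_lib and are far heavier.
def initialize_model_states(extract_weight_path, caption_weight_path):
    # in the real repo this is the expensive one-time model loading step
    return {"extract": extract_weight_path, "caption": caption_weight_path}

def inference(model_states, image_path):
    # in the real repo this extracts features and decodes a caption
    return f"caption for {image_path}"

# Initialize once, then caption many images with the same states.
states = initialize_model_states("extract.pth", "caption.pth")
captions = {p: inference(states, p) for p in ["cat.jpg", "dog.jpg"]}
```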