DLCT_ImageCaption_Inference

This repository is an unofficial model inference codebase for the paper Dual-Level Collaborative Transformer for Image Captioning (AAAI 2021).

In this repo, we combine the image-captioning-DLCT and grid-feats-vqa repositories to make image caption model inference easier.

Given an image path, you can get the image caption sentence with only two function calls (initialize_model_states and inference)!

The steps of this inference pipeline are as follows:

  1. grid feature extraction from the original_script
  2. region feature extraction from the original_script
  3. alignment graph extraction from the original_script
  4. image caption model inference with all of the features above from the original_script

Requirements

pip3 install -r requirements.txt
  • Install en_core_web_sm
python3 -m spacy download en_core_web_sm
  • Install detectron2
python3 -m pip install 'git+https://github.com/facebookresearch/detectron2.git@ffff8ac'
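To check that the dependencies installed correctly, you can run a small Python snippet (a minimal sketch, not part of the repo):

import spacy
import detectron2

# Loading the small English pipeline fails if the download step above was skipped.
nlp = spacy.load("en_core_web_sm")
print("spaCy model loaded:", nlp.meta["name"])
print("detectron2 version:", detectron2.__version__)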

Getting started

Inference

from image_caption_lib import initialize_model_states, inference

path_to_extract_model_weight = "..."
path_to_caption_model_weight = "..."

path_to_inference_image = "..."


model_states = initialize_model_states(
    path_to_extract_model_weight,
    path_to_caption_model_weight,
)

caption_text = inference(model_states, path_to_inference_image)
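
Since initialize_model_states loads the feature extraction and caption models only once, the returned model_states can be reused across images. Below is a minimal sketch of captioning every image in a directory (the ./images path and *.jpg pattern are placeholders for illustration, not part of the repo):

from pathlib import Path

from image_caption_lib import initialize_model_states, inference

# Load the extraction and caption model weights once; the paths are placeholders.
model_states = initialize_model_states(
    "path/to/extract_model_weight",
    "path/to/caption_model_weight",
)

# Caption every JPEG in a hypothetical ./images directory, reusing model_states.
for image_path in sorted(Path("./images").glob("*.jpg")):
    caption_text = inference(model_states, str(image_path))
    print(f"{image_path.name}: {caption_text}")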

Acknowledgements