This repository is an unofficial inference codebase for the paper Dual-Level Collaborative Transformer for Image Captioning (AAAI 2021).
In this repo, we combine the repos image-captioning-DLCT and grid-feats-vqa to make image caption model inference easier.
Given an image path, you can get the image caption sentence with only two function calls (`initialize_model_states` and `inference`)!
The steps of this inference repo are shown as follows:
- grid feature extraction from the original_script
- region feature extraction from the original_script
- alignment graph extraction from the original_script
- image caption model inference with all of the features above from the original_script
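The four steps above can be sketched end to end as follows. Every function here is an illustrative stub, not the repo's actual API; only the data flow between the stages mirrors the list above.

```python
# Hypothetical sketch of the pipeline stages; all functions are stubs
# standing in for the original extraction and inference scripts.
def extract_grid_features(image_path):
    # grid-feats-vqa produces a uniform grid of CNN features for the image
    return {"grid": [[0.1, 0.2]]}

def extract_region_features(image_path):
    # region features come from detected object boxes
    return {"regions": [[0.3, 0.4]], "boxes": [[0, 0, 10, 10]]}

def build_alignment_graph(grid_feats, region_feats):
    # the alignment graph links each region to the grid cells it overlaps
    return {"edges": [(0, 0)]}

def caption_model(grid_feats, region_feats, graph):
    # the DLCT captioning model consumes all three inputs and decodes a sentence
    return "a caption sentence"

def run_pipeline(image_path):
    grid_feats = extract_grid_features(image_path)
    region_feats = extract_region_features(image_path)
    graph = build_alignment_graph(grid_feats, region_feats)
    return caption_model(grid_feats, region_feats, graph)
```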
- Download the feature extraction pretrained model X-101 from grid-feats-vqa
- Download the image caption pretrained model (access code: jcj6) from image-captioning-DLCT.
- Install Python dependencies

  ```shell
  pip3 install -r requirements.txt
  ```
- Install the spaCy model `en_core_web_sm`

  ```shell
  python3 -m spacy download en_core_web_sm
  ```
- Install Detectron2 following grid-feats-vqa

  ```shell
  python3 -m pip install 'git+https://github.com/facebookresearch/detectron2.git@ffff8ac'
  ```
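After installing, a quick sanity check can confirm the dependencies resolve before running inference. The helper below is hypothetical (not part of the repo) and only uses the standard library:

```python
import importlib.util

def missing_deps(names):
    """Return the module names that cannot be imported in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# For this repo you would typically check (module names assumed):
# missing_deps(["spacy", "detectron2", "torch"])
```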
```python
from image_caption_lib import initialize_model_states, inference

path_to_extract_model_weight = "..."
path_to_caption_model_weight = "..."
path_to_inference_image = "..."

# Load the feature extraction and captioning models once
model_states = initialize_model_states(
    path_to_extract_model_weight,
    path_to_caption_model_weight,
)

# Run the full pipeline on a single image
caption_text = inference(model_states, path_to_inference_image)
```
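Because model loading happens once in the first call, the same model states can be reused to caption many images. A minimal sketch of that pattern, with the repo's two functions stubbed so the example is self-contained:

```python
# Stubs mimicking the repo's two-call API; the real implementations live in
# image_caption_lib and are far heavier.
def initialize_model_states(extract_weight_path, caption_weight_path):
    # in the real repo this is the expensive one-time model loading step
    return {"extract": extract_weight_path, "caption": caption_weight_path}

def inference(model_states, image_path):
    # in the real repo this extracts features and decodes a caption
    return f"caption for {image_path}"

# Initialize once, then caption many images with the same states.
states = initialize_model_states("extract.pth", "caption.pth")
captions = {p: inference(states, p) for p in ["cat.jpg", "dog.jpg"]}
```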