Image Captioning Kazakh Model (based on ExpansionNet v2)

ExpansionNet v2 model trained on the COCO dataset with captions translated into Kazakh.

Requirements

  • Python >= 3.7
  • numpy
  • Java 1.8.0
  • PyTorch 1.9.0
  • h5py
  • playsound
  • scipy
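
A minimal environment setup sketch, assuming pip on a CUDA-capable machine; the pinned torch version mirrors the list above, but the exact wheel may need adjusting for your CUDA/OS combination, and Java 1.8.0 is installed separately via your system package manager:

pip install numpy torch==1.9.0 h5py playsound scipy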

Model checkpoint

The model checkpoint is stored in drive. Please place the downloaded file into the checkpoints directory.
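
As a quick sanity check that the checkpoint loads, a hedged snippet along these lines can be used; the file name rf_model.pth is a placeholder, not necessarily the actual checkpoint name:

import torch

# Placeholder file name; substitute the actual file downloaded from drive.
checkpoint = torch.load("checkpoints/rf_model.pth", map_location="cpu")
print(checkpoint.keys())  # inspect the stored weights / metadata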

Inference acceleration with NVIDIA's TensorRT deep learning library

  • Convert the PyTorch model to ONNX using this script (a hedged export sketch is given after this list).
  • Convert ONNX to the TensorRT format. The ONNX model file can be converted to a TensorRT engine using the trtexec tool:
trtexec --onnx=./model.onnx --saveEngine=./model_fp32.engine --workspace=200

  • Run inference using the TensorRT engine (see the sketch after this list):
python3 infer_trt.py
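
A minimal sketch of a PyTorch-to-ONNX export, not the repository's actual conversion script: the ToyCaptioner below is only a stand-in for ExpansionNet v2, and the input shape matches the 384x384 images used in the benchmark below.

import torch
import torch.nn as nn

class ToyCaptioner(nn.Module):
    """Stand-in for ExpansionNet v2; replace with the real model and checkpoint."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 8, kernel_size=3, stride=2)

    def forward(self, image):
        return self.backbone(image).mean(dim=(2, 3))

model = ToyCaptioner().eval()
dummy = torch.randn(1, 3, 384, 384)  # one 384x384 RGB image
torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["image"], output_names=["output"],
    opset_version=13,
)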
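Likewise, a hedged sketch of what running the serialized engine involves, using the TensorRT Python API with pycuda; it assumes a single input binding (index 0) and a single output binding (index 1), which is not necessarily how infer_trt.py is structured:

import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("model_fp32.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Assumed bindings: 0 = input image, 1 = output tensor.
image = np.random.rand(1, 3, 384, 384).astype(np.float32)  # stand-in input
output = np.empty(tuple(engine.get_binding_shape(1)), dtype=np.float32)

d_input = cuda.mem_alloc(image.nbytes)
d_output = cuda.mem_alloc(output.nbytes)

cuda.memcpy_htod(d_input, image)                   # host -> device
context.execute_v2([int(d_input), int(d_output)])  # synchronous inference
cuda.memcpy_dtoh(output, d_output)                 # device -> host
print(output.shape)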

Inference time (sec) on 384x384 pixel images: PyTorch (.pth) vs. TensorRT (.engine)

Image №   PyTorch model (size: 2.7 GB)   TensorRT FP32 (size: 986 MB)
   1                2.56                          0.53
   2                1.14                          0.48
   3                1.16                          0.47
   4                1.12                          0.49
   5                1.17                          0.46
   6                1.21                          0.48
   7                1.35                          0.50
   8                1.50                          0.50
   9                1.12                          0.46
  10                1.10                          0.50
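
On these ten images, the average latency drops from roughly 1.34 s with the PyTorch model to roughly 0.49 s with the TensorRT FP32 engine, a speedup of about 2.8x, while the serialized engine is also close to three times smaller on disk.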

Acknowledgements

The implementation of the model relies on https://github.com/jchenghu/expansionnet_v2. We thank the original authors for open-sourcing their work.

Preprint on TechRxiv

Image Captioning for the Visually Impaired and Blind: A Recipe for Low-Resource Languages

BibTeX

@article{Arystanbekov2023,
  author = "Batyr Arystanbekov and Askat Kuzdeuov and Shakhizat Nurgaliyev and Hüseyin Atakan Varol",
  title  = "{Image Captioning for the Visually Impaired and Blind: A Recipe for Low-Resource Languages}",
  year   = "2023",
  month  = "2",
  url    = "https://www.techrxiv.org/articles/preprint/Image_Captioning_for_the_Visually_Impaired_and_Blind_A_Recipe_for_Low-Resource_Languages/22133894",
  doi    = "10.36227/techrxiv.22133894.v1"
}