Kazakh Image Captioning Model (based on ExpansioNet v2)
- python >= 3.7
- numpy
- Java 1.8.0
- pytorch 1.9.0
- h5py
- playsound
- scipy
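A quick way to confirm that the requirements listed above are present is a small check script. The following is only a sketch: the function name `check_environment` is hypothetical, the import name `torch` for PyTorch is the usual one, and the script checks presence only, not the specific versions (PyTorch 1.9.0, Java 1.8.0) named in the list.

```python
import importlib.util
import shutil
import sys

# Import names for the Python packages listed above (PyTorch is imported as "torch").
REQUIRED_MODULES = ["numpy", "torch", "h5py", "playsound", "scipy"]


def check_environment(modules=REQUIRED_MODULES, min_python=(3, 7)):
    """Return a dict mapping each requirement to True (found) or False (missing)."""
    report = {"python": sys.version_info >= min_python}
    for name in modules:
        # find_spec locates the module without importing it.
        report[name] = importlib.util.find_spec(name) is not None
    # Java is an external program, so look for its executable on PATH.
    report["java"] = shutil.which("java") is not None
    return report


if __name__ == "__main__":
    for requirement, found in check_environment().items():
        print(f"{requirement}: {'ok' if found else 'MISSING'}")
```

Any entry reported as missing should be installed before moving on to the conversion steps.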
The model checkpoint is stored in drive. Please place the file in the checkpoints
directory.
- Convert the PyTorch model to ONNX using this script.
- Convert the ONNX model to TensorRT format. The ONNX model file can be converted to a TensorRT engine using the trtexec tool.
trtexec --onnx=./model.onnx --saveEngine=./model_fp32.engine --workspace=200
- Run inference using the TensorRT engine.
python3 infer_trt.py
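The trtexec conversion step above can also be driven from Python, which is convenient when the engine is rebuilt as part of a larger script. This is a minimal sketch, not part of the repository: `build_trtexec_cmd` and `convert` are hypothetical helper names, and only the three flags shown in the command above (`--onnx`, `--saveEngine`, `--workspace`) are used.

```python
import shutil
import subprocess


def build_trtexec_cmd(onnx_path, engine_path, workspace_mb=200):
    """Build the trtexec argument list used in the conversion step."""
    return [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        f"--workspace={workspace_mb}",
    ]


def convert(onnx_path="./model.onnx", engine_path="./model_fp32.engine"):
    """Run trtexec; raises if the tool is missing or the conversion fails."""
    if shutil.which("trtexec") is None:
        raise FileNotFoundError("trtexec not found on PATH")
    # check=True turns a non-zero exit status into CalledProcessError.
    subprocess.run(build_trtexec_cmd(onnx_path, engine_path), check=True)
```

Calling `convert()` with its defaults runs the same command as the one listed above. Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.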
Image № | PyTorch model (model size: 2.7 GB) | TensorRT (FP32, model size: 986 MB) |
---|---|---|
1 | 2.56 | 0.53 |
2 | 1.14 | 0.48 |
3 | 1.16 | 0.47 |
4 | 1.12 | 0.49 |
5 | 1.17 | 0.46 |
6 | 1.21 | 0.48 |
7 | 1.35 | 0.5 |
8 | 1.5 | 0.5 |
9 | 1.12 | 0.46 |
10 | 1.1 | 0.5 |
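The per-image figures in the table can be summarised as an average and a speedup ratio. The sketch below copies the values from the table as they are, without assuming a unit. It also reports the ratio with the first image excluded, since the first PyTorch measurement is noticeably higher than the rest; treating that as a one-off start-up cost is an assumption, not something the table states.

```python
from statistics import mean

# Values copied from the table above, in image order 1..10.
pytorch = [2.56, 1.14, 1.16, 1.12, 1.17, 1.21, 1.35, 1.5, 1.12, 1.1]
tensorrt = [0.53, 0.48, 0.47, 0.49, 0.46, 0.48, 0.5, 0.5, 0.46, 0.5]


def speedup(baseline, optimized, skip_first=0):
    """Ratio of mean baseline value to mean optimized value."""
    return mean(baseline[skip_first:]) / mean(optimized[skip_first:])


speedup_all = speedup(pytorch, tensorrt)
# Assumption: the first image may include start-up overhead, so also report without it.
speedup_without_first = speedup(pytorch, tensorrt, skip_first=1)

if __name__ == "__main__":
    print(f"mean PyTorch:  {mean(pytorch):.3f}")
    print(f"mean TensorRT: {mean(tensorrt):.3f}")
    print(f"speedup (all images):          {speedup_all:.2f}x")
    print(f"speedup (first image skipped): {speedup_without_first:.2f}x")
```

On these ten images the TensorRT engine comes out roughly 2.5 to 2.8 times faster than the PyTorch model, depending on whether the first image is counted.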
The implementation of the model relies on https://github.com/jchenghu/expansionnet_v2. We thank the original authors for open-sourcing their work.
Image Captioning for the Visually Impaired and Blind: A Recipe for Low-Resource Languages
@article{Arystanbekov2023,
author = "Batyr Arystanbekov and Askat Kuzdeuov and Shakhizat Nurgaliyev and Hüseyin Atakan Varol",
title = "{Image Captioning for the Visually Impaired and Blind: A Recipe for Low-Resource Languages}",
year = "2023",
month = "2",
url = "https://www.techrxiv.org/articles/preprint/Image_Captioning_for_the_Visually_Impaired_and_Blind_A_Recipe_for_Low-Resource_Languages/22133894",
doi = "10.36227/techrxiv.22133894.v1"
}