Image Captioning with the BLIP model, deployed with Gradio.
Ho Chi Minh City University of Science
Advanced Image and Video Processing - Assoc. Prof. Lý Quốc Ngọc
K31 - Master of Science - Group: CHOICES
| No. | Student ID | Student Name       |
|-----|------------|--------------------|
| 1   | 19127027   | Võ Hoàng Bảo Duy   |
| 2   | 19127094   | Phạm Ngọc Thiên Ân |
| 3   | 19127292   | Nguyễn Thanh Tình  |
- Clone the repository.
git clone https://github.com/ngthtinh99/ImageCaptioning.git
- Install Python (version 3.7-3.9 is required for PyTorch support).
- Install the required libraries.
pip install requests torch torchvision gradio timm fairscale transformers
- Run the app. The first run downloads the model, which takes about 5 minutes; subsequent runs reuse the cached weights. A sketch of what app.py does is given after this list.
python app.py
- Open the app on localhost at http://localhost:7860, or via the public link printed in the command prompt.
- Enjoy 🙂
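For reference, here is a minimal sketch of the kind of app.py this setup runs. It is an illustration under assumptions, not the repository's actual code: it loads the Hugging Face port of BLIP (Salesforce/blip-image-captioning-base), whereas the repository's app.py may load the original BLIP checkpoint through timm and fairscale instead.

```python
import gradio as gr
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumed checkpoint: downloaded from the Hugging Face Hub on the
# first run, then loaded from the local cache afterwards.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
).to(device)

def caption(image: Image.Image) -> str:
    """Generate a caption for a single PIL image."""
    inputs = processor(images=image, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=30)
    return processor.decode(output_ids[0], skip_special_tokens=True)

demo = gr.Interface(
    fn=caption,
    inputs=gr.Image(type="pil"),
    outputs="text",
    title="Image Captioning with BLIP",
)

# share=True creates the temporary public link mentioned above;
# without it the app is served only at http://localhost:7860.
demo.launch(share=True)
```

The Gradio interface simply wraps the caption function: Gradio handles the image upload widget, converts the upload to a PIL image, and displays the returned string.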
[1] Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi. BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation.
[2] Gradio: Build Machine Learning Web Apps — in Python.